InfluxDB Logging System#
Overview#
The Dart Cloud backend integrates with InfluxDB 3 Core for centralized function execution logging, metrics collection, and monitoring. This system provides real-time insights into function performance, errors, and resource usage.
Architecture#
┌─────────────────────────────────────────────────────────┐
│ Cloud Function Execution │
│ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ CloudLogger (embedded in function) │ │
│ │ - Captures stdout/stderr │ │
│ │ - Records execution metrics │ │
│ │ - Tracks errors and exceptions │ │
│ └──────────────────────────────────────────────────┘ │
│ │ │ │
│ ├─ UdpTransport ──────────────┤ │
│ │ (UDP:8094) │ │
│ │ │ │
│ └─ FileTransport ─────────────┤ │
│ (./logs.json) │ │
└─────────────────────────────────────────────────────────┘
│ │
▼ ▼
┌─────────────┐ ┌──────────────┐
│ Telegraf │ │ File Backup │
│ (UDP:8094) │ │ (logs.json) │
└─────────────┘ └──────────────┘
│
▼
┌─────────────────────────┐
│ InfluxDB 3 Core │
│ (dart_cloud_logs) │
└─────────────────────────┘
│
├─ Grafana Dashboards
├─ REST API Queries
└─ CLI Tools
Components#
CloudLogger#
Embedded logging library that runs inside deployed functions and captures:
- Function Output: stdout and stderr streams
- Execution Metrics: Duration, memory usage, CPU time
- Error Tracking: Exceptions, stack traces, error codes
- Metadata: Function ID, execution ID, timestamps
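As a sketch of what CloudLogger ships to Telegraf, a record like the one above can be serialized into InfluxDB line protocol and sent over UDP. The helper names below (`to_line_protocol`, `send_log`) are illustrative only, not the actual CloudLogger API; the real logger is embedded in the deployed functions, and this Python sketch just shows the wire format:

```python
import socket
import time

def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Serialize one record as InfluxDB line protocol:
    measurement,tag=val field=val timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    parts = []
    for k, v in sorted(fields.items()):
        if isinstance(v, bool):
            parts.append(f"{k}={'true' if v else 'false'}")
        elif isinstance(v, int):
            parts.append(f"{k}={v}i")        # integer fields take an 'i' suffix
        elif isinstance(v, str):
            parts.append(f'{k}="{v}"')       # string fields are double-quoted
        else:
            parts.append(f"{k}={v}")         # floats pass through as-is
    ts = timestamp_ns if timestamp_ns is not None else time.time_ns()
    return f"{measurement},{tag_str} {','.join(parts)} {ts}"

def send_log(line, host="telegraf", port=8094):
    """Fire-and-forget UDP send to the Telegraf socket listener."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(line.encode("utf-8"), (host, port))
    sock.close()
```

Because the transport is UDP, a send never blocks function execution; the FileTransport backup exists precisely because UDP delivery is not guaranteed.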
Telegraf#
Log aggregation service that:
- Listens on UDP port 8094
- Receives logs from CloudLogger
- Forwards to InfluxDB
- Provides backup file storage
InfluxDB 3 Core#
Time-series database that:
- Stores all function execution logs
- Provides SQL query interface
- Supports real-time data ingestion
- Enables long-term metrics analysis
Grafana (Optional)#
Visualization platform for:
- Real-time dashboards
- Historical trend analysis
- Alert configuration
- Custom metric visualization
Environment Configuration#
Backend Service#
# InfluxDB Connection
INFLUXDB_URL=http://dart_cloud_influxdb:8086
INFLUXDB_TOKEN=<your-secure-token>
INFLUXDB_ORG=dart_cloud
INFLUXDB_BUCKET=dart_cloud_logs
# Telegraf Configuration
TELEGRAF_HOST=telegraf
TELEGRAF_PORT=8094
# Logging
LOG_FILE_PATH=./logs.json
LOG_KEEP_FILE_BACKUP=true
LOG_ENABLE_CONSOLE=true
LOG_MEASUREMENT=function_logs
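A minimal sketch of how a service might load these settings, with the documented values as fallbacks. The `LogConfig`/`load_config` names are hypothetical, not the backend's actual configuration code:

```python
import os
from dataclasses import dataclass

@dataclass
class LogConfig:
    telegraf_host: str
    telegraf_port: int
    file_path: str
    keep_file_backup: bool
    measurement: str

def load_config(env=None):
    """Read the logging variables above, falling back to documented defaults."""
    if env is None:
        env = os.environ
    return LogConfig(
        telegraf_host=env.get("TELEGRAF_HOST", "telegraf"),
        telegraf_port=int(env.get("TELEGRAF_PORT", "8094")),
        file_path=env.get("LOG_FILE_PATH", "./logs.json"),
        keep_file_backup=env.get("LOG_KEEP_FILE_BACKUP", "true").lower() == "true",
        measurement=env.get("LOG_MEASUREMENT", "function_logs"),
    )
```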
Docker Compose Services#
The docker-compose.yml includes:
services:
influxdb:
image: influxdb:3-core
ports:
- "8086:8086"
environment:
INFLUXDB_DB: dart_cloud_logs
INFLUXDB_ADMIN_USER: admin
INFLUXDB_ADMIN_PASSWORD: ${INFLUXDB_PASSWORD}
telegraf:
image: telegraf:1.37.1-alpine
ports:
- "8094:8094/udp"
volumes:
- ./deploy/config/telegraf.conf:/etc/telegraf/telegraf.conf:ro
depends_on:
- influxdb
grafana:
image: grafana/grafana:main
ports:
- "3000:3000"
environment:
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
depends_on:
- influxdb
Log Data Structure#
Measurement Schema#
Logs are stored as time-series data with the following structure:
{
"measurement": "function_logs",
"tags": {
"function_id": "550e8400-e29b-41d4-a716-446655440000",
"execution_id": "660e8400-e29b-41d4-a716-446655440000",
"status": "success",
"version": "1"
},
"fields": {
"duration_ms": 1234,
"memory_used_mb": 45,
"exit_code": 0,
"cpu_time_ms": 800,
"message": "Function executed successfully"
},
"timestamp": "2024-01-15T10:30:00.000Z"
}
Tag Fields (Indexed)#
- function_id: UUID of the deployed function
- execution_id: Unique ID for this execution
- status: One of success, error, timeout, or failed
- version: Function version number
Field Values (Queryable)#
- duration_ms: Total execution time in milliseconds
- memory_used_mb: Peak memory usage
- exit_code: Container exit code (0 = success, -1 = timeout)
- cpu_time_ms: CPU time consumed
- message: Log message or error description
Querying Logs#
SQL Query Examples#
Get recent function logs:
SELECT * FROM function_logs
WHERE function_id = '550e8400-e29b-41d4-a716-446655440000'
ORDER BY time DESC
LIMIT 100
Get error logs:
SELECT * FROM function_logs
WHERE status = 'error'
AND time > now() - interval '24 hours'
ORDER BY time DESC
Get performance metrics:
SELECT
function_id,
AVG(duration_ms) as avg_duration,
MAX(duration_ms) as max_duration,
AVG(memory_used_mb) as avg_memory
FROM function_logs
WHERE time > now() - interval '7 days'
GROUP BY function_id
ORDER BY avg_duration DESC
Get execution statistics:
SELECT
COUNT(*) as total_executions,
SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as successful,
SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END) as failed,
SUM(CASE WHEN status = 'timeout' THEN 1 ELSE 0 END) as timeouts
FROM function_logs
WHERE time > now() - interval '1 hour'
REST API Queries#
Query via HTTP:
curl -X POST http://localhost:8086/api/v3/query_sql \
-H "Authorization: Bearer $INFLUXDB_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"db": "dart_cloud_logs",
"q": "SELECT * FROM function_logs WHERE function_id = '\''550e8400-e29b-41d4-a716-446655440000'\'' LIMIT 100"
}'
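The same query can be issued programmatically. A minimal Python sketch using only the standard library; `build_payload` and `query_sql` are illustrative names, not an official client:

```python
import json
import urllib.request

def build_payload(sql, db="dart_cloud_logs"):
    """JSON body expected by the query endpoint: {"db": ..., "q": ...}."""
    return json.dumps({"db": db, "q": sql}).encode("utf-8")

def query_sql(sql, db="dart_cloud_logs",
              url="http://localhost:8086", token="my-token"):
    """POST a SQL query to /api/v3/query_sql and decode the JSON reply."""
    req = urllib.request.Request(
        f"{url}/api/v3/query_sql",
        data=build_payload(sql, db),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Building the body with `json.dumps` also sidesteps the shell-escaping gymnastics (`'\''`) that the curl example needs for embedded single quotes. `query_sql` of course requires a reachable InfluxDB instance and a valid token.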
Response format:
{
"results": [
{
"statement_id": 0,
"series": [
{
"name": "function_logs",
"columns": ["time", "function_id", "execution_id", "status", "duration_ms", "memory_used_mb"],
"values": [
["2024-01-15T10:30:00Z", "550e8400...", "660e8400...", "success", 1234, 45],
["2024-01-15T10:29:00Z", "550e8400...", "660e8400...", "success", 1100, 42]
]
}
]
}
]
}
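Client code typically flattens the columns/values series shape above into per-row dictionaries. A small sketch (`rows_from_response` is an illustrative name):

```python
def rows_from_response(response):
    """Zip each series' columns with its value rows into a list of dicts."""
    rows = []
    for result in response.get("results", []):
        for series in result.get("series", []):
            cols = series["columns"]
            rows.extend(dict(zip(cols, vals)) for vals in series["values"])
    return rows
```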
Grafana Integration#
Setup#
1. Access Grafana: http://localhost:3000
2. Add InfluxDB Datasource:
   - Configuration → Data Sources → Add
   - Type: InfluxDB
   - URL: http://dart_cloud_influxdb:8086
   - Database: dart_cloud_logs
   - HTTP Method: POST
   - Auth: Bearer Token (use INFLUXDB_TOKEN)
3. Create Dashboard:
   - New Dashboard
   - Add panels with SQL queries
   - Configure visualizations (graphs, tables, gauges)
Example Dashboard Panels#
Function Execution Timeline:
SELECT time, function_id, duration_ms, status
FROM function_logs
WHERE time > now() - interval '24 hours'
ORDER BY time DESC
Error Rate Gauge:
SELECT
ROUND(100.0 * SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END) / COUNT(*), 2) as error_rate
FROM function_logs
WHERE time > now() - interval '1 hour'
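For reference, the gauge's arithmetic is equivalent to the following computation over a list of execution statuses (a hypothetical helper, not part of the backend):

```python
def error_rate(statuses):
    """Percentage of 'error' executions, rounded to 2 decimals,
    mirroring the gauge query above."""
    if not statuses:
        return 0.0
    errors = sum(1 for s in statuses if s == "error")
    return round(100.0 * errors / len(statuses), 2)
```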
Memory Usage Trend:
SELECT time, AVG(memory_used_mb) as avg_memory
FROM function_logs
WHERE time > now() - interval '7 days'
GROUP BY time(1h)
ORDER BY time DESC
Management Commands#
InfluxDB Operations#
# View InfluxDB logs
docker-compose logs -f influxdb
# Connect to InfluxDB CLI
docker-compose exec influxdb influx
# List buckets
docker-compose exec influxdb influx bucket list --token $INFLUXDB_TOKEN
# Query logs (Flux)
docker-compose exec influxdb influx query 'from(bucket:"dart_cloud_logs") |> range(start: -1h)'
# Check health
curl http://localhost:8086/health
Telegraf Operations#
# View Telegraf logs
docker-compose logs -f telegraf
# Check configuration
docker-compose exec telegraf cat /etc/telegraf/telegraf.conf
# Test connectivity
docker-compose exec telegraf nc -zv influxdb 8086
Backup and Restore#
# Backup InfluxDB data
docker-compose exec influxdb influx backup /backup
# Restore from backup
docker-compose exec influxdb influx restore /backup
# Export logs to CSV
curl -X POST http://localhost:8086/api/v3/query_sql \
-H "Authorization: Bearer $INFLUXDB_TOKEN" \
-H "Accept: text/csv" \
-d '{
"db": "dart_cloud_logs",
"q": "SELECT * FROM function_logs LIMIT 1000"
}' > logs.csv
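Once exported, the CSV dump can be post-processed locally. A small sketch that tallies rows per status, assuming the export includes a `status` column (`summarize_csv` is an illustrative name):

```python
import csv
import io

def summarize_csv(csv_text):
    """Count exported rows per status from a logs.csv dump."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["status"]] = counts.get(row["status"], 0) + 1
    return counts
```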
Troubleshooting#
Logs Not Appearing in InfluxDB#
Check Telegraf connectivity:
# Test UDP connectivity
docker-compose exec backend nc -zv telegraf 8094
# Check Telegraf logs
docker-compose logs telegraf | grep -i error
# Verify environment variables
docker-compose exec backend env | grep TELEGRAF
Check InfluxDB health:
# Health check
curl http://localhost:8086/health
# Check InfluxDB logs
docker-compose logs influxdb | grep -i error
# Verify token
docker-compose exec influxdb influx auth list --token $INFLUXDB_TOKEN
High Memory Usage#
# Check log retention
docker-compose exec influxdb influx bucket list --token $INFLUXDB_TOKEN
# Set retention policy
docker-compose exec influxdb influx bucket update \
--id <bucket-id> \
--retention 30d \
--token $INFLUXDB_TOKEN
# Delete old logs
curl -X POST http://localhost:8086/api/v3/query_sql \
-H "Authorization: Bearer $INFLUXDB_TOKEN" \
-d '{
"db": "dart_cloud_logs",
"q": "DELETE FROM function_logs WHERE time < now() - interval 30 days"
}'
Connection Timeouts#
# Check network connectivity
docker-compose exec backend ping telegraf
docker-compose exec telegraf ping influxdb
# Verify ports
docker-compose ps
# Check firewall rules
docker network inspect dart_cloud_backend_default
Performance Optimization#
Indexing#
InfluxDB automatically indexes tags. Ensure frequently queried fields are tags:
- function_id: Always indexed
- execution_id: Always indexed
- status: Always indexed
Retention Policy#
Configure data retention to manage storage:
# Set 30-day retention
docker-compose exec influxdb influx bucket update \
--name dart_cloud_logs \
--retention 30d \
--token $INFLUXDB_TOKEN
Query Optimization#
Use time ranges:
-- Good: Scans only recent data
SELECT * FROM function_logs
WHERE time > now() - interval '24 hours'
-- Bad: Scans all data
SELECT * FROM function_logs
Use tags in WHERE clause:
-- Good: Uses index
WHERE function_id = '550e8400-e29b-41d4-a716-446655440000'
-- Less efficient: Full scan
WHERE message LIKE '%error%'
Security#
Token Management#
# Generate new token
docker-compose exec influxdb influx auth create \
--org dart_cloud \
--description "Dart Cloud Backend" \
--token $INFLUXDB_TOKEN
# Revoke token
docker-compose exec influxdb influx auth delete --id <token-id>
# List active tokens
docker-compose exec influxdb influx auth list
Access Control#
# Create read-only user
docker-compose exec influxdb influx auth create \
--org dart_cloud \
--read-bucket dart_cloud_logs \
--description "Read-only access"
Best Practices#
- Always use time ranges in queries to improve performance
- Index frequently queried fields as tags
- Set appropriate retention policies to manage storage
- Monitor InfluxDB disk usage regularly
- Use Grafana alerts for critical metrics
- Backup logs regularly for compliance
- Rotate tokens periodically for security
- Use read-only tokens for dashboards and external tools