InfluxDB Logging System#

Overview#

The Dart Cloud backend integrates with InfluxDB 3 Core for centralized function execution logging, metrics collection, and monitoring. This system provides real-time insights into function performance, errors, and resource usage.

Architecture#

┌─────────────────────────────────────────────────────────┐
│         Cloud Function Execution                        │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │  CloudLogger (embedded in function)               │  │
│  │  - Captures stdout/stderr                         │  │
│  │  - Records execution metrics                      │  │
│  │  - Tracks errors and exceptions                   │  │
│  └───────────────────────────────────────────────────┘  │
│         │                              │                │
│         ├─ UdpTransport ───────────────┤                │
│         │  (UDP:8094)                  │                │
│         │                              │                │
│         └─ FileTransport ──────────────┤                │
│            (./logs.json)               │                │
└─────────────────────────────────────────────────────────┘
          │                              │
          ▼                              ▼
    ┌─────────────┐               ┌──────────────┐
    │  Telegraf   │               │  File Backup │
    │ (UDP:8094)  │               │  (logs.json) │
    └─────────────┘               └──────────────┘
         │
         ▼
    ┌─────────────────────────┐
    │  InfluxDB 3 Core        │
    │  (dart_cloud_logs)      │
    └─────────────────────────┘
         │
         ├─ Grafana Dashboards
         ├─ REST API Queries
         └─ CLI Tools

Components#

CloudLogger#

Embedded logging library that runs inside deployed functions and captures:

  • Function Output: stdout and stderr streams
  • Execution Metrics: Duration, memory usage, CPU time
  • Error Tracking: Exceptions, stack traces, error codes
  • Metadata: Function ID, execution ID, timestamps
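
The capture loop can be sketched in a few lines (Python for illustration only; `run_logged` and its record layout are hypothetical stand-ins for the embedded CloudLogger, though the field and tag names mirror the log schema used throughout this document):

```python
import time
import traceback
import uuid

def run_logged(function_id, fn):
    """Illustrative sketch: run fn(), capture status and timing, and
    return a log record. Hypothetical helper, not the CloudLogger API;
    field names follow the function_logs schema described below."""
    record = {
        "function_id": function_id,
        "execution_id": str(uuid.uuid4()),  # unique per execution
        "status": "success",
        "exit_code": 0,
        "message": "Function executed successfully",
    }
    start = time.perf_counter()
    try:
        fn()
    except Exception as exc:
        # Error tracking: record the exception as the log message
        record["status"] = "error"
        record["exit_code"] = 1
        record["message"] = "".join(
            traceback.format_exception_only(type(exc), exc)
        ).strip()
    record["duration_ms"] = int((time.perf_counter() - start) * 1000)
    return record
```

A record like this would then be serialized and handed to the UDP and file transports shown in the architecture diagram.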

Telegraf#

Log aggregation service that:

  • Listens on UDP port 8094
  • Receives logs from CloudLogger
  • Forwards to InfluxDB
  • Provides backup file storage
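
A minimal `telegraf.conf` consistent with this pipeline might look like the following. This is a sketch, not the shipped `./deploy/config/telegraf.conf`: the plugin names are standard Telegraf plugins, but the backup file path and option values here are assumptions.

```toml
# Receive line-protocol logs from CloudLogger over UDP
[[inputs.socket_listener]]
  service_address = "udp://:8094"
  data_format = "influx"

# Forward to InfluxDB (v2-compatible write API)
[[outputs.influxdb_v2]]
  urls = ["http://dart_cloud_influxdb:8086"]
  token = "${INFLUXDB_TOKEN}"
  organization = "dart_cloud"
  bucket = "dart_cloud_logs"

# Keep a local file backup
[[outputs.file]]
  files = ["/var/log/telegraf/logs.json"]
  data_format = "json"
```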

InfluxDB 3 Core#

Time-series database that:

  • Stores all function execution logs
  • Provides SQL query interface
  • Supports real-time data ingestion
  • Enables long-term metrics analysis

Grafana (Optional)#

Visualization platform for:

  • Real-time dashboards
  • Historical trend analysis
  • Alert configuration
  • Custom metric visualization

Environment Configuration#

Backend Service#

# InfluxDB Connection
INFLUXDB_URL=http://dart_cloud_influxdb:8086
INFLUXDB_TOKEN=<your-secure-token>
INFLUXDB_ORG=dart_cloud
INFLUXDB_BUCKET=dart_cloud_logs

# Telegraf Configuration
TELEGRAF_HOST=telegraf
TELEGRAF_PORT=8094

# Logging
LOG_FILE_PATH=./logs.json
LOG_KEEP_FILE_BACKUP=true
LOG_ENABLE_CONSOLE=true
LOG_MEASUREMENT=function_logs
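
A service might gather these settings as follows (Python for illustration; the variable names match the list above, but `load_logging_config` itself and its defaults are illustrative, not the backend's actual loader):

```python
import os

def load_logging_config(env=os.environ):
    """Illustrative: collect the logging settings listed above.
    Defaults mirror the example values; not the real config loader."""
    truthy = lambda v: v.lower() == "true"
    return {
        "influxdb_url": env.get("INFLUXDB_URL", "http://dart_cloud_influxdb:8086"),
        "influxdb_token": env.get("INFLUXDB_TOKEN", ""),
        "influxdb_org": env.get("INFLUXDB_ORG", "dart_cloud"),
        "influxdb_bucket": env.get("INFLUXDB_BUCKET", "dart_cloud_logs"),
        "telegraf_host": env.get("TELEGRAF_HOST", "telegraf"),
        "telegraf_port": int(env.get("TELEGRAF_PORT", "8094")),
        "log_file_path": env.get("LOG_FILE_PATH", "./logs.json"),
        "keep_file_backup": truthy(env.get("LOG_KEEP_FILE_BACKUP", "true")),
        "enable_console": truthy(env.get("LOG_ENABLE_CONSOLE", "true")),
        "measurement": env.get("LOG_MEASUREMENT", "function_logs"),
    }
```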

Docker Compose Services#

The docker-compose.yml includes:

services:
  influxdb:
    image: influxdb:3-core
    ports:
      - "8086:8086"
    environment:
      INFLUXDB_DB: dart_cloud_logs
      INFLUXDB_ADMIN_USER: admin
      INFLUXDB_ADMIN_PASSWORD: ${INFLUXDB_PASSWORD}

  telegraf:
    image: telegraf:1.37.1-alpine
    ports:
      - "8094:8094/udp"
    volumes:
      - ./deploy/config/telegraf.conf:/etc/telegraf/telegraf.conf:ro
    depends_on:
      - influxdb

  grafana:
    image: grafana/grafana:main
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
    depends_on:
      - influxdb

Log Data Structure#

Measurement Schema#

Logs are stored as time-series data with the following structure:

{
  "measurement": "function_logs",
  "tags": {
    "function_id": "550e8400-e29b-41d4-a716-446655440000",
    "execution_id": "660e8400-e29b-41d4-a716-446655440000",
    "status": "success",
    "version": "1"
  },
  "fields": {
    "duration_ms": 1234,
    "memory_used_mb": 45,
    "exit_code": 0,
    "cpu_time_ms": 800,
    "message": "Function executed successfully"
  },
  "timestamp": "2024-01-15T10:30:00.000Z"
}
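
On the wire, a point like the one above is a single InfluxDB line-protocol line. A sketch of that encoding (simplified: real clients also escape spaces and commas in tag values, which the identifiers used here never contain):

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Encode one point as line protocol:
    measurement,tag=v,... field=v,... timestamp.
    Sketch only: fields are ints (written with an 'i' suffix),
    bools, or strings (double-quoted)."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))

    def fmt(v):
        if isinstance(v, bool):
            return "true" if v else "false"
        if isinstance(v, int):
            return f"{v}i"
        return '"' + str(v).replace('"', '\\"') + '"'

    field_part = ",".join(f"{k}={fmt(v)}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"

line = to_line_protocol(
    "function_logs",
    {"function_id": "550e8400-e29b-41d4-a716-446655440000", "status": "success"},
    {"duration_ms": 1234, "message": "Function executed successfully"},
    1705314600000000000,  # 2024-01-15T10:30:00Z in nanoseconds
)
```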

Tag Fields (Indexed)#

  • function_id: UUID of the deployed function
  • execution_id: Unique ID for this execution
  • status: success, error, timeout, failed
  • version: Function version number

Field Values (Queryable)#

  • duration_ms: Total execution time in milliseconds
  • memory_used_mb: Peak memory usage
  • exit_code: Container exit code (0 = success, -1 = timeout)
  • cpu_time_ms: CPU time consumed
  • message: Log message or error description

Querying Logs#

SQL Query Examples#

Get recent function logs:

SELECT * FROM function_logs 
WHERE function_id = '550e8400-e29b-41d4-a716-446655440000'
ORDER BY time DESC 
LIMIT 100

Get error logs:

SELECT * FROM function_logs 
WHERE status = 'error'
AND time > now() - interval '24 hours'
ORDER BY time DESC

Get performance metrics:

SELECT 
  function_id,
  AVG(duration_ms) as avg_duration,
  MAX(duration_ms) as max_duration,
  AVG(memory_used_mb) as avg_memory
FROM function_logs
WHERE time > now() - interval '7 days'
GROUP BY function_id
ORDER BY avg_duration DESC

Get execution statistics:

SELECT 
  COUNT(*) as total_executions,
  SUM(CASE WHEN status = 'success' THEN 1 ELSE 0 END) as successful,
  SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END) as failed,
  SUM(CASE WHEN status = 'timeout' THEN 1 ELSE 0 END) as timeouts
FROM function_logs
WHERE time > now() - interval '1 hour'
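
The `CASE WHEN` tally above is the SQL form of a simple per-status count; the same logic in plain Python over a list of status tags (sample data, illustrative only):

```python
def execution_stats(statuses):
    """Mirror of the SQL above: total plus per-status buckets."""
    return {
        "total_executions": len(statuses),
        "successful": sum(1 for s in statuses if s == "success"),
        "failed": sum(1 for s in statuses if s == "error"),
        "timeouts": sum(1 for s in statuses if s == "timeout"),
    }
```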

REST API Queries#

Query via HTTP:

curl -X POST http://localhost:8086/api/v3/query_sql \
  -H "Authorization: Bearer $INFLUXDB_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "db": "dart_cloud_logs",
    "q": "SELECT * FROM function_logs WHERE function_id = '\''550e8400-e29b-41d4-a716-446655440000'\'' LIMIT 100"
  }'

Response format:

{
  "results": [
    {
      "statement_id": 0,
      "series": [
        {
          "name": "function_logs",
          "columns": ["time", "function_id", "execution_id", "status", "duration_ms", "memory_used_mb"],
          "values": [
            ["2024-01-15T10:30:00Z", "550e8400...", "660e8400...", "success", 1234, 45],
            ["2024-01-15T10:29:00Z", "550e8400...", "660e8400...", "success", 1100, 42]
          ]
        }
      ]
    }
  ]
}
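
A response in this shape can be flattened into row dictionaries by zipping each series' columns with its value rows (sketch; assumes the `results`/`series` layout shown above):

```python
def series_to_rows(response):
    """Flatten the columns/values layout above into one dict per row."""
    rows = []
    for result in response.get("results", []):
        for series in result.get("series", []):
            cols = series["columns"]
            for values in series["values"]:
                rows.append(dict(zip(cols, values)))
    return rows

# Trimmed-down sample in the same shape as the response above
sample = {
    "results": [{
        "statement_id": 0,
        "series": [{
            "name": "function_logs",
            "columns": ["time", "status", "duration_ms"],
            "values": [["2024-01-15T10:30:00Z", "success", 1234]],
        }],
    }],
}
```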

Grafana Integration#

Setup#

  1. Access Grafana:

    http://localhost:3000
    
  2. Add InfluxDB Datasource:

    • Configuration → Data Sources → Add
    • Type: InfluxDB
    • URL: http://dart_cloud_influxdb:8086
    • Database: dart_cloud_logs
    • HTTP Method: POST
    • Auth: Bearer Token (use INFLUXDB_TOKEN)
  3. Create Dashboard:

    • New Dashboard
    • Add panels with SQL queries
    • Configure visualizations (graphs, tables, gauges)

Example Dashboard Panels#

Function Execution Timeline:

SELECT time, function_id, duration_ms, status
FROM function_logs
WHERE time > now() - interval '24 hours'
ORDER BY time DESC

Error Rate Gauge:

SELECT 
  ROUND(100.0 * SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END) / COUNT(*), 2) as error_rate
FROM function_logs
WHERE time > now() - interval '1 hour'

Memory Usage Trend:

SELECT date_bin(INTERVAL '1 hour', time) AS hour, AVG(memory_used_mb) AS avg_memory
FROM function_logs
WHERE time > now() - interval '7 days'
GROUP BY 1
ORDER BY 1 DESC

Management Commands#

InfluxDB Operations#

# View InfluxDB logs
docker-compose logs -f influxdb

# Inspect the InfluxDB CLI (the 3-core image ships influxdb3, not the v2 influx client)
docker-compose exec influxdb influxdb3 --help

# List databases
docker-compose exec influxdb influxdb3 show databases --token $INFLUXDB_TOKEN

# Query logs (SQL; InfluxDB 3 Core does not support Flux)
docker-compose exec influxdb influxdb3 query --database dart_cloud_logs \
  --token $INFLUXDB_TOKEN "SELECT * FROM function_logs LIMIT 10"

# Check health
curl http://localhost:8086/health

Telegraf Operations#

# View Telegraf logs
docker-compose logs -f telegraf

# Check configuration
docker-compose exec telegraf cat /etc/telegraf/telegraf.conf

# Test connectivity
docker-compose exec telegraf nc -zv influxdb 8086

Backup and Restore#

# Backup InfluxDB data
docker-compose exec influxdb influx backup /backup

# Restore from backup
docker-compose exec influxdb influx restore /backup

# Export logs to CSV
curl -X POST http://localhost:8086/api/v3/query_sql \
  -H "Authorization: Bearer $INFLUXDB_TOKEN" \
  -H "Accept: text/csv" \
  -d '{
    "db": "dart_cloud_logs",
    "q": "SELECT * FROM function_logs LIMIT 1000"
  }' > logs.csv
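
The exported file can then be post-processed with any CSV reader; for example (illustrative, using a tiny inline sample rather than a real export):

```python
import csv
import io

# Stand-in for the contents of logs.csv
sample_csv = (
    "time,function_id,status,duration_ms\n"
    "2024-01-15T10:30:00Z,550e8400,success,1234\n"
)

rows = list(csv.DictReader(io.StringIO(sample_csv)))
errors = [r for r in rows if r["status"] == "error"]
avg_ms = sum(int(r["duration_ms"]) for r in rows) / len(rows)
```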

Troubleshooting#

Logs Not Appearing in InfluxDB#

Check Telegraf connectivity:

# Test UDP connectivity
docker-compose exec backend nc -zv telegraf 8094

# Check Telegraf logs
docker-compose logs telegraf | grep -i error

# Verify environment variables
docker-compose exec backend env | grep TELEGRAF

Check InfluxDB health:

# Health check
curl http://localhost:8086/health

# Check InfluxDB logs
docker-compose logs influxdb | grep -i error

# Verify token
docker-compose exec influxdb influx auth list --token $INFLUXDB_TOKEN

High Memory Usage#

# Check log retention
docker-compose exec influxdb influx bucket list --token $INFLUXDB_TOKEN

# Set retention policy
docker-compose exec influxdb influx bucket update \
  --id <bucket-id> \
  --retention 30d \
  --token $INFLUXDB_TOKEN

# Delete old logs
curl -X POST http://localhost:8086/api/v3/query_sql \
  -H "Authorization: Bearer $INFLUXDB_TOKEN" \
  -d '{
    "db": "dart_cloud_logs",
    "q": "DELETE FROM function_logs WHERE time < now() - interval 30 days"
  }'

Connection Timeouts#

# Check network connectivity
docker-compose exec backend ping telegraf
docker-compose exec telegraf ping influxdb

# Verify ports
docker-compose ps

# Check firewall rules
docker network inspect dart_cloud_backend_default

Performance Optimization#

Indexing#

InfluxDB automatically indexes tags. Ensure frequently queried fields are tags:

  • function_id - Always indexed
  • execution_id - Always indexed
  • status - Always indexed

Retention Policy#

Configure data retention to manage storage:

# Set 30-day retention
docker-compose exec influxdb influx bucket update \
  --name dart_cloud_logs \
  --retention 30d \
  --token $INFLUXDB_TOKEN

Query Optimization#

Use time ranges:

-- Good: Scans only recent data
SELECT * FROM function_logs 
WHERE time > now() - interval '24 hours'

-- Bad: Scans all data
SELECT * FROM function_logs

Use tags in WHERE clause:

-- Good: Uses index
WHERE function_id = '550e8400-e29b-41d4-a716-446655440000'

-- Less efficient: Full scan
WHERE message LIKE '%error%'

Security#

Token Management#

# Generate new token
docker-compose exec influxdb influx auth create \
  --org dart_cloud \
  --description "Dart Cloud Backend" \
  --token $INFLUXDB_TOKEN

# Revoke token
docker-compose exec influxdb influx auth delete --id <token-id>

# List active tokens
docker-compose exec influxdb influx auth list

Access Control#

# Create read-only user
docker-compose exec influxdb influx auth create \
  --org dart_cloud \
  --read-bucket dart_cloud_logs \
  --description "Read-only access"

Best Practices#

  1. Always use time ranges in queries to improve performance
  2. Index frequently queried fields as tags
  3. Set appropriate retention policies to manage storage
  4. Monitor InfluxDB disk usage regularly
  5. Use Grafana alerts for critical metrics
  6. Backup logs regularly for compliance
  7. Rotate tokens periodically for security
  8. Use read-only tokens for dashboards and external tools