Backup and Disaster Recovery Strategy#
Complete guide to backing up, replicating, and recovering data in the ContainerPub serverless platform.
📋 Overview#
The ContainerPub backup system provides comprehensive data protection through:
- Database Backups - PostgreSQL database dumps with compression
- Volume Backups - Docker volume snapshots
- Volume Replication - Real-time or scheduled replication to remote locations
- Automated Scheduling - Cron-based backup automation
- Disaster Recovery - Complete system restore procedures
🏗️ System Architecture#
┌─────────────────────────────────────────────┐
│ ContainerPub Backend Stack │
├─────────────────────────────────────────────┤
│ PostgreSQL Container │ Backend Container │
│ ↓ postgres_data │ ↓ functions_data │
└──────────┬──────────────┴──────────┬────────┘
│ │
┌──────────▼─────────────────────────▼────────┐
│ Backup System Layer │
│ • Database Backup (pg_dump) │
│ • Volume Backup (tar archive) │
│ • Combined Archives │
└──────────┬──────────────────────────────────┘
│
┌──────────▼──────────────────────────────────┐
│ Replication Layer (Optional) │
│ Local Storage │ Rsync │ S3 │ Custom │
└─────────────────────────────────────────────┘
🚀 Quick Start#
Prerequisites#
- Docker and Docker Compose installed
- ContainerPub backend stack running
- Proper permissions to execute scripts
Setup#
# Navigate to backup directory
cd dart_cloud_backend/deploy/backups
# Make scripts executable
chmod +x *.sh
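As a quick sanity check, list the helper scripts this guide refers to; the exact set may vary between releases:
# Scripts referenced throughout this guide
ls -1 *.sh
# Expected: backup-all.sh, backup-database.sh, backup-volumes.sh,
#           restore-database.sh, restore-volumes.sh, replicate-volumes.sh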
📦 Manual Backups#
Complete Backup (Recommended)#
Backup everything - databases and volumes:
./backup-all.sh
Output:
- Database backups (main + functions)
- Volume backups (PostgreSQL + functions data)
- Combined archive
- Backup manifest with restore instructions
Database Only#
./backup-database.sh
Creates:
- data/dart_cloud_TIMESTAMP.sql.gz - Main database
- data/functions_db_TIMESTAMP.sql.gz - Functions database
- data/full_backup_TIMESTAMP.tar.gz - Combined archive
- data/backup_TIMESTAMP.meta - Metadata
Volumes Only#
./backup-volumes.sh
Creates:
- data/volumes/postgres_volume_TIMESTAMP.tar.gz
- data/volumes/functions_volume_TIMESTAMP.tar.gz
- data/volumes/volumes_backup_TIMESTAMP.tar.gz
- data/volumes/volume_backup_TIMESTAMP.meta
🔄 Restore Procedures#
List Available Backups#
# List all backups
ls -lh data/*.tar.gz
# View backup metadata
cat data/backup_TIMESTAMP.meta
Complete System Restore#
# 1. Stop services
cd ../../
docker-compose down
# 2. Restore volumes first
cd deploy/backups
./restore-volumes.sh -f data/volumes/volumes_backup_TIMESTAMP.tar.gz -y
# 3. Start services
cd ../../
docker-compose up -d
# 4. Wait for database to be healthy
docker-compose ps
# 5. Restore databases
cd deploy/backups
./restore-database.sh -f data/full_backup_TIMESTAMP.tar.gz -y
Restore Database Only#
# Restore both databases
./restore-database.sh -f data/full_backup_TIMESTAMP.tar.gz
# Restore main database only
./restore-database.sh -f data/dart_cloud_TIMESTAMP.sql.gz -d main
# Restore functions database only
./restore-database.sh -f data/functions_db_TIMESTAMP.sql.gz -d functions
# Skip confirmation prompt
./restore-database.sh -f data/full_backup_TIMESTAMP.tar.gz -y
Restore Volumes Only#
# Restore all volumes
./restore-volumes.sh -f data/volumes/volumes_backup_TIMESTAMP.tar.gz
# Restore PostgreSQL volume only
./restore-volumes.sh -f data/volumes/postgres_volume_TIMESTAMP.tar.gz -v postgres
# Restore functions volume only
./restore-volumes.sh -f data/volumes/functions_volume_TIMESTAMP.tar.gz -v functions
🤖 Automated Backups#
Setup Automated Backup Service#
- Configure the backup schedule in deploy/.env:
# Backup schedule (cron format)
BACKUP_SCHEDULE="0 2 * * *" # Daily at 2 AM
# Retention periods
BACKUP_RETENTION_DAYS=7
VOLUME_RETENTION_DAYS=7
- Start the backup service:
cd dart_cloud_backend/deploy
docker-compose -f docker-compose.yml -f backups/docker-compose.backup.yml up -d
- Verify service is running:
docker ps | grep backup
docker logs dart_cloud_backup_service
Common Cron Schedules#
# Every day at 2 AM
BACKUP_SCHEDULE="0 2 * * *"
# Every 6 hours
BACKUP_SCHEDULE="0 */6 * * *"
# Every Sunday at 3 AM
BACKUP_SCHEDULE="0 3 * * 0"
# Every day at midnight and noon
BACKUP_SCHEDULE="0 0,12 * * *"
# Every 4 hours (production recommended)
BACKUP_SCHEDULE="0 */4 * * *"
Stop Backup Service#
docker-compose -f docker-compose.yml -f backups/docker-compose.backup.yml down
🔁 Volume Replication#
Volume replication provides geographic redundancy and disaster recovery capabilities.
Local Replication#
Copy backups to a local directory:
# One-time replication
./replicate-volumes.sh -t local -d /mnt/backup
# Continuous replication (every hour)
./replicate-volumes.sh -t local -d /mnt/backup -c -i 3600
Remote Replication (Rsync)#
Sync to a remote server via SSH:
# Setup SSH key authentication first
ssh-copy-id user@backup-server
# One-time replication
./replicate-volumes.sh -t rsync -d user@backup-server:/backup/volumes
# Continuous replication (every 2 hours)
./replicate-volumes.sh -t rsync -d user@backup-server:/backup/volumes -c -i 7200
S3 Replication#
Upload to S3-compatible storage (AWS S3, Cloudflare R2, MinIO):
# Configure AWS CLI first
aws configure
# One-time replication
./replicate-volumes.sh -t s3 -d s3://my-bucket/backups/volumes
# Continuous replication (every 4 hours)
./replicate-volumes.sh -t s3 -d s3://my-bucket/backups/volumes -c -i 14400
Custom Replication#
Use custom replication commands:
# Set custom command
export REPLICATION_COMMAND="scp"
# Replicate
./replicate-volumes.sh -t custom -d user@server:/backup
Enable Automated Replication#
Configure in deploy/.env:
# Enable replication
REPLICATION_ENABLED=true
# Replication type
REPLICATION_TYPE=rsync
# Destination
REPLICATION_TARGET=user@backup-server:/backup/volumes
# Interval (seconds)
REPLICATION_INTERVAL=3600
📊 Backup Strategy Recommendations#
Development Environment#
Frequency: Daily
Retention: 7 days
Replication: Not required
Schedule: "0 2 * * *"
Staging Environment#
Frequency: Every 6 hours
Retention: 14 days
Replication: Optional (local)
Schedule: "0 */6 * * *"
Production Environment#
Frequency: Every 4 hours
Retention: 30 days
Replication: Required (off-site)
Schedule: "0 */4 * * *"
Testing: Weekly restore tests
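As a sketch, the production recommendations above map onto the variables documented in the Configuration Reference roughly as follows; the replication target is a placeholder for your own off-site destination:
# deploy/.env — illustrative production values
BACKUP_SCHEDULE="0 */4 * * *"   # Every 4 hours
BACKUP_RETENTION_DAYS=30
VOLUME_RETENTION_DAYS=30
REPLICATION_ENABLED=true
REPLICATION_TYPE=rsync          # or s3
REPLICATION_TARGET=user@backup-server:/backup/volumes
REPLICATION_INTERVAL=14400      # 4 hours, in seconds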
🔐 Security Best Practices#
1. Encrypt Backups#
# Encrypt backup
gpg --symmetric --cipher-algo AES256 backup.tar.gz
# Decrypt backup
gpg --decrypt backup.tar.gz.gpg > backup.tar.gz
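To fold encryption into an unattended run, a minimal wrapper (shown as a sketch) could encrypt the newest archive after backup-all.sh finishes, assuming a passphrase file readable only by the backup user; the passphrase file path is hypothetical:
#!/bin/bash
# encrypt-latest.sh — illustrative wrapper; adjust paths to your setup
set -euo pipefail
./backup-all.sh
LATEST=$(ls -t data/full_backup_*.tar.gz | head -n 1)
# --batch and --pinentry-mode loopback read the passphrase non-interactively
gpg --batch --pinentry-mode loopback --symmetric --cipher-algo AES256 \
    --passphrase-file /root/.backup-passphrase "${LATEST}"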
2. Secure Storage#
- Store backups in secure locations with restricted access
- Use encrypted filesystems for backup storage
- Implement access control lists (ACLs)
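For example, an ACL can give a dedicated operator account read access without loosening permissions for everyone else; this assumes setfacl is installed, the filesystem supports ACLs, and backup-operator is a placeholder account:
# Grant read/traverse access on the backup data to one extra account
setfacl -R -m u:backup-operator:rX deploy/backups/data
# Review the effective ACL
getfacl deploy/backups/data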
3. Credential Management#
- Never commit credentials to version control
- Use environment variables for sensitive data
- Rotate credentials regularly
- Use separate credentials for backup operations
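One way to apply the points above is a separate, tightly permissioned credentials file that is sourced only when running the scripts; the file name is illustrative:
# Keep backup credentials out of the repository
cat > deploy/backups/.backup.env <<'EOF'
export POSTGRES_PASSWORD=change_me
EOF
chmod 600 deploy/backups/.backup.env
echo ".backup.env" >> .gitignore
# Source the credentials just before running a backup
source deploy/backups/.backup.env && ./backup-database.sh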
4. Access Control#
# Set proper permissions on backup directory
chmod 700 deploy/backups/data
chown backup-user:backup-group deploy/backups/data
# Restrict script execution
chmod 750 deploy/backups/*.sh
5. Network Security#
- Use SSH keys for remote replication
- Enable firewall rules for backup traffic
- Use VPN for sensitive data transfers
- Implement rate limiting
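As an illustration of the firewall point, on hosts managed with ufw you could limit SSH (used for rsync replication) to the backup server's address; 192.0.2.10 is a placeholder:
# Allow SSH only from the backup server used for replication
ufw allow from 192.0.2.10 to any port 22 proto tcp
ufw status numbered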
🔍 Monitoring and Verification#
Check Backup Status#
# List all backups
ls -lh data/
# View latest backup metadata
cat "$(ls -t data/backup_*.meta | head -n 1)"
# Check backup integrity
tar -tzf data/full_backup_TIMESTAMP.tar.gz
# Verify database backup
gunzip -c data/dart_cloud_TIMESTAMP.sql.gz | head -n 50
Monitor Disk Usage#
# Check backup directory size
du -sh data/
# List largest backups
du -h data/*.tar.gz | sort -rh | head -n 10
# Check available disk space
df -h
Automated Monitoring#
# View backup service logs
docker logs dart_cloud_backup_service
# Follow logs in real-time
docker logs -f dart_cloud_backup_service
# Check service health
docker inspect dart_cloud_backup_service | grep Health
Backup Verification Script#
#!/bin/bash
# verify-backups.sh
BACKUP_DIR="data"
LATEST_BACKUP=$(ls -t ${BACKUP_DIR}/full_backup_*.tar.gz 2>/dev/null | head -n 1)
echo "Verifying backup: ${LATEST_BACKUP}"
# Check file exists
if [ ! -f "${LATEST_BACKUP}" ]; then
echo "ERROR: Backup file not found"
exit 1
fi
# Check file size (should be > 1MB); stat -c%s is GNU/Linux, -f%z is macOS/BSD
SIZE=$(stat -c%s "${LATEST_BACKUP}" 2>/dev/null || stat -f%z "${LATEST_BACKUP}")
if [ ${SIZE} -lt 1048576 ]; then
echo "ERROR: Backup file too small"
exit 1
fi
# Verify archive integrity
if tar -tzf "${LATEST_BACKUP}" > /dev/null 2>&1; then
echo "SUCCESS: Backup verified"
exit 0
else
echo "ERROR: Backup corrupted"
exit 1
fi
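Once saved and made executable, the script can be run by hand or scheduled shortly after the backup window; the path and timing below are illustrative:
chmod +x verify-backups.sh
./verify-backups.sh
# Optional cron entry (crontab -e): verify daily at 3 AM, after the 2 AM backup
0 3 * * * cd /path/to/dart_cloud_backend/deploy/backups && ./verify-backups.sh >> data/verify.log 2>&1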
🛠️ Troubleshooting#
Backup Fails#
Check container status:
docker ps | grep postgres
docker ps | grep backend
Check database connection:
docker exec dart_cloud_postgres psql -U dart_cloud -c "SELECT version();"
Verify credentials:
# Check .env file
cat deploy/.env | grep POSTGRES
# Test connection
docker exec -e PGPASSWORD="${POSTGRES_PASSWORD}" dart_cloud_postgres \
psql -U dart_cloud -d dart_cloud -c "SELECT 1;"
Check disk space:
df -h
Restore Fails#
Verify backup integrity:
# Test archive
tar -tzf data/full_backup_TIMESTAMP.tar.gz
# Test database dump
gunzip -t data/dart_cloud_TIMESTAMP.sql.gz
Check permissions:
ls -l data/
Ensure containers are running:
docker-compose ps
Replication Issues#
Test SSH connection (rsync):
ssh user@backup-server "echo 'Connection successful'"
Test S3 access:
aws s3 ls s3://my-bucket/
Check network connectivity:
ping backup-server
traceroute backup-server
📈 Performance Optimization#
Backup Performance#
| Component | Typical Time | Size | Impact |
|---|---|---|---|
| Database Backup | 1-5 min | 10-100 MB | Low |
| Volume Backup | 2-10 min | 100 MB - 10 GB | Medium |
| Compression | 1-3 min | ~70% size reduction | Medium |
| Replication | Varies | N/A | Low-High |
Optimization Tips#
- Schedule During Low Traffic
  - Run backups during off-peak hours
  - Minimize impact on production
- Parallel Operations
  # Backup database and volumes concurrently
  ./backup-database.sh &
  ./backup-volumes.sh &
  wait
- Compression Tuning
  # Adjust compression level (1-9)
  export COMPRESSION_LEVEL=6  # Balance speed vs. size
- Incremental Backups
  - Consider incremental volume backups for large datasets (see the sketch after this list)
  - Reduces backup time and storage
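A minimal sketch of an incremental volume backup using GNU tar's snapshot file, run from deploy/backups and using the functions volume name from the Configuration Reference; the bundled backup-volumes.sh does not necessarily work this way:
# Level 0 (full) backup: the .snar snapshot file records what was archived
docker run --rm -v dart_cloud_backend_functions_data:/data \
  -v "$(pwd)/data/volumes:/backup" debian:stable-slim \
  tar --listed-incremental=/backup/functions.snar -czf /backup/functions_level0.tar.gz -C /data .
# Later runs with the same snapshot file archive only files changed since then
docker run --rm -v dart_cloud_backend_functions_data:/data \
  -v "$(pwd)/data/volumes:/backup" debian:stable-slim \
  tar --listed-incremental=/backup/functions.snar -czf "/backup/functions_incr_$(date +%Y%m%d_%H%M%S).tar.gz" -C /data .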
📋 Backup Checklist#
Daily Tasks#
- Verify automated backups completed
- Check backup service logs
- Monitor disk space usage
Weekly Tasks#
- Review backup sizes and trends
- Test restore procedure (non-production)
- Verify replication status
- Clean up old backups manually if needed
Monthly Tasks#
- Perform full disaster recovery test
- Review and update backup strategy
- Audit backup access logs
- Update documentation
Quarterly Tasks#
- Review retention policies
- Test off-site restore
- Update backup scripts
- Train team on recovery procedures
🎯 3-2-1 Backup Rule#
Follow the industry-standard 3-2-1 backup rule:
- 3 copies of data
  - Production database
  - Local backup
  - Replicated backup
- 2 different storage types
  - Docker volumes (production)
  - Compressed archives (backup)
- 1 off-site copy (see the example below)
  - Remote server (rsync)
  - Cloud storage (S3)
  - Geographic redundancy
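In practice the rule can be satisfied with the existing replicate-volumes.sh by sending the same backups to both a local secondary disk and an off-site target; the destinations below reuse the placeholder values from the replication section:
# Copy 2: local secondary storage (different device/filesystem)
./replicate-volumes.sh -t local -d /mnt/backup
# Copy 3: off-site, via rsync or S3-compatible storage
./replicate-volumes.sh -t rsync -d user@backup-server:/backup/volumes
# or
./replicate-volumes.sh -t s3 -d s3://my-bucket/backups/volumes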
🚨 Disaster Recovery#
Recovery Time Objective (RTO)#
Target: < 1 hour
Typical: 15-30 minutes
Recovery Point Objective (RPO)#
Target: < 4 hours
Actual: Determined by the backup schedule; with the recommended 4-hour production schedule, worst-case data loss is up to 4 hours
Recovery Procedures#
- Assess the situation
  - Identify what data is lost
  - Determine last known good backup
- Communicate
  - Notify stakeholders
  - Document the incident
- Execute recovery
  - Follow restore procedures
  - Verify data integrity
- Validate (see the checks after this list)
  - Test system functionality
  - Verify data completeness
- Post-mortem
  - Document lessons learned
  - Update procedures
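For the validation step, a few of the checks used earlier in this guide can be strung together; adapt the database names and queries to what your application actually depends on:
# All services up and healthy?
docker-compose ps
# Main database restored and reachable?
docker exec dart_cloud_postgres psql -U dart_cloud -d dart_cloud -c "SELECT version();"
# Functions database contains the expected tables?
docker exec dart_cloud_postgres psql -U dart_cloud -d functions_db -c "\dt"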
📚 Configuration Reference#
Environment Variables#
# Database Configuration
POSTGRES_USER=dart_cloud
POSTGRES_PASSWORD=secure_password
POSTGRES_DB=dart_cloud
FUNCTION_DB=functions_db
POSTGRES_HOST=postgres
POSTGRES_CONTAINER=dart_cloud_postgres
# Volume Configuration
POSTGRES_VOLUME=dart_cloud_backend_postgres_data
FUNCTIONS_VOLUME=dart_cloud_backend_functions_data
# Backup Configuration
BACKUP_RETENTION_DAYS=7
VOLUME_RETENTION_DAYS=7
BACKUP_SCHEDULE="0 2 * * *"
# Replication Configuration
REPLICATION_ENABLED=false
REPLICATION_TYPE=local
REPLICATION_TARGET=/mnt/backup
REPLICATION_INTERVAL=3600
Script Options#
backup-database.sh:
BACKUP_RETENTION_DAYS=7 ./backup-database.sh
restore-database.sh:
./restore-database.sh -f FILE [-d main|functions|both] [-y]
backup-volumes.sh:
VOLUME_RETENTION_DAYS=7 ./backup-volumes.sh
restore-volumes.sh:
./restore-volumes.sh -f FILE [-v postgres|functions|both] [-y]
replicate-volumes.sh:
./replicate-volumes.sh -t TYPE -d DEST [-v VOLUME] [-c] [-i INTERVAL]
📞 Support#
For backup-related issues:
- Check the Troubleshooting section
- Review backup logs in data/*.log
- Check Docker logs: docker-compose logs
- Consult the Architecture Documentation
Last Updated: November 2025
Version: 1.0.0