Operations (Ops) Directory
This directory contains operational scripts and tools for managing the Internet-ID infrastructure, including database backups, disaster recovery, and SSL/TLS certificate management.
Directory Structure
ops/
├── backup/
│   ├── backup-database.sh      # Main backup script (full & incremental)
│   ├── verify-backup.sh        # Backup integrity verification
│   └── crontab.example         # Example cron configuration
├── restore/
│   └── restore-database.sh     # Restore script (full, PITR, partial)
├── ssl/
│   ├── manage-certs.sh         # SSL certificate management
│   ├── check-cert-expiry.sh    # Certificate expiration monitoring
│   ├── test-ssl-config.sh      # SSL/TLS configuration testing
│   ├── certbot-cron            # Cron job configuration
│   └── README.md               # SSL documentation
├── nginx/
│   ├── nginx.conf              # Main Nginx configuration
│   └── conf.d/
│       └── default.conf        # HTTPS/SSL reverse proxy config
└── README.md                   # This file
Quick Start
SSL/TLS Certificate Management
# Initial setup - obtain certificates
cd ops/ssl
export DOMAIN=yourdomain.com
export SSL_EMAIL=admin@yourdomain.com
./manage-certs.sh obtain
# Test SSL configuration
./test-ssl-config.sh
# Check certificate expiration
./check-cert-expiry.sh
Database Backup
# Full backup
cd ops/backup
./backup-database.sh full
# Incremental backup (WAL archiving)
./backup-database.sh incremental
# Verify backups
./verify-backup.sh
Restoring Database
cd ops/restore
# Full restore (from latest backup)
./restore-database.sh full
# Point-in-time recovery
export RESTORE_TARGET_TIME="2025-10-24 18:30:00"
./restore-database.sh pitr
# Partial table restore
export RESTORE_TABLES="Content,PlatformBinding"
./restore-database.sh partial
Scripts Overview
backup-database.sh
Purpose: Perform automated database backups
Features:
- Full backups using pg_dump (compressed with gzip)
- Incremental backups via WAL archiving
- Automatic upload to S3 (if configured)
- Backup metadata and logging
- Automatic cleanup based on retention policy
Usage:
./backup-database.sh [full|incremental]
Environment Variables:
- POSTGRES_HOST - Database host (default: localhost)
- POSTGRES_PORT - Database port (default: 5432)
- POSTGRES_DB - Database name (default: internetid)
- POSTGRES_USER - Database user (default: internetid)
- POSTGRES_PASSWORD - Database password
- BACKUP_DIR - Backup directory (default: /var/lib/postgresql/backups)
- RETENTION_DAYS - Backup retention period in days (default: 30)
- S3_BUCKET - S3 bucket for remote backups (optional)
- S3_REGION - S3 region (default: us-east-1)
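The defaults listed above could be applied with shell parameter expansion. This is a minimal sketch, not the script's actual code; the function name `load_backup_defaults` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill in the documented defaults for any variable
# that is unset or empty, then export them for child processes.
load_backup_defaults() {
  : "${POSTGRES_HOST:=localhost}"
  : "${POSTGRES_PORT:=5432}"
  : "${POSTGRES_DB:=internetid}"
  : "${POSTGRES_USER:=internetid}"
  : "${BACKUP_DIR:=/var/lib/postgresql/backups}"
  : "${RETENTION_DAYS:=30}"
  : "${S3_REGION:=us-east-1}"
  export POSTGRES_HOST POSTGRES_PORT POSTGRES_DB POSTGRES_USER \
         BACKUP_DIR RETENTION_DAYS S3_REGION
}
```

Values already present in the environment win, so `RETENTION_DAYS=7 ./backup-database.sh full` would keep the shorter retention under this scheme.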
verify-backup.sh
Purpose: Verify backup integrity and report status
Features:
- Check backup age
- Verify backup file integrity (gzip test)
- Monitor backup sizes for anomalies
- Check WAL archiving status
- Monitor storage usage
- Send alerts on failures
- Generate backup status reports
Usage:
./verify-backup.sh
Environment Variables:
- BACKUP_DIR - Backup directory
- RETENTION_DAYS - Expected retention period
- ALERT_EMAIL - Email for alerts (optional)
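The age and integrity checks described above can be approximated with `gzip -t` and `find -mmin`. This is a sketch under assumptions: the function name `check_backup_file` and the 24-hour default are illustrative and are not taken from verify-backup.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file exists, passes the gzip
# integrity test, and was modified within the last max_age_hours hours.
check_backup_file() {
  local file="$1" max_age_hours="${2:-24}"
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "corrupt: $file" >&2; return 1; }
  # find prints the path only when the file is newer than the limit (in minutes).
  if [ -z "$(find "$file" -mmin "-$((max_age_hours * 60))" -print)" ]; then
    echo "stale: $file" >&2
    return 1
  fi
  echo "ok: $file"
}
```

Note that `gzip -t` only confirms the archive is readable; it does not prove the SQL inside restores cleanly, which is why a periodic test restore is still worthwhile.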
restore-database.sh
Purpose: Restore database from backups
Features:
- Full database restore from backup
- Point-in-time recovery (PITR)
- Partial table restore
- Download from S3 support
- Automatic verification after restore
- Database consistency checks
Usage:
# Full restore
./restore-database.sh full
# PITR
RESTORE_TARGET_TIME="2025-10-24 18:30:00" ./restore-database.sh pitr
# Partial restore
RESTORE_TABLES="Content,User" ./restore-database.sh partial
Environment Variables:
- POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD - Database connection settings
- BACKUP_DIR - Backup directory
- BACKUP_FILE - Specific backup file to restore (optional)
- RESTORE_TARGET_TIME - Target time for PITR (format: YYYY-MM-DD HH:MM:SS)
- RESTORE_TABLES - Comma-separated list of tables for partial restore
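Because RESTORE_TARGET_TIME and RESTORE_TABLES have fixed formats, it can help to validate them before starting a restore. The helpers below are a hedged sketch; `valid_target_time` and `split_tables` are hypothetical names and may not match how restore-database.sh parses its input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a timestamp has the shape YYYY-MM-DD HH:MM:SS.
# Only the shape is checked, not whether the date itself is valid.
valid_target_time() {
  local re='^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$'
  [[ "$1" =~ $re ]]
}

# Hypothetical helper: print one table name per line from a comma-separated list.
split_tables() {
  local IFS=','
  local -a tables
  read -ra tables <<< "$1"
  printf '%s\n' "${tables[@]}"
}
```

For example, `split_tables "$RESTORE_TABLES"` could feed a loop that restores each table in turn, and `valid_target_time "$RESTORE_TARGET_TIME" || exit 1` would stop a PITR run early on a malformed timestamp.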
Scheduling Backups
Using Cron (Linux)
1. Copy the example crontab:
   sudo cp ops/backup/crontab.example /etc/cron.d/postgres-backup
2. Edit it to match your environment:
   sudo nano /etc/cron.d/postgres-backup
3. Restart the cron service:
   sudo systemctl restart cron
Using Docker Compose
The included docker-compose.yml already has a backup service configured:
# Start with backup service
docker compose up -d
# View backup logs
docker compose logs backup
# Manually trigger backup
docker compose exec backup /opt/backup-scripts/backup-database.sh full
Using Kubernetes CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *" # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: postgres:16-alpine
              command: ["/opt/backup-scripts/backup-database.sh", "full"]
              volumeMounts:
                - name: backup-scripts
                  mountPath: /opt/backup-scripts
                - name: backup-storage
                  mountPath: /var/lib/postgresql/backups
          volumes:
            - name: backup-scripts
              configMap:
                name: backup-scripts
            - name: backup-storage
              persistentVolumeClaim:
                claimName: backup-pvc
          restartPolicy: OnFailure
Configuration
Environment Files
Create /etc/backup.env for production:
POSTGRES_HOST=db.production.example.com
POSTGRES_PORT=5432
POSTGRES_DB=internetid
POSTGRES_USER=internetid
POSTGRES_PASSWORD=your_secure_password
BACKUP_DIR=/var/lib/postgresql/backups
RETENTION_DAYS=30
# S3 Configuration
S3_BUCKET=internet-id-backups
S3_REGION=us-east-1
# Alerts
ALERT_EMAIL=ops@example.com
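One way a wrapper script or cron entry could load this file is to source it with auto-export enabled, so that child processes such as pg_dump see the values. This is a sketch only; the function name `load_env_file` is hypothetical, and the backup scripts may read their configuration differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source a KEY=value file and export everything it defines.
load_env_file() {
  local env_file="${1:-/etc/backup.env}"
  [ -r "$env_file" ] || { echo "cannot read $env_file" >&2; return 1; }
  set -a   # export every variable assigned while the file is sourced
  # shellcheck disable=SC1090
  . "$env_file"
  set +a
}
```

Since the file is executed as shell code, it should only ever be writable by a trusted user.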
PostgreSQL Configuration
Enable WAL archiving in postgresql.conf:
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/backups/wal_archive/%f && cp %p /var/lib/postgresql/backups/wal_archive/%f'
archive_timeout = 3600
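PostgreSQL substitutes %p with the path of the WAL segment to archive and %f with its file name, and treats a non-zero exit status as a failed archive attempt that it will retry. The function below restates the archive_command as a sketch to show that behavior; the name `archive_wal_segment` and the explicit destination argument are illustrative additions.

```shell
#!/usr/bin/env bash
# Illustrative equivalent of the archive_command above.
#   $1 = source path (%p), $2 = segment file name (%f), $3 = archive directory
archive_wal_segment() {
  local src="$1" name="$2" dest_dir="$3"
  # Refuse to overwrite a segment that is already archived, then copy it.
  test ! -f "$dest_dir/$name" && cp "$src" "$dest_dir/$name"
}
```

The `test ! -f` guard means an existing archive file is never overwritten; the command fails instead, which surfaces the conflict in the PostgreSQL log.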
Monitoring
See docs/ops/BACKUP_MONITORING.md for:
- Prometheus metrics
- Grafana dashboards
- CloudWatch integration
- Alert configuration
- Health checks
Security
1. Permissions: Scripts should be owned by the postgres user
   sudo chown -R postgres:postgres /opt/internet-id/ops
   sudo chmod 750 /opt/internet-id/ops/backup/*.sh
   sudo chmod 750 /opt/internet-id/ops/restore/*.sh
2. Credentials: Use environment files with restricted permissions
   sudo chmod 600 /etc/backup.env
3. S3 Access: Use IAM roles instead of access keys when possible
4. Encryption: Enable encryption at rest for S3 buckets
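The credentials recommendation can be checked automatically before a backup runs. The sketch below assumes GNU or BusyBox `stat` (the `-c '%a'` form, as found on Linux); `check_env_perms` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail unless the file's permission bits are exactly 600.
# Assumes a stat that supports -c '%a' (GNU coreutils or BusyBox).
check_env_perms() {
  local file="$1" mode
  mode="$(stat -c '%a' "$file")" || return 1
  if [ "$mode" != "600" ]; then
    echo "$file has mode $mode, expected 600" >&2
    return 1
  fi
}
```

A call such as `check_env_perms /etc/backup.env || exit 1` at the top of a wrapper script would refuse to run with a world-readable credentials file.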
Testing
Test Backup
# Run manual backup
cd ops/backup
./backup-database.sh full
# Verify backup was created
ls -lh /var/lib/postgresql/backups/full/
# Verify backup integrity
./verify-backup.sh
Test Restore
# Create test database
psql -h localhost -U internetid -d postgres -c "CREATE DATABASE test_restore;"
# Restore to test database
export POSTGRES_DB=test_restore
cd ops/restore
./restore-database.sh full
# Verify data
psql -h localhost -U internetid -d test_restore -c "SELECT COUNT(*) FROM Content;"
# Cleanup
psql -h localhost -U internetid -d postgres -c "DROP DATABASE test_restore;"
Troubleshooting
Common Issues
1. Permission Denied
   - Ensure scripts are executable: chmod +x *.sh
   - Check directory ownership: chown postgres:postgres /var/lib/postgresql/backups
2. Cannot Connect to PostgreSQL
   - Verify PostgreSQL is running: systemctl status postgresql
   - Test connection: psql -h localhost -U internetid -d internetid -c "SELECT 1;"
3. Disk Space Full
   - Check usage: df -h /var/lib/postgresql/backups
   - Reduce retention: export RETENTION_DAYS=15
   - Clean old backups: find /var/lib/postgresql/backups/full -type f -mtime +30 -delete
4. S3 Upload Fails
   - Verify AWS credentials: aws sts get-caller-identity
   - Check bucket access: aws s3 ls s3://internet-id-backups/
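To catch the disk space case before backups start failing, the `df` check above can be scripted. This is a sketch under assumptions: the function names and the 80% default threshold are illustrative and are not part of the shipped scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the used percentage (digits only) of the
# filesystem that holds the given directory.
backup_disk_usage_pct() {
  local dir="${1:-/var/lib/postgresql/backups}"
  # -P forces single-line POSIX output; column 5 is the capacity, e.g. "42%".
  df -P "$dir" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: return non-zero when usage reaches the threshold.
warn_if_disk_full() {
  local dir="$1" threshold="${2:-80}" used
  used="$(backup_disk_usage_pct "$dir")"
  if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: $dir is ${used}% full (threshold ${threshold}%)" >&2
    return 1
  fi
}
```

Running `warn_if_disk_full /var/lib/postgresql/backups 90` from cron and mailing its stderr is one simple way to get an early warning.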
Logs
- Backup logs: /var/lib/postgresql/backups/backup.log
- Restore logs: /var/lib/postgresql/backups/restore.log
- Verification logs: /var/lib/postgresql/backups/verify.log
- PostgreSQL logs: /var/log/postgresql/postgresql-16-main.log
Documentation
- Database Backup & Recovery Guide - Complete setup and usage
- Disaster Recovery Runbook - Emergency procedures
- Backup Monitoring - Monitoring and alerting setup
Support
For issues or questions:
- Check the troubleshooting section above
- Review logs in /var/lib/postgresql/backups/
- Consult documentation in docs/ops/
- Open an issue on GitHub
License
Same as parent project (Internet-ID)