Deployment Guide
This document describes the production deployment strategy for Spywatcher, including infrastructure setup, deployment procedures, and rollback strategies.
Table of Contents
- Overview
- Infrastructure Setup
- Deployment Strategies
- Kubernetes Deployment
- Terraform Infrastructure
- Helm Charts
- CI/CD Pipeline
- Rollback Procedures
- Monitoring and Alerts
- Troubleshooting
Related Documentation
- AUTO_SCALING.md - Comprehensive auto-scaling and load balancing guide
- docs/AUTO_SCALING_EXAMPLES.md - Practical examples and tutorials
- INFRASTRUCTURE.md - Infrastructure architecture overview
- MONITORING.md - Monitoring and observability setup
Overview
Spywatcher's production deployment combines the following:
- Infrastructure as Code: Terraform for AWS infrastructure
- Container Orchestration: Kubernetes (EKS) for application deployment
- Package Management: Helm charts for simplified deployments
- Deployment Strategies: Rolling, Blue-Green, and Canary deployments
- CI/CD: GitHub Actions for automated deployments
Infrastructure Setup
Prerequisites
- AWS Account with appropriate permissions
- AWS CLI configured
- kubectl installed
- Terraform installed (>= 1.5.0)
- Helm installed (>= 3.0)
Terraform Infrastructure
The infrastructure is defined in Terraform modules:
cd terraform
# Initialize Terraform
terraform init
# Review the plan
terraform plan -var-file="environments/production/terraform.tfvars"
# Apply infrastructure
terraform apply -var-file="environments/production/terraform.tfvars"
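Saving the plan to a file and applying that file keeps the apply step tied to exactly what was reviewed. The following is a minimal sketch of that workflow; the function name `plan_and_apply` and the `tfplan` filename are illustrative and not part of the repository.

```shell
# Sketch: review a saved plan, then apply exactly that plan.
# plan_and_apply and the default "tfplan" filename are illustrative names.
plan_and_apply() {
  local var_file="$1"
  local plan_file="${2:-tfplan}"

  # Write the plan to a file so the apply step cannot drift from the review.
  terraform plan -var-file="$var_file" -out="$plan_file" || return 1

  # Print the saved plan for review.
  terraform show "$plan_file" || return 1

  # Applying a saved plan file does not prompt for approval, so confirm here.
  printf 'Apply this plan? [y/N] '
  read -r answer
  if [ "$answer" != "y" ]; then
    echo "Aborted; nothing applied."
    return 1
  fi

  terraform apply "$plan_file"
}
```

Example call: `plan_and_apply environments/production/terraform.tfvars`.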
Infrastructure Components
- VPC: Isolated network with public, private, and database subnets across 3 AZs
- EKS Cluster: Kubernetes cluster with managed node groups
- RDS PostgreSQL: Managed database with encryption and automated backups
- ElastiCache Redis: In-memory cache with cluster mode
- Application Load Balancer: With WAF for security
- Security Groups: Least-privilege network access
- IAM Roles: Service accounts and node permissions
Configure kubectl
After infrastructure deployment:
aws eks update-kubeconfig --name spywatcher-production --region us-east-1
kubectl cluster-info
Deployment Strategies
Rolling Deployment (Default)
Updates pods gradually, maintaining service availability.
# Triggered automatically on push to main branch
# Or manually via GitHub Actions UI
Advantages:
- Simple and predictable
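- Zero downtime, provided readiness probes are configured
- Automatic rollback on failure (performed by the CI/CD pipeline; Kubernetes itself only halts a failed rollout)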
Disadvantages:
- Gradual rollout may take time
- Both versions run simultaneously during update
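A rolling update can be paired with an explicit wait-and-undo step so that a stalled rollout is reverted rather than left half-applied. The following is a minimal sketch of that pattern, not the project's actual workflow; the function name and the 300s default timeout are illustrative assumptions.

```shell
# Sketch: wait for a rollout to finish and undo it if it does not complete.
# rollout_or_undo and the 300s default are illustrative, not from the repo.
rollout_or_undo() {
  local deployment="$1"
  local namespace="${2:-spywatcher}"
  local timeout="${3:-300s}"

  # "rollout status" exits non-zero if the rollout fails or the timeout expires.
  if kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout="$timeout"; then
    echo "Rollout of $deployment succeeded"
    return 0
  fi

  echo "Rollout of $deployment did not complete within $timeout; rolling back" >&2
  kubectl rollout undo "deployment/$deployment" -n "$namespace"
  return 1
}
```

Example call: `rollout_or_undo spywatcher-backend`.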
Blue-Green Deployment
Maintains two identical environments, switching traffic instantly.
# Via GitHub Actions
# Select "blue-green" as deployment strategy
# Or manually
IMAGE_TAG=latest ./scripts/deployment/blue-green-deploy.sh
# Rollback if needed
./scripts/deployment/blue-green-deploy.sh --rollback
Advantages:
- Instant traffic switch
- Easy rollback
- Full environment testing before switch
Disadvantages:
- Requires double resources temporarily
- Database migrations must be compatible with both versions
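An instant switch of this kind is commonly implemented by repointing a Service selector from one color to the other. The sketch below only illustrates that idea; the `version` label, the Service name, and both function names are assumptions and may not match what `blue-green-deploy.sh` actually does.

```shell
# Return the color that is not currently live.
inactive_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

# Point a Service at the given color by patching its selector.
# The "version" label key is a hypothetical convention.
switch_traffic() {
  local service="$1" color="$2" namespace="${3:-spywatcher}"
  kubectl patch service "$service" -n "$namespace" \
    -p "{\"spec\":{\"selector\":{\"version\":\"$color\"}}}"
}
```

With blue live, `switch_traffic spywatcher-backend "$(inactive_color blue)"` would send traffic to green; rolling back is the same call with the previous color.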
Canary Deployment
Gradually shifts traffic to new version while monitoring metrics.
# Via GitHub Actions
# Select "canary" as deployment strategy
# Or manually
IMAGE_TAG=latest CANARY_STEPS="5 25 50 100" ./scripts/deployment/canary-deploy.sh
Advantages:
- Risk mitigation through gradual rollout
- Real-world testing with subset of users
- Automated rollback on errors
Disadvantages:
- Longer deployment time
- Requires robust monitoring
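One simple way to approximate a traffic percentage without weighted routing is to split replicas between the stable and canary deployments. The arithmetic below is a sketch under that assumption; it is not taken from `canary-deploy.sh`, and the function name and the total of 10 replicas are illustrative.

```shell
# Number of canary replicas needed to serve roughly PERCENT of traffic when
# requests are spread evenly over TOTAL replicas. The result is rounded up so
# that any non-zero percentage gets at least one pod.
canary_replicas() {
  local total="$1" percent="$2"
  echo $(( (total * percent + 99) / 100 ))
}

CANARY_STEPS="${CANARY_STEPS:-5 25 50 100}"
TOTAL_REPLICAS=10   # illustrative value

for step in $CANARY_STEPS; do
  canary="$(canary_replicas "$TOTAL_REPLICAS" "$step")"
  echo "step ${step}%: ${canary} canary / $(( TOTAL_REPLICAS - canary )) stable"
done
```

With only 10 replicas, the 5% step still needs one canary pod and therefore receives about 10% of requests; finer splits need weighted routing at the ingress or load balancer.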
Kubernetes Deployment
Using Kustomize
Deploy to different environments:
# Production
kubectl apply -k k8s/overlays/production
# Staging
kubectl apply -k k8s/overlays/staging
# Development (base)
kubectl apply -k k8s/base
Manual Deployment
# Create namespace
kubectl apply -f k8s/base/namespace.yaml
# Apply configurations
kubectl apply -f k8s/base/configmap.yaml
kubectl apply -f k8s/base/secrets.yaml
# Deploy databases
kubectl apply -f k8s/base/postgres-statefulset.yaml
kubectl apply -f k8s/base/redis-statefulset.yaml
# Deploy applications
kubectl apply -f k8s/base/backend-deployment.yaml
kubectl apply -f k8s/base/frontend-deployment.yaml
# Create services
kubectl apply -f k8s/base/backend-service.yaml
kubectl apply -f k8s/base/frontend-service.yaml
# Configure ingress
kubectl apply -f k8s/base/ingress.yaml
Scaling
# Manual scaling
kubectl scale deployment spywatcher-backend --replicas=5 -n spywatcher
# Auto-scaling is configured via HPA
kubectl get hpa -n spywatcher
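The Horizontal Pod Autoscaler derives its replica count as ceil(currentReplicas * currentMetric / targetMetric), clamped to the configured minimum and maximum. The sketch below reproduces that arithmetic for integer metric values such as CPU utilization percentages. It ignores the controller's tolerance band and stabilization windows, and the function name and example bounds are illustrative rather than the project's configured values.

```shell
# Sketch of the HPA replica calculation for integer metrics.
# Arguments: current replicas, current metric, target metric, min, max.
hpa_desired_replicas() {
  local current="$1" metric="$2" target="$3" min="$4" max="$5"

  # Integer ceiling of current * metric / target.
  local desired=$(( (current * metric + target - 1) / target ))

  # Clamp to the configured bounds.
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}
```

For example, 4 replicas at 90% CPU against a 60% target with bounds 2 to 10 gives `hpa_desired_replicas 4 90 60 2 10`, which prints 6.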
Helm Charts
Installation
# Install with default values
helm install spywatcher ./helm/spywatcher -n spywatcher --create-namespace
# Install with custom values
helm install spywatcher ./helm/spywatcher \
-n spywatcher \
--create-namespace \
-f helm/spywatcher/values-production.yaml
Upgrade
helm upgrade spywatcher ./helm/spywatcher -n spywatcher
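For unattended upgrades, Helm 3 can roll a failed release back by itself. The following is a sketch only: `--install` and `--atomic` are standard Helm 3 flags, while the wrapper function name and the 5m timeout are assumptions, not values taken from this repository.

```shell
# Sketch: upgrade (or install on first run) and let Helm roll back on failure.
# helm_deploy and the 5m timeout are illustrative.
helm_deploy() {
  local values_file="${1:-helm/spywatcher/values-production.yaml}"

  # --install creates the release if it does not exist yet.
  # --atomic waits for resources to become ready and rolls back on failure.
  helm upgrade spywatcher ./helm/spywatcher \
    --install \
    --namespace spywatcher \
    --create-namespace \
    --atomic \
    --timeout 5m \
    -f "$values_file"
}
```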
Rollback
# List releases
helm history spywatcher -n spywatcher
# Rollback to previous version
helm rollback spywatcher -n spywatcher
# Rollback to specific revision
helm rollback spywatcher 2 -n spywatcher
CI/CD Pipeline
GitHub Actions Workflow
The deployment pipeline is triggered by:
- Push to main branch (automatic)
- Manual workflow dispatch
Pipeline Steps
- Build and Push
  - Build Docker images for backend and frontend
  - Push to GitHub Container Registry
  - Tag with commit SHA and latest
- Database Migration
  - Run Prisma migrations
  - Verify migration success
- Deploy
  - Apply selected deployment strategy
  - Update Kubernetes deployments
  - Monitor rollout status
- Smoke Tests
  - Health check endpoints
  - Basic functionality tests
- Rollback on Failure
  - Automatic rollback if deployment fails
  - Notification to team
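The control flow of these steps can be sketched in shell as follows. The real pipeline is a GitHub Actions workflow, so this mirrors only the ordering and the rollback-on-failure behavior; every function here is a hypothetical placeholder.

```shell
# Placeholder steps: replace each body with the real commands.
build_and_push()   { echo "build images and push to the registry"; }
migrate_database() { echo "run Prisma migrations"; }
deploy()           { echo "apply the selected deployment strategy"; }
smoke_test()       { echo "check health endpoints"; }
rollback()         { echo "roll back the deployment"; }
notify_team()      { echo "notify: $1"; }

# Run the steps in order and stop at the first failure.
run_pipeline() {
  local step
  for step in build_and_push migrate_database deploy smoke_test; do
    echo "==> $step"
    if ! "$step"; then
      # Only roll back once a deployment has been attempted.
      case "$step" in
        deploy|smoke_test) rollback ;;
      esac
      notify_team "$step failed"
      return 1
    fi
  done
  echo "pipeline finished"
}
```

Failures in the build or migration steps stop the pipeline before anything is deployed, so in this sketch they trigger a notification but no rollback.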
Required Secrets
Configure in GitHub repository settings:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
DATABASE_URL
REDIS_URL
JWT_SECRET
JWT_REFRESH_SECRET
DISCORD_BOT_TOKEN
DISCORD_CLIENT_ID
DISCORD_CLIENT_SECRET
SLACK_WEBHOOK (optional)
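A deployment script can fail fast when one of the required values is missing. The sketch below assumes the secrets are exposed to the job as exported environment variables with the same names; the function name is illustrative, and `SLACK_WEBHOOK` is left out because it is optional.

```shell
# Sketch: verify that every required secret is present in the environment.
# Uses printenv, so it only sees exported variables.
check_required_secrets() {
  local missing=0 name
  for name in \
    AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY \
    DATABASE_URL REDIS_URL \
    JWT_SECRET JWT_REFRESH_SECRET \
    DISCORD_BOT_TOKEN DISCORD_CLIENT_ID DISCORD_CLIENT_SECRET
  do
    if [ -z "$(printenv "$name")" ]; then
      # Report the name only; never print secret values.
      echo "Missing required secret: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_required_secrets || exit 1` at the top of a script stops the run before any deployment step executes.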
Rollback Procedures
Kubernetes Rollback
# View rollout history
kubectl rollout history deployment/spywatcher-backend -n spywatcher
# Rollback to previous version
kubectl rollout undo deployment/spywatcher-backend -n spywatcher
# Rollback to specific revision
kubectl rollout undo deployment/spywatcher-backend --to-revision=2 -n spywatcher
# Check rollback status
kubectl rollout status deployment/spywatcher-backend -n spywatcher
Blue-Green Rollback
./scripts/deployment/blue-green-deploy.sh --rollback
Database Rollback
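# Mark a failed migration as rolled back in Prisma's migration history.
# This does not revert schema changes; undo those separately (for example, with a corrective migration).
kubectl exec -it deployment/spywatcher-backend -n spywatcher -- npx prisma migrate resolve --rolled-back <migration_name>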
Monitoring and Alerts
Health Checks
# Liveness probe
curl https://api.spywatcher.example.com/health/live
# Readiness probe
curl https://api.spywatcher.example.com/health/ready
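Right after a deployment the readiness endpoint may fail for a short time, so a smoke test usually retries before giving up. The following is a minimal sketch; the function name, the attempt count, and the delay are illustrative assumptions.

```shell
# Sketch: poll a health endpoint until it responds successfully.
# Arguments: URL, number of attempts (default 10), delay in seconds (default 5).
wait_for_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx responses.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "still unhealthy after $attempts attempts: $url" >&2
  return 1
}
```

Example call: `wait_for_healthy https://api.spywatcher.example.com/health/ready`.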
Kubernetes Monitoring
# Check pod status
kubectl get pods -n spywatcher
# View pod logs
kubectl logs -f deployment/spywatcher-backend -n spywatcher
# Check events
kubectl get events -n spywatcher --sort-by='.lastTimestamp'
# Resource usage
kubectl top pods -n spywatcher
kubectl top nodes
CloudWatch Metrics
Monitor via AWS CloudWatch:
- EKS cluster metrics
- RDS performance metrics
- ElastiCache metrics
- ALB request metrics
Troubleshooting
Pod Not Starting
# Describe pod to see events
kubectl describe pod <pod-name> -n spywatcher
# Check logs
kubectl logs <pod-name> -n spywatcher
# Check resource constraints
kubectl describe node <node-name>
Database Connection Issues
# Verify that the database secret exists (values in the output are base64-encoded)
kubectl get secret spywatcher-secrets -n spywatcher -o yaml
# Test database connection
kubectl run -it --rm debug --image=postgres:15-alpine --restart=Never -n spywatcher -- \
psql -h <rds-endpoint> -U spywatcher -d spywatcher
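Secret values are stored base64-encoded, so a single key has to be decoded before it can be compared with the expected connection string. The following is a sketch; the helper name is illustrative, and the key name `DATABASE_URL` used in the example is an assumption about how the secret is laid out.

```shell
# Sketch: print the decoded value of one key from a Kubernetes Secret.
# Arguments: key, secret name (default spywatcher-secrets), namespace.
secret_value() {
  local key="$1" secret="${2:-spywatcher-secrets}" namespace="${3:-spywatcher}"
  kubectl get secret "$secret" -n "$namespace" -o "jsonpath={.data.$key}" \
    | base64 --decode
}
```

Example call: `secret_value DATABASE_URL`. The output contains credentials, so avoid running it where terminal output is logged.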
Traffic Not Routing
# Check service endpoints
kubectl get endpoints -n spywatcher
# Check ingress
kubectl describe ingress spywatcher-ingress -n spywatcher
# Check ALB target groups
aws elbv2 describe-target-health --target-group-arn <arn>
High Resource Usage
# Check HPA status
kubectl get hpa -n spywatcher
# Scale manually if needed (an active HPA may readjust the replica count)
kubectl scale deployment spywatcher-backend --replicas=10 -n spywatcher
# Check resource limits
kubectl describe deployment spywatcher-backend -n spywatcher
Best Practices
- Always test in staging first
- Run database migrations before deploying code
- Use feature flags for risky changes
- Monitor error rates during deployment
- Keep rollback scripts ready
- Document all configuration changes
- Test backups regularly
- Apply security patches promptly
Support
For deployment issues:
- Check GitHub Actions logs
- Review CloudWatch logs
- Contact DevOps team
- Create incident in issue tracker