Deployment Guide

This document describes the production deployment strategy for Spywatcher, including infrastructure setup, deployment procedures, and rollback strategies.

Overview

Spywatcher uses a multi-strategy deployment approach with:

  • Infrastructure as Code: Terraform for AWS infrastructure
  • Container Orchestration: Kubernetes (EKS) for application deployment
  • Package Management: Helm charts for simplified deployments
  • Deployment Strategies: Rolling, Blue-Green, and Canary deployments
  • CI/CD: GitHub Actions for automated deployments

Infrastructure Setup

Prerequisites

  1. AWS Account with appropriate permissions
  2. AWS CLI configured
  3. kubectl installed
  4. Terraform installed (>= 1.5.0)
  5. Helm installed (>= 3.0)
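The tool and version requirements above can be checked with a short script before anything else runs. This is a minimal sketch, not part of the repository: `version_ge` and `require_tool` are hypothetical helper names, and the comparison assumes plain numeric versions such as 1.5.0.

```shell
# Hypothetical helpers, not part of the repository.

# version_ge A B succeeds when dotted version A is at least B.
# Assumes numeric components only (for example 1.5.0), up to three of them.
version_ge() {
    a=$1
    b=$2
    for i in 1 2 3; do
        a_head=${a%%.*}
        b_head=${b%%.*}
        if [ "${a_head:-0}" -gt "${b_head:-0}" ]; then return 0; fi
        if [ "${a_head:-0}" -lt "${b_head:-0}" ]; then return 1; fi
        # Drop the component just compared; a version that runs out counts as 0.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# require_tool NAME reports whether NAME is available on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}
```

For example, `for t in aws kubectl terraform helm; do require_tool "$t"; done` covers items 2 to 5, and `version_ge "$(terraform version | sed -n '1s/^Terraform v//p')" 1.5.0` checks the Terraform floor, assuming the first line of `terraform version` has its usual `Terraform vX.Y.Z` form.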

Terraform Infrastructure

The infrastructure is defined in Terraform modules:

cd terraform

# Initialize Terraform
terraform init

# Review the plan
terraform plan -var-file="environments/production/terraform.tfvars"

# Apply infrastructure
terraform apply -var-file="environments/production/terraform.tfvars"
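
A common way to make sure the applied changes are exactly the ones that were reviewed is to save the plan to a file and apply that file. The sketch below assumes this workflow is acceptable for the project; the function name and the `production.tfplan` filename are illustrative and do not come from the repository.

```shell
# Sketch: apply exactly the plan that was reviewed.
# apply_reviewed_plan and production.tfplan are illustrative names.
apply_reviewed_plan() {
    var_file=${1:-environments/production/terraform.tfvars}
    plan_file=${2:-production.tfplan}

    # Save the plan so the later apply cannot pick up changes made in between.
    terraform plan -var-file="$var_file" -out="$plan_file" || return 1

    # Print the saved plan for review.
    terraform show "$plan_file" || return 1

    printf 'Apply this plan? [y/N] '
    read -r answer
    if [ "$answer" != "y" ]; then
        echo "Aborted."
        return 1
    fi

    # A saved plan already contains the variable values, so no -var-file here.
    terraform apply "$plan_file"
}
```

Run it from the `terraform` directory as `apply_reviewed_plan`; anything other than `y` at the prompt leaves the infrastructure untouched.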

Infrastructure Components

  • VPC: Isolated network with public, private, and database subnets across 3 AZs
  • EKS Cluster: Kubernetes cluster with managed node groups
  • RDS PostgreSQL: Managed database with encryption and automated backups
  • ElastiCache Redis: In-memory cache with cluster mode enabled
  • Application Load Balancer: With WAF for security
  • Security Groups: Least-privilege network access
  • IAM Roles: Service accounts and node permissions

Configure kubectl

After infrastructure deployment:

aws eks update-kubeconfig --name spywatcher-production --region us-east-1
kubectl cluster-info

Deployment Strategies

Rolling Deployment (Default)

Updates pods gradually, maintaining service availability.

# Triggered automatically on push to main branch
# Or manually via GitHub Actions UI

Advantages:

  • Simple and predictable
  • Zero downtime
  • Automatic rollback on failure (performed by the CI/CD pipeline; Kubernetes itself only stalls a failed rollout)

Disadvantages:

  • Gradual rollout may take time
  • Both versions run simultaneously during update
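
A rolling update can also be driven by hand. The sketch below shows one way to do it and to revert when the rollout does not complete. It is an assumption-laden example rather than the pipeline's actual logic: the function name is hypothetical, and `backend` is an assumed container name inside the `spywatcher-backend` deployment.

```shell
# Sketch of a manual rolling update with a scripted rollback.
# "backend" is an assumed container name; adjust it to the real manifest.
rolling_update() {
    image=$1
    ns=${2:-spywatcher}

    # Changing the image starts a rolling update of the deployment.
    kubectl set image deployment/spywatcher-backend backend="$image" -n "$ns" || return 1

    # Wait for the rollout; if it fails or times out, return to the previous revision.
    if ! kubectl rollout status deployment/spywatcher-backend -n "$ns" --timeout=5m; then
        echo "rollout failed, reverting" >&2
        kubectl rollout undo deployment/spywatcher-backend -n "$ns"
        return 1
    fi
    echo "rollout complete: $image"
}
```

The image reference is passed in full, for example `rolling_update <registry>/<image>:<commit-sha>`.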

Blue-Green Deployment

Maintains two identical environments, switching traffic instantly.

# Via GitHub Actions
# Select "blue-green" as deployment strategy

# Or manually
IMAGE_TAG=latest ./scripts/deployment/blue-green-deploy.sh

# Rollback if needed
./scripts/deployment/blue-green-deploy.sh --rollback

Advantages:

  • Instant traffic switch
  • Easy rollback
  • Full environment testing before switch

Disadvantages:

  • Temporarily requires double the resources
  • Database migrations must be compatible with both versions
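
The instant switch is typically implemented by repointing a Service selector from one environment to the other. The sketch below illustrates that idea only; it is not a description of `blue-green-deploy.sh`. It assumes a Service named `spywatcher-backend` whose selector carries a `version` label with the value `blue` or `green`, and both function names are hypothetical.

```shell
# Sketch of a blue-green traffic switch via the Service selector.
# Assumes pods are labelled version=blue or version=green (not confirmed by the repository).
other_color() {
    case $1 in
        blue) echo green ;;
        green) echo blue ;;
        *) echo "unknown color: $1" >&2; return 1 ;;
    esac
}

switch_traffic() {
    svc=${1:-spywatcher-backend}
    ns=${2:-spywatcher}

    # Read which environment currently receives traffic.
    current=$(kubectl get service "$svc" -n "$ns" -o jsonpath='{.spec.selector.version}') || return 1
    target=$(other_color "$current") || return 1

    # Repoint the selector; new connections go to the other environment.
    kubectl patch service "$svc" -n "$ns" \
        -p "{\"spec\":{\"selector\":{\"version\":\"$target\"}}}" || return 1
    echo "traffic: $current -> $target"
}
```

Because the switch is symmetric, running `switch_traffic` a second time acts as the rollback.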

Canary Deployment

Gradually shifts traffic to the new version while monitoring metrics.

# Via GitHub Actions
# Select "canary" as deployment strategy

# Or manually
IMAGE_TAG=latest CANARY_STEPS="5 25 50 100" ./scripts/deployment/canary-deploy.sh

Advantages:

  • Risk mitigation through gradual rollout
  • Real-world testing with a subset of users
  • Automated rollback on errors

Disadvantages:

  • Longer deployment time
  • Requires robust monitoring
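
The control flow behind a stepped canary can be sketched as a loop over `CANARY_STEPS`. This is an illustration of the technique, not the contents of `canary-deploy.sh`: `set_canary_weight` and `canary_healthy` are hypothetical hooks, defined here as placeholders, that a real script would back with the ingress controller and the monitoring stack.

```shell
# Placeholder hooks (hypothetical); replace with real traffic and metric checks.
set_canary_weight() { echo "canary weight set to $1%"; }
canary_healthy()    { return 0; }

# Walk through the traffic percentages, stopping at the first unhealthy step.
run_canary() {
    for weight in ${CANARY_STEPS:-5 25 50 100}; do
        set_canary_weight "$weight" || return 1
        if ! canary_healthy; then
            echo "canary unhealthy at ${weight}%, rolling back" >&2
            # Send all traffic back to the stable version.
            set_canary_weight 0
            return 1
        fi
    done
    echo "canary promoted"
}
```

With the default steps, a failure detected at 50% means at most half of the traffic ever reached the new version.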

Kubernetes Deployment

Using Kustomize

Deploy to different environments:

# Production
kubectl apply -k k8s/overlays/production

# Staging
kubectl apply -k k8s/overlays/staging

# Development (base)
kubectl apply -k k8s/base

Manual Deployment

# Create namespace
kubectl apply -f k8s/base/namespace.yaml

# Apply configurations
kubectl apply -f k8s/base/configmap.yaml
kubectl apply -f k8s/base/secrets.yaml

# Deploy databases
kubectl apply -f k8s/base/postgres-statefulset.yaml
kubectl apply -f k8s/base/redis-statefulset.yaml

# Deploy applications
kubectl apply -f k8s/base/backend-deployment.yaml
kubectl apply -f k8s/base/frontend-deployment.yaml

# Create services
kubectl apply -f k8s/base/backend-service.yaml
kubectl apply -f k8s/base/frontend-service.yaml

# Configure ingress
kubectl apply -f k8s/base/ingress.yaml

Scaling

# Manual scaling
kubectl scale deployment spywatcher-backend --replicas=5 -n spywatcher

# Auto-scaling is configured via HPA
kubectl get hpa -n spywatcher
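
To reason about what the HPA will do, it helps to work through its documented scaling rule, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below computes that with integer arithmetic and clamps the result to a replica range. The 70% target and the 2 to 10 range in the example call are illustrative values, not the project's HPA settings, and the function name is hypothetical. The real controller also applies a tolerance and stabilization windows, which are omitted here.

```shell
# hpa_desired CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
# Integer form of ceil(current * currentUtil / targetUtil), clamped to [MIN, MAX].
hpa_desired() {
    desired=$(( ($1 * $2 + $3 - 1) / $3 ))
    if [ "$desired" -lt "$4" ]; then desired=$4; fi
    if [ "$desired" -gt "$5" ]; then desired=$5; fi
    echo "$desired"
}

# 5 replicas at 90% CPU with a 70% target: ceil(450 / 70) = 7
hpa_desired 5 90 70 2 10
```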

Helm Charts

Installation

# Install with default values
helm install spywatcher ./helm/spywatcher -n spywatcher --create-namespace

# Install with custom values
helm install spywatcher ./helm/spywatcher \
  -n spywatcher \
  --create-namespace \
  -f helm/spywatcher/values-production.yaml

Upgrade

helm upgrade spywatcher ./helm/spywatcher -n spywatcher
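
For production upgrades, Helm's `--atomic` flag rolls the release back automatically when the upgrade fails, which removes the need for a manual rollback in the common case. The sketch below assumes the chart exposes the image tag as `image.tag`; that key and the function name are assumptions, so check the chart's values file for the real key.

```shell
# Sketch: upgrade that reverts itself on failure.
# image.tag is an assumed values key, not confirmed by the chart.
safe_upgrade() {
    tag=$1
    helm upgrade spywatcher ./helm/spywatcher \
        -n spywatcher \
        -f helm/spywatcher/values-production.yaml \
        --set image.tag="$tag" \
        --atomic --timeout 10m
}
```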

Rollback

# List revisions of the release
helm history spywatcher -n spywatcher

# Rollback to previous version
helm rollback spywatcher -n spywatcher

# Rollback to specific revision
helm rollback spywatcher 2 -n spywatcher

CI/CD Pipeline

GitHub Actions Workflow

The deployment pipeline is triggered by:

  1. Push to main branch (automatic)
  2. Manual workflow dispatch

Pipeline Steps

  1. Build and Push

    • Build Docker images for backend and frontend
    • Push to GitHub Container Registry
    • Tag with commit SHA and latest
  2. Database Migration

    • Run Prisma migrations
    • Verify migration success
  3. Deploy

    • Apply selected deployment strategy
    • Update Kubernetes deployments
    • Monitor rollout status
  4. Smoke Tests

    • Health check endpoints
    • Basic functionality tests
  5. Rollback on Failure

    • Automatic rollback if deployment fails
    • Notification to team
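
The ordering and failure handling of these steps can be sketched as a small driver script. This only models the control flow; the real pipeline lives in the GitHub Actions workflow. All function names are hypothetical placeholders, and limiting the rollback to failures in the deploy and smoke-test steps is an assumption.

```shell
# Placeholder steps; hypothetical names standing in for the real workflow jobs.
build_and_push()      { echo "build and push images"; }
migrate_database()    { echo "run Prisma migrations"; }
deploy()              { echo "apply deployment strategy"; }
smoke_tests()         { echo "check health endpoints"; }
rollback_deployment() { echo "roll back to previous version"; }
notify_team()         { echo "notify team: $1 failed"; }

run_pipeline() {
    for step in build_and_push migrate_database deploy smoke_tests; do
        echo "==> $step"
        if ! "$step"; then
            # Assumption: only a failed deploy or smoke test leaves a new
            # version running, so only those trigger a rollback.
            case $step in
                deploy|smoke_tests) rollback_deployment ;;
            esac
            notify_team "$step"
            return 1
        fi
    done
    echo "pipeline succeeded"
}
```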

Required Secrets

Configure in GitHub repository settings:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
DATABASE_URL
REDIS_URL
JWT_SECRET
JWT_REFRESH_SECRET
DISCORD_BOT_TOKEN
DISCORD_CLIENT_ID
DISCORD_CLIENT_SECRET
SLACK_WEBHOOK (optional)
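
Before uploading the secrets (for example with the GitHub CLI's `gh secret set`), it can help to confirm that every required value is present in the local environment. The helper below is a hypothetical sketch; it treats `SLACK_WEBHOOK` as optional and therefore does not check it.

```shell
# Print the names of required secrets that are unset or empty in the environment.
missing_secrets() {
    missing=""
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY DATABASE_URL REDIS_URL \
        JWT_SECRET JWT_REFRESH_SECRET DISCORD_BOT_TOKEN DISCORD_CLIENT_ID \
        DISCORD_CLIENT_SECRET; do
        # Indirect lookup of the variable called $name (names are fixed above).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}
```

An empty result means all nine required values are set.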

Rollback Procedures

Kubernetes Rollback

# View rollout history
kubectl rollout history deployment/spywatcher-backend -n spywatcher

# Rollback to previous version
kubectl rollout undo deployment/spywatcher-backend -n spywatcher

# Rollback to specific revision
kubectl rollout undo deployment/spywatcher-backend --to-revision=2 -n spywatcher

# Check rollback status
kubectl rollout status deployment/spywatcher-backend -n spywatcher

Blue-Green Rollback

./scripts/deployment/blue-green-deploy.sh --rollback

Database Rollback

# Mark a failed migration as rolled back in Prisma's migration history
# (this records the state only; it does not revert schema changes)
kubectl exec -it deployment/spywatcher-backend -n spywatcher -- npx prisma migrate resolve --rolled-back <migration_name>

Monitoring and Alerts

Health Checks

# Liveness probe
curl https://api.spywatcher.example.com/health/live

# Readiness probe
curl https://api.spywatcher.example.com/health/ready
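
During a deployment the readiness endpoint may fail for a short while, so a single `curl` is a weak check. The sketch below polls the endpoint with a bounded number of retries; the function name, attempt count, and delay are illustrative defaults rather than values taken from the pipeline.

```shell
# wait_for_ready URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until it returns a successful HTTP status or the attempts run out.
wait_for_ready() {
    url=$1
    attempts=${2:-30}
    delay=${3:-5}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl fail on HTTP errors (status 400 and above).
        if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
            echo "ready after $i attempt(s)"
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "not ready after $attempts attempts" >&2
    return 1
}
```

For example, `wait_for_ready https://api.spywatcher.example.com/health/ready 12 10` waits up to roughly two minutes.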

Kubernetes Monitoring

# Check pod status
kubectl get pods -n spywatcher

# View pod logs
kubectl logs -f deployment/spywatcher-backend -n spywatcher

# Check events
kubectl get events -n spywatcher --sort-by='.lastTimestamp'

# Resource usage
kubectl top pods -n spywatcher
kubectl top nodes

CloudWatch Metrics

Monitor via AWS CloudWatch:

  • EKS cluster metrics
  • RDS performance metrics
  • ElastiCache metrics
  • ALB request metrics

Troubleshooting

Pod Not Starting

# Describe pod to see events
kubectl describe pod <pod-name> -n spywatcher

# Check logs
kubectl logs <pod-name> -n spywatcher

# Check resource constraints
kubectl describe node <node-name>

Database Connection Issues

# Inspect the database secret (values in the output are base64-encoded)
kubectl get secret spywatcher-secrets -n spywatcher -o yaml

# Test database connection
kubectl run -it --rm debug --image=postgres:15-alpine --restart=Never -n spywatcher -- \
  psql -h <rds-endpoint> -U spywatcher -d spywatcher
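
Because Secret values are stored base64-encoded, checking a connection string usually means decoding a single key. The helper below is a hypothetical sketch; it assumes the secret stores the connection string under a key such as `DATABASE_URL`, which the repository does not confirm. Decoded values are sensitive, so avoid running this where output is logged.

```shell
# secret_value KEY [SECRET] [NAMESPACE]: print one decoded key of a Secret.
secret_value() {
    key=$1
    secret=${2:-spywatcher-secrets}
    ns=${3:-spywatcher}

    # Fetch first so a kubectl failure is not hidden by the pipe to base64.
    encoded=$(kubectl get secret "$secret" -n "$ns" -o "jsonpath={.data.$key}") || return 1
    if [ -z "$encoded" ]; then
        echo "key not found: $key" >&2
        return 1
    fi
    printf '%s' "$encoded" | base64 -d
}
```

Usage: `secret_value DATABASE_URL`.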

Traffic Not Routing

# Check service endpoints
kubectl get endpoints -n spywatcher

# Check ingress
kubectl describe ingress spywatcher-ingress -n spywatcher

# Check ALB target groups
aws elbv2 describe-target-health --target-group-arn <arn>

High Resource Usage

# Check HPA status
kubectl get hpa -n spywatcher

# Scale manually if needed
kubectl scale deployment spywatcher-backend --replicas=10 -n spywatcher

# Check resource limits
kubectl describe deployment spywatcher-backend -n spywatcher

Best Practices

  1. Always test in staging first
  2. Run database migrations before deploying code
  3. Use feature flags for risky changes
  4. Monitor error rates during deployment
  5. Keep rollback scripts ready
  6. Document all configuration changes
  7. Test backups regularly
  8. Apply security patches promptly

Support

For deployment issues:

  • Check the GitHub Actions logs
  • Review the CloudWatch logs
  • Contact the DevOps team
  • Create an incident in the issue tracker