subculture-collective/discord-spywatcher

Deployment Guide

This document describes the production deployment strategy for Spywatcher, including infrastructure setup, deployment procedures, and rollback strategies.

Table of Contents

Overview
Infrastructure Setup
Deployment Strategies
Kubernetes Deployment
Terraform Infrastructure
Helm Charts
CI/CD Pipeline
Rollback Procedures
Monitoring and Alerts
Troubleshooting

Related Documentation

AUTO_SCALING.md - Comprehensive auto-scaling and load balancing guide
docs/AUTO_SCALING_EXAMPLES.md - Practical examples and tutorials
INFRASTRUCTURE.md - Infrastructure architecture overview
MONITORING.md - Monitoring and observability setup

Overview

Spywatcher uses a multi-strategy deployment approach with:

Infrastructure as Code: Terraform for AWS infrastructure
Container Orchestration: Kubernetes (EKS) for application deployment
Package Management: Helm charts for simplified deployments
Deployment Strategies: Rolling, Blue-Green, and Canary deployments
CI/CD: GitHub Actions for automated deployments

Infrastructure Setup

Prerequisites

AWS Account with appropriate permissions
AWS CLI configured
kubectl installed
Terraform installed (>= 1.5.0)
Helm installed (>= 3.0)

Terraform Infrastructure

The infrastructure is defined in Terraform modules:

cd terraform

# Initialize Terraform
terraform init

# Review the plan
terraform plan -var-file="environments/production/terraform.tfvars"

# Apply infrastructure
terraform apply -var-file="environments/production/terraform.tfvars"
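
Because the plan and apply commands above run separately, the changes that get applied can differ from the ones that were reviewed if anything changes in between. The following is a minimal sketch, assuming the same var-file path, that saves the plan to a file and then applies exactly that file. The function name `tf_plan_and_apply` and the plan file name `tfplan` are illustrative and not part of the repository.

```bash
# Hypothetical helper: plan to a file, then apply exactly that saved plan.
# Usage: tf_plan_and_apply <var-file> [plan-file]
tf_plan_and_apply() {
  var_file="$1"
  plan_file="${2:-tfplan}"

  # -out saves the reviewed plan so apply cannot pick up later changes.
  terraform plan -var-file="$var_file" -out="$plan_file" || return 1

  # A saved plan already contains its variable values, so no -var-file here.
  terraform apply "$plan_file"
}
```

It could be called as `tf_plan_and_apply environments/production/terraform.tfvars`. Note that applying a saved plan does not prompt for confirmation, so the review has to happen between the two steps.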

Infrastructure Components

VPC: Isolated network with public, private, and database subnets across 3 AZs
EKS Cluster: Kubernetes cluster with managed node groups
RDS PostgreSQL: Managed database with encryption and automated backups
ElastiCache Redis: In-memory cache with cluster mode
Application Load Balancer: With WAF for security
Security Groups: Least-privilege network access
IAM Roles: Service accounts and node permissions

Configure kubectl

After infrastructure deployment:

aws eks update-kubeconfig --name spywatcher-production --region us-east-1
kubectl cluster-info
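
Before running any of the deployment commands in this guide, it can help to confirm that kubectl is pointing at the intended cluster. The sketch below is one way to do that; `require_context` is a hypothetical helper, and it relies on the context name containing the cluster name, which is the case for the ARN-style names that `aws eks update-kubeconfig` creates by default.

```bash
# Hypothetical guard: refuse to continue unless the active kubectl context
# mentions the expected cluster name.
require_context() {
  expected="$1"
  current="$(kubectl config current-context)" || return 1
  case "$current" in
    *"$expected"*)
      echo "Using context: $current"
      ;;
    *)
      echo "Refusing to continue: context is '$current', expected '$expected'" >&2
      return 1
      ;;
  esac
}
```

For example, `require_context spywatcher-production && kubectl get nodes` only lists nodes when the production context is active.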

Deployment Strategies

Rolling Deployment (Default)

Updates pods gradually, maintaining service availability.

# Triggered automatically on push to main branch
# Or manually via GitHub Actions UI

Advantages:

Simple and predictable
Zero downtime
Automatic rollback on failure (performed by the CI/CD pipeline; Kubernetes on its own only stalls a failed rollout)

Disadvantages:

Gradual rollout may take time
Both versions run simultaneously during update
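
As an illustration of what a rolling update with rollback might look like when run by hand, here is a minimal sketch. It is not the pipeline's actual implementation: the function name `rolling_deploy` and the container name `backend` are assumptions, and the image reference is whatever the caller passes in.

```bash
# Hypothetical step: roll out a new backend image and undo it on failure.
# The container name "backend" is an assumption.
rolling_deploy() {
  image="$1"
  timeout="${2:-300s}"

  kubectl set image deployment/spywatcher-backend "backend=$image" -n spywatcher || return 1

  # rollout status exits non-zero if the rollout does not finish in time.
  if ! kubectl rollout status deployment/spywatcher-backend -n spywatcher --timeout="$timeout"; then
    echo "Rollout failed, reverting to the previous revision" >&2
    kubectl rollout undo deployment/spywatcher-backend -n spywatcher
    return 1
  fi
}
```

The explicit `rollout undo` matters because a Deployment that fails to become ready simply stops progressing; something outside Kubernetes has to decide to revert it.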

Blue-Green Deployment

Maintains two identical environments, switching traffic instantly.

# Via GitHub Actions
# Select "blue-green" as deployment strategy

# Or manually
IMAGE_TAG=latest ./scripts/deployment/blue-green-deploy.sh

# Rollback if needed
./scripts/deployment/blue-green-deploy.sh --rollback

Advantages:

Instant traffic switch
Easy rollback
Full environment testing before switch

Disadvantages:

Temporarily requires double the resources
Database migrations must be compatible with both versions
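
The instant switch is commonly implemented by repointing a Service selector from one environment to the other. The sketch below shows that idea only; it is not taken from `blue-green-deploy.sh`. The `color` selector label and the helper names `other_color` and `switch_traffic` are assumptions.

```bash
# Hypothetical helpers; the "color" selector label is an assumption and may
# not match what blue-green-deploy.sh actually uses.
other_color() {
  case "$1" in
    blue)  echo green ;;
    green) echo blue ;;
    *)     echo "unknown color: $1" >&2; return 1 ;;
  esac
}

switch_traffic() {
  service="$1"
  live="$(kubectl get service "$service" -n spywatcher \
    -o jsonpath='{.spec.selector.color}')" || return 1
  target="$(other_color "$live")" || return 1

  # Repointing the selector moves all traffic at once.
  # Running the function again switches back, which is the rollback.
  kubectl patch service "$service" -n spywatcher \
    -p "{\"spec\":{\"selector\":{\"color\":\"$target\"}}}"
}
```

Because both environments stay deployed until the old one is removed, rollback in this model is the same selector patch in the opposite direction.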

Canary Deployment

Gradually shifts traffic to the new version while monitoring metrics.

# Via GitHub Actions
# Select "canary" as deployment strategy

# Or manually
IMAGE_TAG=latest CANARY_STEPS="5 25 50 100" ./scripts/deployment/canary-deploy.sh

Advantages:

Risk mitigation through gradual rollout
Real-world testing with a subset of users
Automated rollback on errors

Disadvantages:

Longer deployment time
Requires robust monitoring
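
The control flow behind a stepped canary can be sketched as a loop over the traffic percentages with a health gate after each step. This is an illustration under stated assumptions, not the contents of `canary-deploy.sh`: `set_canary_weight` and `error_rate_percent` are placeholder commands the caller must supply, and `MAX_ERROR_PERCENT` is a hypothetical threshold variable.

```bash
# Hypothetical canary loop. set_canary_weight and error_rate_percent are
# placeholders the caller must define; they are not part of canary-deploy.sh.
canary_rollout() {
  steps="${CANARY_STEPS:-5 25 50 100}"
  max_error="${MAX_ERROR_PERCENT:-1}"

  for weight in $steps; do
    set_canary_weight "$weight" || return 1

    # error_rate_percent must print a whole number, e.g. 0 or 3.
    rate="$(error_rate_percent)" || return 1
    if [ "$rate" -gt "$max_error" ]; then
      echo "Error rate ${rate}% exceeds ${max_error}% at ${weight}% traffic, rolling back" >&2
      set_canary_weight 0
      return 1
    fi
    echo "Canary healthy at ${weight}%"
  done
}
```

A real rollout would also wait between steps so that metrics have time to reflect the new traffic split; that delay is left out here to keep the sketch short.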

Kubernetes Deployment

Using Kustomize

Deploy to different environments:

# Production
kubectl apply -k k8s/overlays/production

# Staging
kubectl apply -k k8s/overlays/staging

# Development (base)
kubectl apply -k k8s/base
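
To reduce the chance of applying the wrong overlay, the environment-to-directory mapping above can be wrapped in a small helper that also previews changes first. This is a sketch; `overlay_path` and `deploy_env` are hypothetical names, and the directory layout is the one shown in the commands above.

```bash
# Map an environment name to the Kustomize directory used above.
overlay_path() {
  case "$1" in
    production|staging) echo "k8s/overlays/$1" ;;
    development)        echo "k8s/base" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: show the pending changes, then apply them.
deploy_env() {
  path="$(overlay_path "$1")" || return 1

  # kubectl diff exits 0 for no changes and 1 when differences exist;
  # anything higher is a real error.
  kubectl diff -k "$path" || [ "$?" -eq 1 ] || return 1

  kubectl apply -k "$path"
}
```

For example, `deploy_env staging` prints the diff against the live cluster and then applies `k8s/overlays/staging`.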

Manual Deployment

# Create namespace
kubectl apply -f k8s/base/namespace.yaml

# Apply configurations
kubectl apply -f k8s/base/configmap.yaml
kubectl apply -f k8s/base/secrets.yaml

# Deploy databases
kubectl apply -f k8s/base/postgres-statefulset.yaml
kubectl apply -f k8s/base/redis-statefulset.yaml

# Deploy applications
kubectl apply -f k8s/base/backend-deployment.yaml
kubectl apply -f k8s/base/frontend-deployment.yaml

# Create services
kubectl apply -f k8s/base/backend-service.yaml
kubectl apply -f k8s/base/frontend-service.yaml

# Configure ingress
kubectl apply -f k8s/base/ingress.yaml

Scaling

# Manual scaling
kubectl scale deployment spywatcher-backend --replicas=5 -n spywatcher

# Auto-scaling is configured via HPA
kubectl get hpa -n spywatcher
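
When reading `kubectl get hpa` output, it helps to know how the autoscaler picks a replica count. The Horizontal Pod Autoscaler computes roughly `ceil(currentReplicas * currentMetric / targetMetric)` and clamps the result to the configured minimum and maximum. The sketch below reproduces that arithmetic for whole-number inputs; the function name is illustrative, and it ignores the tolerance band and stabilization windows that the real controller applies.

```bash
# desired = ceil(current * metric / target), clamped to [min, max].
# Whole-number inputs only; ignores the HPA tolerance and stabilization.
# Usage: hpa_desired_replicas <current> <metric> <target> <min> <max>
hpa_desired_replicas() {
  current="$1"; metric="$2"; target="$3"; min="$4"; max="$5"

  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * metric + target - 1) / target ))

  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}
```

For instance, with 2 replicas at 90% CPU against a 60% target and bounds of 2 to 5, `hpa_desired_replicas 2 90 60 2 5` prints `3`.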

Helm Charts

Installation

# Install with default values
helm install spywatcher ./helm/spywatcher -n spywatcher --create-namespace

# Install with custom values
helm install spywatcher ./helm/spywatcher \
  -n spywatcher \
  --create-namespace \
  -f helm/spywatcher/values-production.yaml

Upgrade

helm upgrade spywatcher ./helm/spywatcher -n spywatcher
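
For unattended upgrades, a common Helm 3 pattern is to let Helm roll the release back by itself when an upgrade fails. The following is a sketch of that pattern rather than the project's documented procedure; `helm_deploy` is a hypothetical wrapper and the 5 minute timeout is an arbitrary choice.

```bash
# Hypothetical wrapper around helm upgrade for Helm 3.
# Usage: helm_deploy [values-file]
helm_deploy() {
  values_file="${1:-helm/spywatcher/values-production.yaml}"

  # --install creates the release if it does not exist yet.
  # --atomic waits for resources to become ready and rolls the release
  # back if the upgrade fails or times out.
  helm upgrade spywatcher ./helm/spywatcher \
    --install \
    --namespace spywatcher \
    --atomic \
    --timeout 5m \
    -f "$values_file"
}
```

With `--atomic`, a failed upgrade leaves the previous revision running, so the manual rollback commands are only needed for problems discovered after the upgrade has succeeded.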

Rollback

# List release revisions
helm history spywatcher -n spywatcher

# Rollback to previous version
helm rollback spywatcher -n spywatcher

# Rollback to specific revision
helm rollback spywatcher 2 -n spywatcher

CI/CD Pipeline

GitHub Actions Workflow

The deployment pipeline is triggered by:

Push to main branch (automatic)
Manual workflow dispatch

Pipeline Steps

1. Build and Push
   - Build Docker images for backend and frontend
   - Push to GitHub Container Registry
   - Tag with commit SHA and latest
2. Database Migration
   - Run Prisma migrations
   - Verify migration success
3. Deploy
   - Apply selected deployment strategy
   - Update Kubernetes deployments
   - Monitor rollout status
4. Smoke Tests
   - Health check endpoints
   - Basic functionality tests
5. Rollback on Failure
   - Automatic rollback if deployment fails
   - Notification to team
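
The ordering of these steps can be sketched as a small stage runner that stops at the first failure. This is a simplified model, not the actual GitHub Actions workflow: every stage name below is a placeholder command the caller defines, and notifying the team on any failed stage (rather than only after a rollback) is an assumption.

```bash
# Hypothetical stage runner. Each stage (build_and_push, migrate_database,
# deploy, smoke_tests) and the rollback_deployment / notify_team hooks are
# placeholder commands supplied by the caller.
run_pipeline() {
  for stage in build_and_push migrate_database deploy smoke_tests; do
    echo "==> $stage"
    if ! "$stage"; then
      echo "Stage '$stage' failed" >&2

      # Only roll back once the deployment may have been changed.
      case "$stage" in
        deploy|smoke_tests) rollback_deployment ;;
      esac

      notify_team "$stage"
      return 1
    fi
  done
}
```

Running migrations before the deploy stage means a migration failure stops the pipeline while the old version is still serving traffic, which is why no deployment rollback is attempted for the first two stages.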

Required Secrets

Configure in GitHub repository settings:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
DATABASE_URL
REDIS_URL
JWT_SECRET
JWT_REFRESH_SECRET
DISCORD_BOT_TOKEN
DISCORD_CLIENT_ID
DISCORD_CLIENT_SECRET
SLACK_WEBHOOK (optional)
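
A deployment script can fail fast when one of these values is absent. The sketch below assumes the secrets are exposed to the script as environment variables with the same names; `missing_secrets` is a hypothetical helper.

```bash
# Report required variables that are unset or empty.
# SLACK_WEBHOOK is optional per the list above, so it is not checked.
missing_secrets() {
  missing=""
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY DATABASE_URL REDIS_URL \
              JWT_SECRET JWT_REFRESH_SECRET DISCORD_BOT_TOKEN \
              DISCORD_CLIENT_ID DISCORD_CLIENT_SECRET; do
    # Indirect lookup of the variable called $name (names are fixed above).
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done

  if [ -n "$missing" ]; then
    echo "Missing required secrets:$missing" >&2
    return 1
  fi
}
```

Only the variable names are printed, never their values, so the output is safe to leave in CI logs.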

Rollback Procedures

Kubernetes Rollback

# View rollout history
kubectl rollout history deployment/spywatcher-backend -n spywatcher

# Rollback to previous version
kubectl rollout undo deployment/spywatcher-backend -n spywatcher

# Rollback to specific revision
kubectl rollout undo deployment/spywatcher-backend --to-revision=2 -n spywatcher

# Check rollback status
kubectl rollout status deployment/spywatcher-backend -n spywatcher

Blue-Green Rollback

./scripts/deployment/blue-green-deploy.sh --rollback

Database Rollback

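# Mark a failed migration as rolled back in Prisma's migration history.
# This updates the migration records only; it does not revert schema changes.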
kubectl exec -it deployment/spywatcher-backend -n spywatcher -- npx prisma migrate resolve --rolled-back <migration_name>

Monitoring and Alerts

Health Checks

# Liveness probe
curl https://api.spywatcher.example.com/health/live

# Readiness probe
curl https://api.spywatcher.example.com/health/ready
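
Right after a deployment the readiness endpoint may fail for a short time, so a single curl can give a misleading result. A minimal polling sketch follows; `wait_for_health` is a hypothetical helper, and the attempt count and delay defaults are arbitrary.

```bash
# Hypothetical helper: poll a URL until curl succeeds or attempts run out.
# Usage: wait_for_health <url> [attempts] [delay-seconds]
wait_for_health() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-5}"

  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error statuses (400 and above).
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done

  echo "still unhealthy after $attempts attempts: $url" >&2
  return 1
}
```

For example, `wait_for_health https://api.spywatcher.example.com/health/ready 12 5` waits up to about a minute before giving up.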

Kubernetes Monitoring

# Check pod status
kubectl get pods -n spywatcher

# View pod logs
kubectl logs -f deployment/spywatcher-backend -n spywatcher

# Check events
kubectl get events -n spywatcher --sort-by='.lastTimestamp'

# Resource usage
kubectl top pods -n spywatcher
kubectl top nodes
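
On a busy namespace the pod list can be long, and the interesting rows are the ones that are not running or keep restarting. The sketch below filters the standard `kubectl get pods --no-headers` columns (NAME, READY, STATUS, RESTARTS, AGE); `unhealthy_pods` is a hypothetical helper and the restart threshold of 3 is an arbitrary default.

```bash
# Read `kubectl get pods --no-headers` output on stdin and print pods that
# are not Running/Completed or have restarted at least $1 times (default 3).
unhealthy_pods() {
  awk -v limit="${1:-3}" '
    ($3 != "Running" && $3 != "Completed") || ($4 + 0 >= limit) {
      print $1, $3, $4
    }
  '
}
```

Usage: `kubectl get pods -n spywatcher --no-headers | unhealthy_pods 5`. Each output line holds the pod name, its status, and its restart count.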

CloudWatch Metrics

Monitor via AWS CloudWatch:

EKS cluster metrics
RDS performance metrics
ElastiCache metrics
ALB request metrics

Troubleshooting

Pod Not Starting

# Describe pod to see events
kubectl describe pod <pod-name> -n spywatcher

# Check logs
kubectl logs <pod-name> -n spywatcher

# Check resource constraints
kubectl describe node <node-name>

Database Connection Issues

# Inspect the database secret (values are base64-encoded)
kubectl get secret spywatcher-secrets -n spywatcher -o yaml

# Test database connection
kubectl run -it --rm debug --image=postgres:15-alpine --restart=Never -n spywatcher -- \
  psql -h <rds-endpoint> -U spywatcher -d spywatcher

Traffic Not Routing

# Check service endpoints
kubectl get endpoints -n spywatcher

# Check ingress
kubectl describe ingress spywatcher-ingress -n spywatcher

# Check ALB target groups
aws elbv2 describe-target-health --target-group-arn <arn>

High Resource Usage

# Check HPA status
kubectl get hpa -n spywatcher

# Scale manually if needed
kubectl scale deployment spywatcher-backend --replicas=10 -n spywatcher

# Check resource limits
kubectl describe deployment spywatcher-backend -n spywatcher

Best Practices

Always test in staging first
Run database migrations before deploying code
Use feature flags for risky changes
Monitor error rates during deployment
Keep rollback scripts ready
Document all configuration changes
Test backups regularly
Apply security patches promptly

Support

For deployment issues:

Check GitHub Actions logs
Review CloudWatch logs
Contact DevOps team
Create incident in issue tracker
