subculture-collective/discord-spywatcher

Copilot dd0bec5046 Implement production-ready auto-scaling and load balancing infrastructure (#146)

* Initial plan

* feat: add comprehensive auto-scaling and load balancing configuration

- Add frontend HPA for auto-scaling (2-5 replicas)
- Enhance backend HPA with custom metrics support
- Improve load balancer configuration with health-based routing
- Add advanced traffic management policies
- Create AUTO_SCALING.md documentation
- Add validation script for auto-scaling setup
- Add load testing script for traffic spike simulation
- Update Helm production values with enhanced configs

Co-authored-by: onnwee <211922112+onnwee@users.noreply.github.com>

* feat: add monitoring, alerting, and comprehensive documentation

- Add Prometheus alerting rules for auto-scaling events
- Add ServiceMonitor for metrics collection
- Create comprehensive AUTO_SCALING_EXAMPLES.md tutorial
- Update DEPLOYMENT.md with auto-scaling references
- Update scripts/README.md with new validation tools
- Add monitoring for HPA, deployments, and load balancers
- Include troubleshooting scenarios and examples

Co-authored-by: onnwee <211922112+onnwee@users.noreply.github.com>

* docs: add comprehensive implementation summary

- Create AUTO_SCALING_IMPLEMENTATION.md with complete overview
- Document all components, files, and specifications
- Include deployment instructions and validation results
- Add technical specifications and performance characteristics
- Document success criteria achievement

Co-authored-by: onnwee <211922112+onnwee@users.noreply.github.com>

* fix: address code review feedback

- Merge duplicate alb.ingress.kubernetes.io/load-balancer-attributes annotation
- Fix nginx.ingress.kubernetes.io/limit-burst-multiplier to correct annotation name
- Remove unused checks_warned variable from validation script
- Fix YAML escape sequence in AUTO_SCALING_EXAMPLES.md

Co-authored-by: onnwee <211922112+onnwee@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: onnwee <211922112+onnwee@users.noreply.github.com>

2025-11-02 18:03:58 -06:00

README.md

Kubernetes Manifests

This directory contains Kubernetes manifests for deploying Spywatcher.

Directory Structure

k8s/
├── base/                    # Base manifests
│   ├── namespace.yaml       # Namespace and resource quotas
│   ├── configmap.yaml       # Application configuration
│   ├── secrets.yaml         # Secrets template (DO NOT commit actual secrets)
│   ├── migration-job.yaml   # Database migration job
│   ├── backend-deployment.yaml
│   ├── backend-service.yaml
│   ├── backend-hpa.yaml     # Horizontal Pod Autoscaler
│   ├── frontend-deployment.yaml
│   ├── frontend-service.yaml
│   ├── postgres-statefulset.yaml
│   ├── redis-statefulset.yaml
│   ├── ingress.yaml
│   ├── pdb.yaml             # Pod Disruption Budget
│   └── kustomization.yaml
├── overlays/                # Environment-specific overlays
│   ├── production/
│   └── staging/
└── secrets/                 # Actual secrets (gitignored)

Quick Start

Prerequisites

kubectl configured with cluster access
kustomize (built into kubectl >= 1.14)

Deploy to Production

# Review what will be deployed
kubectl kustomize k8s/overlays/production

# Apply manifests
kubectl apply -k k8s/overlays/production

# Check deployment status
kubectl get all -n spywatcher

Deploy to Staging

kubectl apply -k k8s/overlays/staging
kubectl get all -n spywatcher-staging

Configuration Management

Secrets

IMPORTANT: Never commit actual secrets to git!

Copy the secrets template:

mkdir -p k8s/secrets  # the directory is gitignored, so it may not exist in a fresh clone
cp k8s/base/secrets.yaml k8s/secrets/secrets.yaml

Edit with actual values:

vim k8s/secrets/secrets.yaml

Apply separately:

kubectl apply -f k8s/secrets/secrets.yaml

ConfigMap

Application configuration is in k8s/base/configmap.yaml. Environment-specific values can be patched in overlays.
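
As a rough illustration of patching a value per environment, an overlay's kustomization.yaml could look something like the sketch below. The ConfigMap name spywatcher-config is taken from the Maintenance section; the LOG_LEVEL key and its value are hypothetical and only stand in for whatever keys the real ConfigMap defines.

```yaml
# k8s/overlays/staging/kustomization.yaml (illustrative sketch, not the repository's actual file)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base            # pull in all base manifests

patches:
  - target:
      kind: ConfigMap
      name: spywatcher-config
    patch: |-
      # JSON 6902 patch; "add" sets the key whether or not it already exists
      - op: add
        path: /data/LOG_LEVEL     # hypothetical key
        value: debug              # hypothetical value
```

Running kubectl kustomize k8s/overlays/staging shows the rendered ConfigMap before anything is applied.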

Deployment Strategies

Rolling Update (Default)

# Update image
kubectl set image deployment/spywatcher-backend \
  backend=ghcr.io/subculture-collective/spywatcher-backend:v2.0.0 \
  -n spywatcher

# Watch rollout
kubectl rollout status deployment/spywatcher-backend -n spywatcher
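
For reference, rolling update behavior is controlled by the strategy block of the Deployment. The fragment below is a minimal sketch; the maxSurge and maxUnavailable values are assumptions chosen to keep full capacity during a rollout, not values read from backend-deployment.yaml.

```yaml
# Fragment of a Deployment spec (illustrative values)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spywatcher-backend
  namespace: spywatcher
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # assumed: start one extra pod before removing an old one
      maxUnavailable: 0    # assumed: never drop below the desired replica count
  # selector and pod template omitted for brevity
```

If a rollout goes wrong, kubectl rollout undo deployment/spywatcher-backend -n spywatcher returns to the previous revision.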

Blue-Green Deployment

Use the provided script:

./scripts/deployment/blue-green-deploy.sh

Canary Deployment

Use the provided script:

./scripts/deployment/canary-deploy.sh

Scaling

Manual Scaling

# Scale backend
kubectl scale deployment spywatcher-backend --replicas=5 -n spywatcher

# Scale frontend
kubectl scale deployment spywatcher-frontend --replicas=3 -n spywatcher

Auto-scaling

The backend HorizontalPodAutoscaler (spywatcher-backend-hpa) scales the deployment based on:

CPU utilization (target: 70%)
Memory utilization (target: 80%)

# Check HPA status
kubectl get hpa -n spywatcher

# Describe HPA
kubectl describe hpa spywatcher-backend-hpa -n spywatcher
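
The targets above map onto an autoscaling/v2 manifest roughly as follows. This is a sketch, not a copy of backend-hpa.yaml: the 70% and 80% targets and the resource names come from this README, while minReplicas and maxReplicas are assumed placeholders.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spywatcher-backend-hpa
  namespace: spywatcher
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spywatcher-backend
  minReplicas: 2      # assumed placeholder
  maxReplicas: 10     # assumed placeholder
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the pods' CPU requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # percent of the pods' memory requests
```

Utilization is measured against each container's resource requests, so the target pods need requests set for these metrics to work. When several metrics are listed, the HPA uses the one that yields the highest replica count.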

Monitoring

Check Pod Status

# List all pods
kubectl get pods -n spywatcher

# Describe pod
kubectl describe pod <pod-name> -n spywatcher

# View logs
kubectl logs -f <pod-name> -n spywatcher

# View logs from all replicas
kubectl logs -f deployment/spywatcher-backend -n spywatcher

Health Checks

# Test liveness probe
kubectl exec -it deployment/spywatcher-backend -n spywatcher -- \
  wget -qO- http://localhost:3001/health/live

# Test readiness probe
kubectl exec -it deployment/spywatcher-backend -n spywatcher -- \
  wget -qO- http://localhost:3001/health/ready
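
The endpoints tested above would typically be wired into the pod spec as probes. The fragment below is a sketch under assumptions: the paths and port 3001 come from the commands above and the container name backend from the rolling update example, but all timing values are illustrative.

```yaml
# Fragment of the backend pod template (illustrative timings)
containers:
  - name: backend
    ports:
      - containerPort: 3001
    livenessProbe:            # a failing liveness probe restarts the container
      httpGet:
        path: /health/live
        port: 3001
      initialDelaySeconds: 30 # assumed
      periodSeconds: 10       # assumed
    readinessProbe:           # a failing readiness probe removes the pod from Service endpoints
      httpGet:
        path: /health/ready
        port: 3001
      periodSeconds: 5        # assumed
```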

Resource Usage

# Pod resource usage
kubectl top pods -n spywatcher

# Node resource usage
kubectl top nodes

Troubleshooting

Pod Not Starting

# Check events
kubectl get events -n spywatcher --sort-by='.lastTimestamp'

# Describe pod
kubectl describe pod <pod-name> -n spywatcher

# Check logs from the previous container instance (useful after a crash or restart)
kubectl logs <pod-name> -n spywatcher --previous

Network Issues

# Check services
kubectl get services -n spywatcher

# Check endpoints
kubectl get endpoints -n spywatcher

# Test service from within cluster
kubectl run -it --rm debug --image=busybox --restart=Never -n spywatcher -- \
  wget -qO- http://spywatcher-backend/health/live

Database Connection

# Check database pod
kubectl get pods -n spywatcher | grep postgres

# Test database connection
kubectl exec -it postgres-0 -n spywatcher -- \
  psql -U spywatcher -d spywatcher -c "SELECT version();"

# Check database logs
kubectl logs postgres-0 -n spywatcher

Redis Connection

# Check Redis pod
kubectl get pods -n spywatcher | grep redis

# Test Redis connection
kubectl exec -it redis-0 -n spywatcher -- redis-cli ping

# Check Redis logs
kubectl logs redis-0 -n spywatcher

Maintenance

Update Configuration

# Edit configmap
kubectl edit configmap spywatcher-config -n spywatcher

# Restart pods to pick up changes
kubectl rollout restart deployment/spywatcher-backend -n spywatcher

Database Migrations

Database migrations are run as a separate Kubernetes Job to avoid race conditions. Migrations should be run before deploying new application versions.

# Run the migration job. kubectl create job --from only accepts a CronJob source,
# and a Job's pod template cannot be changed once it is created, so remove any
# previous run and apply the manifest again
JOB_NAME="spywatcher-db-migration"
kubectl delete job $JOB_NAME -n spywatcher --ignore-not-found
kubectl apply -f k8s/base/migration-job.yaml -n spywatcher

# Check migration status
kubectl get jobs -n spywatcher

# View migration logs
kubectl logs job/$JOB_NAME -n spywatcher

# Delete completed migration jobs (optional, they auto-delete after 1 hour)
kubectl delete job $JOB_NAME -n spywatcher

Important: The migration job uses completions: 1 and parallelism: 1 to ensure only one migration runs at a time, preventing race conditions and deadlocks.
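
A Job with these settings might look like the sketch below. The completions and parallelism values and the job name come from this README; ttlSecondsAfterFinished is an assumption about how the one-hour auto-delete is implemented, and the container name, image tag, command, and backoffLimit are hypothetical.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: spywatcher-db-migration
  namespace: spywatcher
spec:
  completions: 1                  # the job is done after one successful pod
  parallelism: 1                  # never run more than one pod at a time
  backoffLimit: 2                 # assumed retry limit
  ttlSecondsAfterFinished: 3600   # assumed mechanism for the 1-hour cleanup
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate           # hypothetical container name
          image: ghcr.io/subculture-collective/spywatcher-backend:v2.0.0  # tag borrowed from the rolling update example
          command: ["npm", "run", "migrate"]   # hypothetical migration command
          envFrom:
            - configMapRef:
                name: spywatcher-config
```

These settings limit concurrency within a single Job only. Two separately created Jobs could still overlap, so it helps to wait for one to finish before starting another.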

Backup

# Backup PostgreSQL
kubectl exec postgres-0 -n spywatcher -- \
  pg_dump -U spywatcher spywatcher > backup.sql

# Trigger a Redis background snapshot (the RDB file is written inside the pod, not copied locally)
kubectl exec redis-0 -n spywatcher -- \
  redis-cli BGSAVE

Security

Network Policies

Network policies restrict traffic between pods:

Backend can connect to: PostgreSQL, Redis
Frontend can connect to: Backend
External traffic: Ingress only
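
As an illustration of the first rule, a policy that admits only backend pods to PostgreSQL could be sketched as below. The policy name and both app labels are assumptions, and 5432 is simply the PostgreSQL default port; the repository's actual policies may be structured differently.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-allow-backend      # hypothetical name
  namespace: spywatcher
spec:
  podSelector:
    matchLabels:
      app: postgres                 # assumed label on the PostgreSQL pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: spywatcher-backend   # assumed label on the backend pods
      ports:
        - protocol: TCP
          port: 5432                # PostgreSQL default port
```

Network policies only take effect when the cluster's network plugin enforces them.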

RBAC

Service accounts with minimal permissions:

spywatcher-backend: Access to secrets, configmaps
spywatcher-frontend: Read-only access
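
A minimal sketch of what the backend permissions could look like as a namespaced Role and RoleBinding. The service account name comes from the list above; the Role and RoleBinding names and the exact verbs are assumptions.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spywatcher-backend-role       # hypothetical name
  namespace: spywatcher
rules:
  - apiGroups: [""]                   # core API group
    resources: ["secrets", "configmaps"]
    verbs: ["get", "list"]            # assumed: read access only
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spywatcher-backend-binding    # hypothetical name
  namespace: spywatcher
subjects:
  - kind: ServiceAccount
    name: spywatcher-backend
    namespace: spywatcher
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spywatcher-backend-role
```

Permissions can be checked with kubectl auth can-i get secrets --as=system:serviceaccount:spywatcher:spywatcher-backend -n spywatcher.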

Secrets

Use Sealed Secrets or External Secrets Operator for production
Never commit unencrypted secrets
Rotate secrets regularly

Ingress

NGINX Ingress Controller

Install if not already present:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install nginx-ingress ingress-nginx/ingress-nginx

Cert-Manager

Install for automatic SSL certificates:

helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true

Create ClusterIssuer:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: your-email@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
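
Once the ClusterIssuer exists, an Ingress can request a certificate through the cert-manager.io/cluster-issuer annotation. The sketch below shows the general shape only: the Ingress name, hostname, TLS secret name, paths, and service ports are all hypothetical and are not taken from k8s/base/ingress.yaml.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: spywatcher-ingress                 # hypothetical name
  namespace: spywatcher
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - spywatcher.example.com           # hypothetical hostname
      secretName: spywatcher-tls           # cert-manager stores the issued certificate here
  rules:
    - host: spywatcher.example.com
      http:
        paths:
          - path: /api                     # hypothetical routing rule
            pathType: Prefix
            backend:
              service:
                name: spywatcher-backend
                port:
                  number: 80               # assumed service port
          - path: /
            pathType: Prefix
            backend:
              service:
                name: spywatcher-frontend
                port:
                  number: 80               # assumed service port
```

Certificate progress can be followed with kubectl get certificate -n spywatcher.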

Clean Up

Delete Resources

# Delete all resources in namespace
kubectl delete namespace spywatcher

# Or use kustomize
kubectl delete -k k8s/overlays/production

Persistent Data

⚠️ WARNING: Deleting PVCs will delete all data!

# List PVCs
kubectl get pvc -n spywatcher

# Delete specific PVC
kubectl delete pvc postgres-data-postgres-0 -n spywatcher

Best Practices

Use namespaces: Separate environments with namespaces
Resource limits: Always set requests and limits
Health checks: Configure liveness and readiness probes
Security context: Run containers as non-root
Pod disruption budgets: Ensure high availability
Horizontal scaling: Use HPA for dynamic scaling
Rolling updates: Use for zero-downtime deployments
Monitoring: Integrate with Prometheus/Grafana
Logging: Centralize logs with ELK or Loki
Backups: Regular backups of persistent data
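
Several of these practices are a few lines of YAML each. The fragment below sketches resource limits, a non-root security context, and a pod disruption budget. Every number, the PDB name, and the app label are illustrative assumptions, not values from the base manifests.

```yaml
# Container fragment: requests/limits and a non-root security context
containers:
  - name: backend
    resources:
      requests:
        cpu: 250m          # assumed
        memory: 256Mi      # assumed
      limits:
        cpu: "1"           # assumed
        memory: 512Mi      # assumed
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false
---
# Pod disruption budget: keep at least one backend pod during voluntary disruptions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: spywatcher-backend-pdb     # hypothetical name
  namespace: spywatcher
spec:
  minAvailable: 1                  # assumed
  selector:
    matchLabels:
      app: spywatcher-backend      # assumed label
```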