Implement centralized log aggregation with Grafana Loki stack #139

Merged
Copilot merged 8 commits from copilot/implement-centralized-logging into main 2025-10-31 23:15:38 -05:00
Copilot commented 2025-10-31 18:35:31 -05:00 (Migrated from github.com)

Implements centralized log aggregation and analysis using Grafana Loki stack. Aggregates logs from all services (backend, security, postgres, redis, pgbouncer, nginx) with 30-day retention and efficient search via LogQL.

Architecture

Services (Winston JSON logs) → Promtail (collection) → Loki (storage) → Grafana (UI)
  • Loki 2.9.3: TSDB storage, 30-day retention, 10:1 compression
  • Promtail 2.9.3: Docker socket integration, JSON parsing pipeline
  • Grafana 10.2.3: Pre-configured dashboard with log volume, errors, security events

Configuration

Loki (loki/loki-config.yml):

  • 30-day retention with automatic compaction
  • 15 MB/s ingestion limit, 5000 entries/query max
  • Auth disabled by default (enable for production)

Promtail (promtail/promtail-config.yml):

  • Separate jobs for backend logs and security logs (prevents duplication)
  • Label extraction: job, level, service, action, result
  • Docker container log collection via socket

Grafana (grafana/provisioning/):

  • Loki datasource (default)
  • Dashboard with filters: job, level, search
  • Panels: log volume chart, error stats, security events

Docker Integration

Updated docker-compose.{dev,prod}.yml:

  • Added Loki, Promtail, Grafana services
  • Shared log volume: logs-backend:/app/logs
  • Resource limits (prod): Loki 512MB, Grafana 512MB, Promtail 128MB
  • Grafana accessible at :3000 (admin/admin)

Log Sources

  1. Backend: combined.log, error.log, exceptions.log
  2. Security: security.log (auth events, rate limits, threats)
  3. PostgreSQL, Redis, PgBouncer, Nginx (Docker logs)

Query Examples

# All errors
{job="backend", level="error"}

# Failed login attempts
{job="security"} | json | action="LOGIN_ATTEMPT" | result="FAILURE"

# Request tracing
{job="backend"} | json | requestId="abc123"

Documentation

  • LOGGING.md: Complete usage guide, troubleshooting, performance tuning
  • docs/CENTRALIZED_LOGGING_QUICKSTART.md: 5-minute setup guide
  • Validation script: scripts/validate-logging-setup.sh (22 checks)

Notes

  • Winston structured logging already in place; no code changes required
  • Resource usage: ~700MB dev, ~1.2GB prod
  • Loki chosen over ELK for 70% lower resource footprint
  • Sensitive data (tokens, passwords) auto-sanitized by Winston
Original prompt

This section details on the original issue you should resolve

<issue_title>Log Aggregation & Analysis - Centralized Logging</issue_title>
<issue_description>## 🎯 Objective Implement centralized log aggregation and analysis system. ## 📋 Requirements - ELK or Loki stack setup - Structured logging format - Log shipping from all services - Search and filtering UI - Log retention policies ## 🎯 Success Criteria - All logs centralized - Search working efficiently - Retention policies configured - Performance acceptable ## ⏱️ Estimated Effort 3-4 days</issue_description>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Implements centralized log aggregation and analysis using Grafana Loki stack. Aggregates logs from all services (backend, security, postgres, redis, pgbouncer, nginx) with 30-day retention and efficient search via LogQL. ## Architecture ``` Services (Winston JSON logs) → Promtail (collection) → Loki (storage) → Grafana (UI) ``` - **Loki 2.9.3**: TSDB storage, 30-day retention, 10:1 compression - **Promtail 2.9.3**: Docker socket integration, JSON parsing pipeline - **Grafana 10.2.3**: Pre-configured dashboard with log volume, errors, security events ## Configuration **Loki** (`loki/loki-config.yml`): - 30-day retention with automatic compaction - 15 MB/s ingestion limit, 5000 entries/query max - Auth disabled by default (enable for production) **Promtail** (`promtail/promtail-config.yml`): - Separate jobs for backend logs and security logs (prevents duplication) - Label extraction: `job`, `level`, `service`, `action`, `result` - Docker container log collection via socket **Grafana** (`grafana/provisioning/`): - Loki datasource (default) - Dashboard with filters: job, level, search - Panels: log volume chart, error stats, security events ## Docker Integration Updated `docker-compose.{dev,prod}.yml`: - Added Loki, Promtail, Grafana services - Shared log volume: `logs-backend:/app/logs` - Resource limits (prod): Loki 512MB, Grafana 512MB, Promtail 128MB - Grafana accessible at `:3000` (admin/admin) ## Log Sources 1. Backend: `combined.log`, `error.log`, `exceptions.log` 2. Security: `security.log` (auth events, rate limits, threats) 3. PostgreSQL, Redis, PgBouncer, Nginx (Docker logs) ## Query Examples ```logql # All errors {job="backend", level="error"} # Failed login attempts {job="security"} | json | action="LOGIN_ATTEMPT" | result="FAILURE" # Request tracing {job="backend"} | json | requestId="abc123" ``` ## Documentation - `LOGGING.md`: Complete usage guide, troubleshooting, performance tuning - `docs/CENTRALIZED_LOGGING_QUICKSTART.md`: 5-minute setup guide - Validation script: `scripts/validate-logging-setup.sh` (22 checks) ## Notes - Winston structured logging already in place; no code changes required - Resource usage: ~700MB dev, ~1.2GB prod - Loki chosen over ELK for 70% lower resource footprint - Sensitive data (tokens, passwords) auto-sanitized by Winston <!-- START COPILOT CODING AGENT SUFFIX --> <details> <summary>Original prompt</summary> > > ---- > > *This section details on the original issue you should resolve* > > <issue_title>Log Aggregation & Analysis - Centralized Logging</issue_title> > <issue_description>## 🎯 Objective Implement centralized log aggregation and analysis system. ## 📋 Requirements - ELK or Loki stack setup - Structured logging format - Log shipping from all services - Search and filtering UI - Log retention policies ## 🎯 Success Criteria - All logs centralized - Search working efficiently - Retention policies configured - Performance acceptable ## ⏱️ Estimated Effort 3-4 days</issue_description> > > ## Comments on the Issue (you are @copilot in this section) > > <comments> > </comments> > </details> - Fixes subculture-collective/discord-spywatcher#41 <!-- START COPILOT CODING AGENT TIPS --> --- 💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more [Copilot coding agent tips](https://gh.io/copilot-coding-agent-tips) in the docs.
onnwee (Migrated from github.com) reviewed 2025-10-31 18:35:31 -05:00
PatrickFanella (Migrated from github.com) reviewed 2025-10-31 18:35:31 -05:00
copilot-pull-request-reviewer[bot] (Migrated from github.com) reviewed 2025-10-31 22:46:11 -05:00
copilot-pull-request-reviewer[bot] (Migrated from github.com) left a comment

Pull Request Overview

This PR implements centralized log aggregation for Discord SpyWatcher using the Grafana Loki stack. The implementation provides comprehensive log collection, storage, and visualization across all services with a 30-day retention policy.

Key changes:

  • Added Grafana Loki stack (Loki, Promtail, Grafana) for log aggregation and visualization
  • Configured Promtail to collect logs from backend, security, PostgreSQL, Redis, PgBouncer, Nginx, and Docker containers
  • Created pre-configured Grafana dashboard with log volume charts, security logs viewer, and error logs viewer
  • Added comprehensive documentation including setup guides, troubleshooting, and LogQL query examples

Reviewed Changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
scripts/validate-logging-setup.sh Validation script with 22 automated checks for logging infrastructure setup
promtail/promtail-config.yml Promtail configuration defining log sources and processing pipelines for all services
promtail/README.md Documentation for Promtail configuration and log source customization
loki/loki-config.yml Loki server configuration with 30-day retention policy and ingestion limits
loki/README.md Documentation for Loki configuration and retention policy management
grafana/provisioning/datasources/loki.yml Grafana datasource provisioning for Loki and Prometheus
grafana/provisioning/dashboards/json/spywatcher-logs.json Pre-configured dashboard with log visualization panels and template variables
grafana/provisioning/dashboards/dashboard.yml Dashboard provider configuration for auto-loading dashboards
grafana/README.md Documentation for Grafana setup, access credentials, and customization
docs/LOGGING_IMPLEMENTATION_SUMMARY.md Comprehensive implementation summary with architecture, performance characteristics, and comparison with alternatives
docs/CENTRALIZED_LOGGING_QUICKSTART.md Quick start guide with step-by-step instructions for using the logging stack
docker-compose.prod.yml Added Loki, Promtail, and Grafana services with resource limits and volume mounts
docker-compose.dev.yml Added Loki, Promtail, and Grafana services for development environment
README.md Updated monitoring section to reference centralized logging capabilities
LOGGING.md Comprehensive logging documentation covering architecture, usage, queries, and troubleshooting
.gitignore Added Loki and Grafana data directories to ignore list
.env.example Added Grafana configuration environment variables

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

## Pull Request Overview This PR implements centralized log aggregation for Discord SpyWatcher using the Grafana Loki stack. The implementation provides comprehensive log collection, storage, and visualization across all services with a 30-day retention policy. Key changes: - Added Grafana Loki stack (Loki, Promtail, Grafana) for log aggregation and visualization - Configured Promtail to collect logs from backend, security, PostgreSQL, Redis, PgBouncer, Nginx, and Docker containers - Created pre-configured Grafana dashboard with log volume charts, security logs viewer, and error logs viewer - Added comprehensive documentation including setup guides, troubleshooting, and LogQL query examples ### Reviewed Changes Copilot reviewed 16 out of 17 changed files in this pull request and generated 4 comments. <details> <summary>Show a summary per file</summary> | File | Description | | ---- | ----------- | | scripts/validate-logging-setup.sh | Validation script with 22 automated checks for logging infrastructure setup | | promtail/promtail-config.yml | Promtail configuration defining log sources and processing pipelines for all services | | promtail/README.md | Documentation for Promtail configuration and log source customization | | loki/loki-config.yml | Loki server configuration with 30-day retention policy and ingestion limits | | loki/README.md | Documentation for Loki configuration and retention policy management | | grafana/provisioning/datasources/loki.yml | Grafana datasource provisioning for Loki and Prometheus | | grafana/provisioning/dashboards/json/spywatcher-logs.json | Pre-configured dashboard with log visualization panels and template variables | | grafana/provisioning/dashboards/dashboard.yml | Dashboard provider configuration for auto-loading dashboards | | grafana/README.md | Documentation for Grafana setup, access credentials, and customization | | docs/LOGGING_IMPLEMENTATION_SUMMARY.md | Comprehensive implementation summary with architecture, performance characteristics, and comparison with alternatives | | docs/CENTRALIZED_LOGGING_QUICKSTART.md | Quick start guide with step-by-step instructions for using the logging stack | | docker-compose.prod.yml | Added Loki, Promtail, and Grafana services with resource limits and volume mounts | | docker-compose.dev.yml | Added Loki, Promtail, and Grafana services for development environment | | README.md | Updated monitoring section to reference centralized logging capabilities | | LOGGING.md | Comprehensive logging documentation covering architecture, usage, queries, and troubleshooting | | .gitignore | Added Loki and Grafana data directories to ignore list | | .env.example | Added Grafana configuration environment variables | </details> --- 💡 <a href="/subculture-collective/discord-spywatcher/new/main/.github/instructions?filename=*.instructions.md" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Add Copilot custom instructions</a> for smarter, more guided reviews. <a href="https://docs.github.com/en/copilot/customizing-copilot/adding-repository-custom-instructions-for-github-copilot" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Learn how to get started</a>.
@@ -0,0 +1,523 @@
# Centralized Log Aggregation & Analysis
copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2025-10-31 22:46:11 -05:00

The path pattern documented here is '/logs/backend/security*.log' with a wildcard, but the actual configuration in promtail-config.yml line 47 uses '/logs/backend/security.log' without a wildcard. This should be corrected to match the implementation.

- Security logs (`/logs/backend/security.log`)
The path pattern documented here is '/logs/backend/security*.log' with a wildcard, but the actual configuration in promtail-config.yml line 47 uses '/logs/backend/security.log' without a wildcard. This should be corrected to match the implementation. ```suggestion - Security logs (`/logs/backend/security.log`) ```
@@ -0,0 +1,347 @@
# Centralized Log Aggregation Implementation Summary
copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2025-10-31 22:46:10 -05:00

The path pattern documented here is '/logs/backend/security*.log' with a wildcard, but the actual configuration in promtail-config.yml line 47 uses '/logs/backend/security.log' without a wildcard. Ensure documentation matches the actual implementation.

- **Security** - Auth events, security incidents (`/logs/backend/security.log`)
The path pattern documented here is '/logs/backend/security*.log' with a wildcard, but the actual configuration in promtail-config.yml line 47 uses '/logs/backend/security.log' without a wildcard. Ensure documentation matches the actual implementation. ```suggestion - **Security** - Auth events, security incidents (`/logs/backend/security.log`) ```
@@ -0,0 +1,60 @@
# Promtail Configuration
copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2025-10-31 22:46:10 -05:00

The path pattern shown here is '/logs/backend/security*.log' with a wildcard, but the actual configuration in promtail-config.yml line 47 uses '/logs/backend/security.log' without a wildcard. This documentation should match the implementation.

2. **Security Logs** (`/logs/backend/security.log`)
The path pattern shown here is '/logs/backend/security*.log' with a wildcard, but the actual configuration in promtail-config.yml line 47 uses '/logs/backend/security.log' without a wildcard. This documentation should match the implementation. ```suggestion 2. **Security Logs** (`/logs/backend/security.log`) ```
@@ -0,0 +44,4 @@
labels:
job: security
service: spywatcher-security
__path__: /logs/backend/security.log
copilot-pull-request-reviewer[bot] (Migrated from github.com) commented 2025-10-31 22:46:10 -05:00

The path pattern '/logs/backend/security.log' matches only a single file, while the README.md documentation states the pattern is '/logs/backend/security*.log' (with wildcard). This inconsistency could cause confusion. Consider updating the path to match the documentation if the intention is to support multiple security log files, or update the README to reflect the exact path.

          __path__: /logs/backend/security*.log
The path pattern '/logs/backend/security.log' matches only a single file, while the README.md documentation states the pattern is '/logs/backend/security*.log' (with wildcard). This inconsistency could cause confusion. Consider updating the path to match the documentation if the intention is to support multiple security log files, or update the README to reflect the exact path. ```suggestion __path__: /logs/backend/security*.log ```
Sign in to join this conversation.