# Reddit Cluster Map
Collect, analyze, and visualize relationships between Reddit communities and users as an interactive 3D network graph.
Changelog • Contributing • Releases
## 🧠 What it does
- Crawls subreddits for posts and comments (OAuth-authenticated; globally rate limited).
- Stores normalized data in PostgreSQL.
- Precomputes a graph (nodes + links) based on shared participation and activity, with an optional detailed content graph (posts/comments).
- Serves the graph at `/api/graph` for the React frontend to render in multiple visualization modes:
  - 3D Graph: Interactive WebGL visualization
  - 2D Graph: SVG-based force-directed layout with drag & pan
  - Dashboard: Statistical overview and analytics
  - Communities: Automated community detection using the Louvain algorithm
## 🧱 Architecture
- Backend (Go)
  - API server: `backend/cmd/server`
  - Crawler: `backend/cmd/crawler`
  - Precalculation: `backend/cmd/precalculate`
  - Data access via sqlc: SQL in `backend/internal/queries/*.sql` → generated in `backend/internal/db`
- Database: PostgreSQL
- Frontend (Vite + React 3D): `frontend/` (graph viewer)
- Monitoring: Prometheus + Grafana for metrics and dashboards
See docs/overview.md for the full system picture and data flow.
## 🚀 Quick start
For full setup (Docker, env vars, seeding a crawl), see docs/setup.md.
For CI/CD pipeline and Docker image publishing, see docs/CI-CD.md.
Common dev tasks from `backend/`:

- Setup environment file: `make setup` (creates `.env` from `.env.example`)
- Regenerate sqlc after editing SQL in `backend/internal/queries/*.sql`: `make sqlc` (alias: `make generate`)
- Run the one-shot graph precalc: `make precalculate`
- Run tests: `go test ./...`
### For New Developers
- Clone and setup:

  ```bash
  git clone https://github.com/subculture-collective/reddit-cluster-map.git
  cd reddit-cluster-map/backend
  make setup  # Creates .env and checks tools
  ```

- Configure `backend/.env` with your Reddit OAuth credentials and database password.

- Start services:

  ```bash
  docker compose up -d --build
  make migrate-up-local
  ```

- (Optional) Seed sample data and run smoke tests:

  ```bash
  make seed
  make smoke-test
  ```
See the Developer Guide for detailed workflows, testing, and best practices.
## Documentation

### Getting Started
- Setup Guide - Complete setup with Docker Compose, migrations, environment variables
- Developer Guide - Development workflows, Makefile targets, testing, and troubleshooting
- Contributing Guide - How to contribute, coding standards, PR guidelines
### Architecture & Design
- Architecture Overview - System architecture with diagrams, data flow, component interactions
- Overview - High-level system design and data flow
### Operations
- Runbooks - Operational procedures: backup/restore, maintenance, troubleshooting
- Monitoring Guide - Metrics, Prometheus, Grafana dashboards, and alerts
- Data Integrity Guide - Database integrity checks and maintenance
### Features & APIs
- API Documentation - Core API endpoints and usage
- Community API - Community aggregation endpoints (supernodes and subgraphs)
- Community Detection - Louvain algorithm implementation
- Visualization Modes - 3D, 2D, dashboard, and community views
### Advanced Topics
- Performance Documentation - Query optimization, benchmarking, and scaling
- Performance Profiling Guide - Runtime profiling, benchmarks, and optimization
- Performance Analysis - Performance review and optimization recommendations
- Load Testing Guide - k6-based load testing for API performance validation
- OAuth Token Management - Token refresh, credential rotation
- Crawler Resilience - Rate limiting, retries, circuit breakers
- Security Guide - Security features and best practices
- Security Audit Summary - Quick reference for security testing and auditing
- Security Audit Guide - Comprehensive security auditing and penetration testing procedures
- Penetration Testing Checklist - Detailed penetration testing checklist
- CI/CD Pipeline - Continuous integration and deployment
## Common Development Tasks

From `backend/`, run `make help` to see all available targets. Key ones:

- `make generate` - Regenerate sqlc code after editing SQL
- `make precalculate` - Run graph precalculation
- `make test` - Run all tests
- `make benchmark` - Run Go benchmark tests
- `make benchmark-graph` - Benchmark graph query performance
- `make performance-baseline` - Collect comprehensive performance baseline
- `make profile-cpu` - Collect CPU profile (requires ENABLE_PROFILING=true)
- `make profile-memory` - Collect memory profile (requires ENABLE_PROFILING=true)
- `make loadtest` - Run k6 load tests (smoke, load, stress, soak)
- `make loadtest-smoke` - Quick smoke test (30s)
- `make loadtest-load` - Load test with 50 VUs (5min)
- `make profile-all` - Collect all profiles (CPU, memory, goroutines)
- `make integrity-check` - Run data integrity checks
- `make integrity-clean` - Clean up data integrity issues
- `make lint` - Check code formatting and run go vet
- `make fmt` - Auto-format Go code
- `make smoke-test` - Run API health checks
- `make seed` - Populate database with sample data
## 🔌 API surface
- `GET /api/graph?max_nodes=20000&max_links=50000`
  - Returns `{ nodes, links }`. Results are cached for ~60s and capped by max_nodes/max_links using a stable weighting.
  - Prefers precalculated tables, falls back to legacy JSON when empty.
- `GET /api/communities?max_nodes=100&max_links=500&with_positions=true`
  - Returns aggregated community supernodes and inter-community weighted links.
  - Communities detected via server-side Louvain algorithm during precalculation.
- `GET /api/communities/{id}?max_nodes=10000&max_links=50000`
  - Returns the full subgraph (all nodes and links) for a specific community.
- `POST /api/crawl { "subreddit": "AskReddit" }`
- Additional resource endpoints exist without the `/api` prefix: `/subreddits`, `/users`, `/posts`, `/comments`, `/jobs`.
See docs/api.md and docs/api-communities.md for details.
## 📊 Monitoring and Analytics
The project includes comprehensive monitoring with Prometheus and Grafana:
- Metrics endpoint: `GET /metrics` - Prometheus format metrics
- Prometheus: http://localhost:9090 - Metrics collection and querying
- Grafana: http://localhost:3000 - Dashboards and visualizations (default: admin/admin)
### Key Metrics
- Crawl metrics: Job throughput, success/failure rates, posts/comments processed
- API metrics: Request rates, response times (p50/p95/p99), error rates
- Graph metrics: Node/link counts by type, precalculation duration
- Database metrics: Operation durations, error rates
- System health: Circuit breaker status, rate limiting pressure
### Alerts
Pre-configured alerts for:
- High API error rates (>5%)
- High crawler error rates (>10%)
- Slow queries (p95 > 2s)
- Database errors
- Circuit breaker trips
- Stalled crawl jobs
See Monitoring Guide for complete metrics reference, dashboard setup, and PromQL examples.
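As a hedged example of the kind of PromQL the Monitoring Guide covers, a p95 latency query might look like the following. The histogram metric name here is an assumption for illustration — inspect `/metrics` for the names this server actually exports.

```promql
# p95 API response time over the last 5 minutes
# (metric name is hypothetical; substitute the real histogram from /metrics)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
```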
## ⚙️ Configuration
Key environment variables (selected):
- Security (see `docs/SECURITY.md` for details)
  - `ENABLE_RATE_LIMIT` (true) — enable/disable rate limiting
  - `RATE_LIMIT_GLOBAL` (100) — global requests per second
  - `RATE_LIMIT_GLOBAL_BURST` (200) — global burst size
  - `RATE_LIMIT_PER_IP` (10) — requests per second per IP
  - `RATE_LIMIT_PER_IP_BURST` (20) — per-IP burst size
  - `CORS_ALLOWED_ORIGINS` — comma-separated list of allowed CORS origins (default: localhost:5173,localhost:3000)
  - `ADMIN_API_TOKEN` — bearer token for admin endpoints
- Monitoring
  - `GRAFANA_ADMIN_PASSWORD` (admin) — Grafana admin password
- Reddit OAuth
  - `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_REDIRECT_URI`, `REDDIT_SCOPES`, `REDDIT_USER_AGENT`
- HTTP / retries
  - `HTTP_MAX_RETRIES` (default 3), `HTTP_RETRY_BASE_MS` (300), `HTTP_TIMEOUT_MS` (15000), `LOG_HTTP_RETRIES` (false)
  - `GRAPH_QUERY_TIMEOUT_MS` (30000) — timeout for graph API queries
  - `DB_STATEMENT_TIMEOUT_MS` (25000) — database statement timeout
- Graph generation
  - `DETAILED_GRAPH` (false) — include posts/comments
  - `POSTS_PER_SUB_IN_GRAPH` (10), `COMMENTS_PER_POST_IN_GRAPH` (50)
  - `MAX_AUTHOR_CONTENT_LINKS` (3) — cross-link content by the same author across subreddits
  - `DISABLE_API_GRAPH_JOB` (false) — disable hourly background job in API
  - `PRECALC_CLEAR_ON_START` (false) — when true, clears graph tables at precalc start
  - Batching/progress (applied at runtime in precalc): `GRAPH_NODE_BATCH_SIZE` (1000), `GRAPH_LINK_BATCH_SIZE` (2000), `GRAPH_PROGRESS_INTERVAL` (10000)
- Crawler scheduling
  - `STALE_DAYS` (30), `RESET_CRAWLING_AFTER_MIN` (15)
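For orientation, a minimal `backend/.env` fragment might look like the sketch below, using the documented defaults and placeholder credentials; the authoritative template is the `.env.example` that `make setup` copies.

```env
# Reddit OAuth — placeholders, supply your own app credentials
REDDIT_CLIENT_ID=your-client-id
REDDIT_CLIENT_SECRET=your-client-secret
REDDIT_USER_AGENT=reddit-cluster-map/0.1 by u/your-username

# Rate limiting (documented defaults shown)
ENABLE_RATE_LIMIT=true
RATE_LIMIT_GLOBAL=100
RATE_LIMIT_PER_IP=10

# Graph generation
DETAILED_GRAPH=false
PRECALC_CLEAR_ON_START=false
```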
## 🖥 Frontend
- Vite + React with multiple visualization modes:
  - 3D graph with `react-force-graph-3d` and interactive minimap for navigation
  - 2D graph with D3.js force simulation
  - Statistics dashboard
  - Community detection view with Louvain algorithm
- `VITE_API_URL` defaults to `/api`
- Optional client caps: `VITE_MAX_RENDER_NODES`, `VITE_MAX_RENDER_LINKS`
See docs/visualization-modes.md and docs/community-detection.md for feature details.
See frontend/README.md for local dev and env hints.
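A minimal `frontend/.env` sketch with illustrative values; the caps here mirror the API's default limits but are an assumption, not required settings:

```env
VITE_API_URL=/api
VITE_MAX_RENDER_NODES=20000
VITE_MAX_RENDER_LINKS=50000
```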