# Reddit Cluster Map
Collect, analyze, and visualize relationships between Reddit communities and users as an interactive 3D network graph.
Changelog • Contributing • Releases
## 🧠 What it does
- Crawls subreddits for posts and comments (OAuth-authenticated; globally rate limited).
- Stores normalized data in PostgreSQL.
- Precomputes a graph (nodes + links) based on shared participation and activity, with an optional detailed content graph (posts/comments).
- Serves the graph at `/api/graph` for the React frontend to render in multiple visualization modes:
  - 3D Graph: Interactive WebGL visualization
  - 2D Graph: SVG-based force-directed layout with drag & pan
  - Dashboard: Statistical overview and analytics
  - Communities: Automated community detection using the Louvain algorithm
## 🧱 Architecture
- Backend (Go)
  - API server: `backend/cmd/server`
  - Crawler: `backend/cmd/crawler`
  - Precalculation: `backend/cmd/precalculate`
  - Data access via sqlc: SQL in `backend/internal/queries/*.sql` → generated in `backend/internal/db`
- Database: PostgreSQL
- Frontend (Vite + React 3D): `frontend/` (graph viewer)
- Monitoring: Prometheus + Grafana for metrics and dashboards
See docs/overview.md for the full system picture and data flow.
## 🚀 Quick start
For full setup (Docker, env vars, seeding a crawl), see docs/setup.md.
For CI/CD pipeline and Docker image publishing, see docs/CI-CD.md.
Common dev tasks from `backend/`:

- Setup environment file: `make setup` (creates `.env` from `.env.example`)
- Regenerate sqlc after editing SQL in `backend/internal/queries/*.sql`: `make sqlc` (alias: `make generate`)
- Run the one-shot graph precalc: `make precalculate`
- Run tests: `go test ./...`
### For New Developers
- Clone and setup:

  ```bash
  git clone https://github.com/subculture-collective/reddit-cluster-map.git
  cd reddit-cluster-map/backend
  make setup  # Creates .env and checks tools
  ```

- Configure `backend/.env` with your Reddit OAuth credentials and database password.

- Start services:

  ```bash
  docker compose up -d --build
  make migrate-up-local
  ```

- (Optional) Seed sample data and run smoke tests:

  ```bash
  make seed
  make smoke-test
  ```
See the Developer Guide for detailed workflows, testing, and best practices.
## Documentation

### Getting Started
- Setup Guide - Complete setup with Docker Compose, migrations, environment variables
- Developer Guide - Development workflows, Makefile targets, testing, and troubleshooting
- Contributing Guide - How to contribute, coding standards, PR guidelines
### Architecture & Design
- Architecture Overview - System architecture with diagrams, data flow, component interactions
- Overview - High-level system design and data flow
### Operations
- Runbooks - Operational procedures: backup/restore, maintenance, troubleshooting
- Monitoring Guide - Metrics, Prometheus, Grafana dashboards, and alerts
- Data Integrity Guide - Database integrity checks and maintenance
### Features & APIs
- API Documentation - Core API endpoints and usage
- Community API - Community aggregation endpoints (supernodes and subgraphs)
- Community Detection - Louvain algorithm implementation
- Visualization Modes - 3D, 2D, dashboard, and community views
### Advanced Topics
- Performance Documentation - Query optimization, benchmarking, and scaling
- Performance Profiling Guide - Runtime profiling, benchmarks, and optimization
- Performance Analysis - Performance review and optimization recommendations
- Load Testing Guide - k6-based load testing for API performance validation
- OAuth Token Management - Token refresh, credential rotation
- Crawler Resilience - Rate limiting, retries, circuit breakers
- Security Guide - Security features and best practices
- Security Audit Summary - Quick reference for security testing and auditing
- Security Audit Guide - Comprehensive security auditing and penetration testing procedures
- Penetration Testing Checklist - Detailed penetration testing checklist
- CI/CD Pipeline - Continuous integration and deployment
## Common Development Tasks

From `backend/`, run `make help` to see all available targets. Key ones:

- `make generate` - Regenerate sqlc code after editing SQL
- `make precalculate` - Run graph precalculation
- `make test` - Run all tests
- `make benchmark` - Run Go benchmark tests
- `make benchmark-graph` - Benchmark graph query performance
- `make performance-baseline` - Collect comprehensive performance baseline
- `make profile-cpu` - Collect CPU profile (requires ENABLE_PROFILING=true)
- `make profile-memory` - Collect memory profile (requires ENABLE_PROFILING=true)
- `make loadtest` - Run k6 load tests (smoke, load, stress, soak)
- `make loadtest-smoke` - Quick smoke test (30s)
- `make loadtest-load` - Load test with 50 VUs (5min)
- `make profile-all` - Collect all profiles (CPU, memory, goroutines)
- `make integrity-check` - Run data integrity checks
- `make integrity-clean` - Clean up data integrity issues
- `make lint` - Check code formatting and run go vet
- `make fmt` - Auto-format Go code
- `make smoke-test` - Run API health checks
- `make seed` - Populate database with sample data
## 🔌 API surface
- `GET /api/graph?max_nodes=20000&max_links=50000`
  - Returns `{ nodes, links }`. Results are cached for ~60s and capped by max_nodes/max_links using a stable weighting.
  - Prefers precalculated tables, falls back to legacy JSON when empty.
- `GET /api/communities?max_nodes=100&max_links=500&with_positions=true`
  - Returns aggregated community supernodes and inter-community weighted links.
  - Communities detected via server-side Louvain algorithm during precalculation.
- `GET /api/communities/{id}?max_nodes=10000&max_links=50000`
  - Returns the full subgraph (all nodes and links) for a specific community.
- `POST /api/crawl { "subreddit": "AskReddit" }`
- Additional resource endpoints exist without the `/api` prefix: `/subreddits`, `/users`, `/posts`, `/comments`, `/jobs`.
See docs/api.md and docs/api-communities.md for details.
## 📊 Monitoring and Analytics
The project includes comprehensive monitoring with Prometheus and Grafana:
- Metrics endpoint: `GET /metrics` - Prometheus format metrics
- Prometheus: http://localhost:9090 - Metrics collection and querying
- Grafana: http://localhost:3000 - Dashboards and visualizations (default: admin/admin)
### Key Metrics
- Crawl metrics: Job throughput, success/failure rates, posts/comments processed
- API metrics: Request rates, response times (p50/p95/p99), error rates
- Graph metrics: Node/link counts by type, precalculation duration
- Database metrics: Operation durations, error rates
- System health: Circuit breaker status, rate limiting pressure
### Alerts
Pre-configured alerts for:
- High API error rates (>5%)
- High crawler error rates (>10%)
- Slow queries (p95 > 2s)
- Database errors
- Circuit breaker trips
- Stalled crawl jobs
See Monitoring Guide for complete metrics reference, dashboard setup, and PromQL examples.
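As a hedged example of the kind of PromQL the Monitoring Guide covers, a p95 latency query might look like the following. The histogram metric name here is an assumption for illustration — inspect `/metrics` for the names this server actually exports.

```promql
# p95 API response time over the last 5 minutes
# (metric name is hypothetical; substitute the real histogram from /metrics)
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
```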
## ⚙️ Configuration
Key environment variables (selected):
- Security (see `docs/SECURITY.md` for details)
  - `ENABLE_RATE_LIMIT` (true) — enable/disable rate limiting
  - `RATE_LIMIT_GLOBAL` (100) — global requests per second
  - `RATE_LIMIT_GLOBAL_BURST` (200) — global burst size
  - `RATE_LIMIT_PER_IP` (10) — requests per second per IP
  - `RATE_LIMIT_PER_IP_BURST` (20) — per-IP burst size
  - `CORS_ALLOWED_ORIGINS` — comma-separated list of allowed CORS origins (default: localhost:5173,localhost:3000)
  - `ADMIN_API_TOKEN` — bearer token for admin endpoints
- Monitoring
  - `GRAFANA_ADMIN_PASSWORD` (admin) — Grafana admin password
- Reddit OAuth
  - `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_REDIRECT_URI`, `REDDIT_SCOPES`, `REDDIT_USER_AGENT`
- HTTP / retries
  - `HTTP_MAX_RETRIES` (default 3), `HTTP_RETRY_BASE_MS` (300), `HTTP_TIMEOUT_MS` (15000), `LOG_HTTP_RETRIES` (false)
  - `GRAPH_QUERY_TIMEOUT_MS` (30000) — timeout for graph API queries
  - `DB_STATEMENT_TIMEOUT_MS` (25000) — database statement timeout
- Graph generation
  - `DETAILED_GRAPH` (false) — include posts/comments
  - `POSTS_PER_SUB_IN_GRAPH` (10), `COMMENTS_PER_POST_IN_GRAPH` (50)
  - `MAX_AUTHOR_CONTENT_LINKS` (3) — cross-link content by the same author across subreddits
  - `DISABLE_API_GRAPH_JOB` (false) — disable hourly background job in API
  - `PRECALC_CLEAR_ON_START` (false) — when true, clears graph tables at precalc start
  - Batching/progress (applied at runtime in precalc): `GRAPH_NODE_BATCH_SIZE` (1000), `GRAPH_LINK_BATCH_SIZE` (2000), `GRAPH_PROGRESS_INTERVAL` (10000)
- Crawler scheduling
  - `STALE_DAYS` (30), `RESET_CRAWLING_AFTER_MIN` (15)
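For orientation, a minimal `backend/.env` fragment might look like the sketch below, using the documented defaults and placeholder credentials; the authoritative template is the `.env.example` that `make setup` copies.

```env
# Reddit OAuth — placeholders, supply your own app credentials
REDDIT_CLIENT_ID=your-client-id
REDDIT_CLIENT_SECRET=your-client-secret
REDDIT_USER_AGENT=reddit-cluster-map/0.1 by u/your-username

# Rate limiting (documented defaults shown)
ENABLE_RATE_LIMIT=true
RATE_LIMIT_GLOBAL=100
RATE_LIMIT_PER_IP=10

# Graph generation
DETAILED_GRAPH=false
PRECALC_CLEAR_ON_START=false
```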
## 🖥 Frontend
- Vite + React with multiple visualization modes:
  - 3D graph with `react-force-graph-3d` and interactive minimap for navigation
  - 2D graph with D3.js force simulation
  - Statistics dashboard
  - Community detection view with Louvain algorithm
- `VITE_API_URL` defaults to `/api`
- Optional client caps: `VITE_MAX_RENDER_NODES`, `VITE_MAX_RENDER_LINKS`
See docs/visualization-modes.md and docs/community-detection.md for feature details.
See frontend/README.md for local dev and env hints.
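A minimal `frontend/.env` sketch with illustrative values; the caps here mirror the API's default limits but are an assumption, not required settings:

```env
VITE_API_URL=/api
VITE_MAX_RENDER_NODES=20000
VITE_MAX_RENDER_LINKS=50000
```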