Reddit Cluster Map


Collect, analyze, and visualize relationships between Reddit communities and users as an interactive 3D network graph.

Changelog · Contributing · Releases


🧠 What it does

  • Crawls subreddits for posts and comments (OAuth-authenticated; globally rate limited).
  • Stores normalized data in PostgreSQL.
  • Precomputes a graph (nodes + links) based on shared participation and activity, with an optional detailed content graph (posts/comments).
  • Serves the graph at /api/graph for the React frontend to render in multiple visualization modes:
    • 3D Graph: Interactive WebGL visualization
    • 2D Graph: SVG-based force-directed layout with drag & pan
    • Dashboard: Statistical overview and analytics
    • Communities: Automated community detection using the Louvain algorithm
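
The shared-participation weighting can be illustrated with a minimal sketch. The real logic lives in the precalculation step; the set intersection below is a hypothetical simplification, not the actual algorithm:

```go
package main

import "fmt"

// sharedParticipants counts users active in both subreddits' participant
// sets — an illustrative stand-in for the link weight that precalculation
// derives from shared participation and activity.
func sharedParticipants(a, b map[string]bool) int {
	n := 0
	for user := range a {
		if b[user] {
			n++
		}
	}
	return n
}

func main() {
	golang := map[string]bool{"alice": true, "bob": true, "carol": true}
	rustlang := map[string]bool{"bob": true, "carol": true, "dave": true}
	fmt.Println(sharedParticipants(golang, rustlang)) // 2 shared users
}
```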

🧱 Architecture

  • Backend (Go)
    • API server: backend/cmd/server
    • Crawler: backend/cmd/crawler
    • Precalculation: backend/cmd/precalculate
    • Data access via sqlc: SQL in backend/internal/queries/*.sql → generated in backend/internal/db
  • Database: PostgreSQL
  • Frontend (Vite + React 3D): frontend/ (graph viewer)
  • Monitoring: Prometheus + Grafana for metrics and dashboards

See docs/overview.md for the full system picture and data flow.


🚀 Quick start

For full setup (Docker, env vars, seeding a crawl), see docs/setup.md. For CI/CD pipeline and Docker image publishing, see docs/CI-CD.md.

Common dev tasks from backend/:

  • Setup environment file:
    • make setup (creates .env from .env.example)
  • Regenerate sqlc after editing SQL in backend/internal/queries/*.sql:
    • make sqlc (alias: make generate)
  • Run the one-shot graph precalc:
    • make precalculate
  • Run tests:
    • go test ./...

For New Developers

  1. Clone and setup:

    git clone https://github.com/subculture-collective/reddit-cluster-map.git
    cd reddit-cluster-map/backend
    make setup  # Creates .env and checks tools
    
  2. Configure backend/.env with your Reddit OAuth credentials and database password

  3. Start services:

    docker compose up -d --build
    make migrate-up-local
    
  4. (Optional) Seed sample data and run smoke tests:

    make seed
    make smoke-test
    

See the Developer Guide for detailed workflows, testing, and best practices.

Documentation

Getting Started

  • Setup Guide - Complete setup with Docker Compose, migrations, environment variables
  • Developer Guide - Development workflows, Makefile targets, testing, and troubleshooting
  • Contributing Guide - How to contribute, coding standards, PR guidelines

Architecture & Design

Operations

Features & APIs

Advanced Topics

Common Development Tasks

From backend/, run make help to see all available targets. Key ones:

  • make generate - Regenerate sqlc code after editing SQL
  • make precalculate - Run graph precalculation
  • make test - Run all tests
  • make benchmark - Run Go benchmark tests
  • make benchmark-graph - Benchmark graph query performance
  • make performance-baseline - Collect comprehensive performance baseline
  • make profile-cpu - Collect CPU profile (requires ENABLE_PROFILING=true)
  • make profile-memory - Collect memory profile (requires ENABLE_PROFILING=true)
  • make loadtest - Run k6 load tests (smoke, load, stress, soak)
  • make loadtest-smoke - Quick smoke test (30s)
  • make loadtest-load - Load test with 50 VUs (5min)
  • make profile-all - Collect all profiles (CPU, memory, goroutines)
  • make integrity-check - Run data integrity checks
  • make integrity-clean - Clean up data integrity issues
  • make lint - Check code formatting and run go vet
  • make fmt - Auto-format Go code
  • make smoke-test - Run API health checks
  • make seed - Populate database with sample data

🔌 API surface

  • GET /api/graph?max_nodes=20000&max_links=50000
    • Returns { nodes, links }. Results are cached for ~60s and capped by max_nodes/max_links using a stable weighting.
    • Prefers precalculated tables and falls back to legacy JSON when they are empty.
  • GET /api/communities?max_nodes=100&max_links=500&with_positions=true
    • Returns aggregated community supernodes and inter-community weighted links.
    • Communities detected via server-side Louvain algorithm during precalculation.
  • GET /api/communities/{id}?max_nodes=10000&max_links=50000
    • Returns the full subgraph (all nodes and links) for a specific community.
  • POST /api/crawl { "subreddit": "AskReddit" }
  • Additional resource endpoints exist without /api prefix: /subreddits, /users, /posts, /comments, /jobs.

See docs/api.md and docs/api-communities.md for details.


📊 Monitoring and Analytics

The project includes comprehensive monitoring with Prometheus and Grafana:

  • Metrics endpoint: GET /metrics - Prometheus format metrics
  • Prometheus: http://localhost:9090 - Metrics collection and querying
  • Grafana: http://localhost:3000 - Dashboards and visualizations (default: admin/admin)

Key Metrics

  • Crawl metrics: Job throughput, success/failure rates, posts/comments processed
  • API metrics: Request rates, response times (p50/p95/p99), error rates
  • Graph metrics: Node/link counts by type, precalculation duration
  • Database metrics: Operation durations, error rates
  • System health: Circuit breaker status, rate limiting pressure

Alerts

Pre-configured alerts for:

  • High API error rates (>5%)
  • High crawler error rates (>10%)
  • Slow queries (p95 > 2s)
  • Database errors
  • Circuit breaker trips
  • Stalled crawl jobs

See Monitoring Guide for complete metrics reference, dashboard setup, and PromQL examples.
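
The error-rate alerts reduce to a ratio check over a time window; a minimal sketch, using the 5% API threshold from the list above (the function name and signature are hypothetical, and the real alerts are evaluated by Prometheus, not in Go):

```go
package main

import "fmt"

// apiErrorRateHigh reports whether the error ratio over a window exceeds
// the 5% alert threshold for API requests.
func apiErrorRateHigh(errors, total float64) bool {
	if total == 0 {
		return false // no traffic, nothing to alert on
	}
	return errors/total > 0.05
}

func main() {
	fmt.Println(apiErrorRateHigh(3, 100))  // false: 3% is under threshold
	fmt.Println(apiErrorRateHigh(12, 100)) // true: 12% would fire the alert
}
```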


⚙️ Configuration

Key environment variables (selected):

  • Security (see docs/SECURITY.md for details)
    • ENABLE_RATE_LIMIT (true) — enable/disable rate limiting
    • RATE_LIMIT_GLOBAL (100) — global requests per second
    • RATE_LIMIT_GLOBAL_BURST (200) — global burst size
    • RATE_LIMIT_PER_IP (10) — requests per second per IP
    • RATE_LIMIT_PER_IP_BURST (20) — per-IP burst size
    • CORS_ALLOWED_ORIGINS — comma-separated list of allowed CORS origins (default: localhost:5173,localhost:3000)
    • ADMIN_API_TOKEN — bearer token for admin endpoints
  • Monitoring
    • GRAFANA_ADMIN_PASSWORD (admin) — Grafana admin password
  • Reddit OAuth
    • REDDIT_CLIENT_ID, REDDIT_CLIENT_SECRET, REDDIT_REDIRECT_URI, REDDIT_SCOPES, REDDIT_USER_AGENT
  • HTTP / retries
    • HTTP_MAX_RETRIES (default 3), HTTP_RETRY_BASE_MS (300), HTTP_TIMEOUT_MS (15000), LOG_HTTP_RETRIES (false)
    • GRAPH_QUERY_TIMEOUT_MS (30000) — timeout for graph API queries
    • DB_STATEMENT_TIMEOUT_MS (25000) — database statement timeout
  • Graph generation
    • DETAILED_GRAPH (false) — include posts/comments
    • POSTS_PER_SUB_IN_GRAPH (10), COMMENTS_PER_POST_IN_GRAPH (50)
    • MAX_AUTHOR_CONTENT_LINKS (3) — cross-link content by the same author across subreddits
    • DISABLE_API_GRAPH_JOB (false) — disable hourly background job in API
    • PRECALC_CLEAR_ON_START (false) — when true, clears graph tables at precalc start
    • Batching/progress (applied at runtime in precalc):
      • GRAPH_NODE_BATCH_SIZE (1000)
      • GRAPH_LINK_BATCH_SIZE (2000)
      • GRAPH_PROGRESS_INTERVAL (10000)
  • Crawler scheduling
    • STALE_DAYS (30), RESET_CRAWLING_AFTER_MIN (15)

🖥 Frontend

  • Vite + React with multiple visualization modes:
    • 3D graph with react-force-graph-3d and interactive minimap for navigation
    • 2D graph with D3.js force simulation
    • Statistics dashboard
    • Community detection view with Louvain algorithm
  • VITE_API_URL defaults to /api
  • Optional client caps: VITE_MAX_RENDER_NODES, VITE_MAX_RENDER_LINKS

See docs/visualization-modes.md and docs/community-detection.md for feature details. See frontend/README.md for local dev and env hints.
