A searchable database and network analysis tool for the DOJ Epstein Files release. Built to make public records accessible, cross-referenced, and analyzable.

What This Does

Entity Extraction — Extracts names, organizations, locations, and dates from 4,055 DOJ documents
Relationship Mapping — Builds a graph of connections based on document co-occurrence
Layer Classification — Classifies entities by degree of separation from Jeffrey Epstein
Cross-Reference Engine — Fuzzy-matches entities against:
- PPP loan data (SBA)
- FEC campaign contributions
- Federal grant recipients
Pattern Detection Agent — AI agent specialized in finding non-obvious connections

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Frontend (React + Tailwind)               │
│  • Search Interface  • Network Visualization  • Document Viewer │
└─────────────────────────┬───────────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────────┐
│                        API Server (Go)                           │
│  • REST Endpoints  • Full-text Search  • Graph Queries          │
└─────────────────────────┬───────────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────────┐
│                        Data Layer                                │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────────┐ │
│  │  PostgreSQL  │  │  Neo4j       │  │  Typesense/Meilisearch │ │
│  │  Entities    │  │  Graph       │  │  Full-text Search      │ │
│  │  Documents   │  │  Relations   │  │                        │ │
│  │  Cross-refs  │  │              │  │                        │ │
│  └──────────────┘  └──────────────┘  └────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────────┐
│                     Extraction Pipeline (TypeScript)             │
│  • OCR Processing  • NER Extraction  • Relationship Inference   │
└─────────────────────────────────────────────────────────────────┘

Tech Stack

Component	Technology	Rationale
Frontend	React + Tailwind + Vite	Fast, modern, type-safe
API	Go (Fiber/Echo)	Performance for graph queries
Primary DB	PostgreSQL	Structured data, JSONB, full-text
Graph DB	Neo4j	Relationship traversal at scale
Search	Typesense	Fast fuzzy search, typo-tolerant
Extraction	TypeScript + LLM	Entity extraction, deduplication
Pattern Agent	OpenClaw sub-agent	AI-driven connection discovery

Data Sources

Primary: DOJ Epstein Files

4,055 documents (EFTA00000001 through EFTA00008528)
1.77M lines of OCR text
157GB raw data (PDFs, images, scans)
Source: https://www.justice.gov/epstein

Cross-Reference Datasets

PPP Loans: SBA FOIA data (https://data.sba.gov/dataset/ppp-foia)
FEC Contributions: Federal Election Commission (https://www.fec.gov/data/)
Federal Grants: USASpending.gov (https://www.usaspending.gov/download_center/custom_award_data)

Layer Classification

Layer	Definition	Example
L0	Jeffrey Epstein himself	—
L1	Direct associates (named in documents with Epstein)	Ghislaine Maxwell
L2	One degree removed (connected to L1 but not directly to Epstein)	—
L3	Two degrees removed	—

Getting Started

Prerequisites

Docker & Docker Compose
Node.js 20+
Go 1.21+
PostgreSQL 16+ (or use Docker)
Neo4j 5+ (or use Docker)

Quick Start

# Clone the repo
git clone https://github.com/subculture-collective/epstein-db.git
cd epstein-db

# Start databases
docker-compose up -d

# Install dependencies
npm install
cd api && go mod download && cd ..

# Run extraction pipeline (requires OpenAI-compatible API)
cp .env.example .env
# Edit .env with your API keys

npm run extract

# Start the API server
cd api && go run . &

# Start the frontend
npm run dev

Project Structure

epstein-db/
├── api/                    # Go API server
│   ├── cmd/                # Entry points
│   ├── internal/           # Internal packages
│   │   ├── handlers/       # HTTP handlers
│   │   ├── db/             # Database access
│   │   ├── graph/          # Neo4j operations
│   │   └── search/         # Typesense operations
│   └── pkg/                # Public packages
│
├── extraction/             # TypeScript extraction pipeline
│   ├── src/
│   │   ├── ocr/            # OCR processing
│   │   ├── ner/            # Named Entity Recognition
│   │   ├── dedup/          # Entity deduplication
│   │   └── cross-ref/      # Cross-reference matching
│   └── scripts/            # Pipeline scripts
│
├── frontend/               # React frontend
│   ├── src/
│   │   ├── components/     # UI components
│   │   ├── pages/          # Route pages
│   │   ├── hooks/          # Custom hooks
│   │   └── api/            # API client
│   └── public/
│
├── agents/                 # AI agents
│   └── pattern-finder/     # Connection discovery agent
│
├── data/                   # Data directory (gitignored)
│   ├── raw/                # Symlink to DataSources
│   ├── processed/          # Extracted entities/relations
│   ├── crossref/           # PPP, FEC, grants data
│   └── exports/            # Generated exports
│
├── docker-compose.yml      # Database services
├── schema/                 # Database schemas
│   ├── postgres/           # SQL migrations
│   └── neo4j/              # Cypher constraints
│
└── docs/                   # Documentation
    ├── ARCHITECTURE.md
    ├── DATA_MODEL.md
    └── CONTRIBUTING.md

Roadmap

Phase 1: Foundation ✅

Repository setup
Database schema design
Docker compose for databases
Basic extraction pipeline

Phase 2: Entity Extraction

OCR text ingestion
Named Entity Recognition (NER)
Entity deduplication (LLM-assisted)
Document-entity relationships

Phase 3: Graph Construction

Neo4j schema
Co-occurrence relationship building
Layer classification algorithm
Graph API endpoints

Phase 4: Cross-Reference

PPP loan data ingestion
FEC contribution data ingestion
Federal grants data ingestion
Fuzzy matching engine

Phase 5: Frontend

Search interface
Network visualization (D3/Force-Graph)
Document viewer
Entity detail pages

Phase 6: Pattern Agent

Agent architecture design
Connection hypothesis generation
Validation pipeline
Report generation

Contributing

This is an open research project. Contributions welcome:

Entity extraction improvements
Fuzzy matching algorithms
UI/UX improvements
Additional cross-reference datasets
Pattern detection strategies

License

MIT License. The code is open source. The documents are public records.

Disclaimer

This is an independent research project. We make no representations about the completeness or accuracy of the analysis. This tool surfaces connections — it does not assert guilt, criminality, or wrongdoing.

Languages

TypeScript 54.1%

Go 25.5%

PLpgSQL 13.6%

Cypher 4%

CSS 1.6%

Other 1.2%