Initial commit: Epstein Files Database project structure

- PostgreSQL schema for documents, entities, relationships, cross-refs
- Neo4j schema for graph relationships
- TypeScript extraction pipeline (OCR, NER, deduplication)
- Go API server (Fiber) with full REST endpoints
- React + Tailwind frontend with network visualization
- Pattern finder agent for connection discovery
- Docker compose for databases (Postgres, Neo4j, Typesense)
- Cross-reference matching for PPP loans, FEC, federal grants
2026-02-02 14:54:00 -06:00
commit f30c25e79f
33 changed files with 4353 additions and 0 deletions

.gitignore vendored Normal file

@@ -0,0 +1,66 @@
# Data sources - too large for git
DataSources/
# Build outputs
dist/
build/
.next/
out/
# Dependencies
node_modules/
vendor/
# Environment
.env
.env.local
.env.*.local
# Go
*.exe
*.dll
*.so
*.dylib
bin/
# Python
__pycache__/
*.py[cod]
*$py.class
.venv/
venv/
env/
*.egg-info/
# IDE
.idea/
.vscode/
*.swp
*.swo
*~
# OS
.DS_Store
Thumbs.db
# Database files (large, generated)
*.db
*.sqlite
*.sqlite3
# Logs
*.log
logs/
# Temporary files
tmp/
temp/
.cache/
# Generated data (can be recreated)
data/processed/
data/embeddings/
data/exports/
# Keep config examples
!*.example

LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 Subcult
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md Normal file

@@ -0,0 +1,217 @@
# Epstein Files Database
A searchable database and network analysis tool for the DOJ Epstein Files release. Built to make public records accessible, cross-referenced, and analyzable.
## What This Does
1. **Entity Extraction** — Extracts names, organizations, locations, and dates from 4,055 DOJ documents
2. **Relationship Mapping** — Builds a graph of connections based on document co-occurrence
3. **Layer Classification** — Classifies entities by degree of separation from Jeffrey Epstein
4. **Cross-Reference Engine** — Fuzzy-matches entities against:
- PPP loan data (SBA)
- FEC campaign contributions
- Federal grant recipients
5. **Pattern Detection Agent** — AI agent specialized in finding non-obvious connections
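The fuzzy matching in step 4 can be sketched with a small trigram-similarity function, similar in spirit to the Postgres `pg_trgm` `similarity()` the server-side queries rely on. This is illustrative only; the normalization and padding here are simplified:

```typescript
// Trigram-based fuzzy name matching (simplified sketch of pg_trgm-style scoring).
function trigrams(s: string): Set<string> {
  // Lowercase, strip punctuation, pad so leading/trailing characters form trigrams.
  const norm = `  ${s.toLowerCase().replace(/[^a-z0-9 ]/g, '')} `;
  const grams = new Set<string>();
  for (let i = 0; i <= norm.length - 3; i++) grams.add(norm.slice(i, i + 3));
  return grams;
}

function similarity(a: string, b: string): number {
  // Jaccard similarity over trigram sets: shared / union.
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

Because punctuation is stripped before comparison, variants like `"L Brands Inc"` and `"L Brands, Inc."` score as exact matches, which is the behavior a cross-reference matcher wants.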
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Frontend (React + Tailwind) │
│ • Search Interface • Network Visualization • Document Viewer │
└─────────────────────────┬───────────────────────────────────────┘
┌─────────────────────────▼───────────────────────────────────────┐
│ API Server (Go) │
│ • REST Endpoints • Full-text Search • Graph Queries │
└─────────────────────────┬───────────────────────────────────────┘
┌─────────────────────────▼───────────────────────────────────────┐
│ Data Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────────┐ │
│ │ PostgreSQL │ │ Neo4j │ │ Typesense/Meilisearch │ │
│ │ Entities │ │ Graph │ │ Full-text Search │ │
│ │ Documents │ │ Relations │ │ │ │
│ │ Cross-refs │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────▼───────────────────────────────────────┐
│ Extraction Pipeline (TypeScript) │
│ • OCR Processing • NER Extraction • Relationship Inference │
└─────────────────────────────────────────────────────────────────┘
```
## Tech Stack
| Component | Technology | Rationale |
|-----------|------------|-----------|
| Frontend | React + Tailwind + Vite | Fast, modern, type-safe |
| API | Go (Fiber/Echo) | Performance for graph queries |
| Primary DB | PostgreSQL | Structured data, JSONB, full-text |
| Graph DB | Neo4j | Relationship traversal at scale |
| Search | Typesense | Fast fuzzy search, typo-tolerant |
| Extraction | TypeScript + LLM | Entity extraction, deduplication |
| Pattern Agent | OpenClaw sub-agent | AI-driven connection discovery |
## Data Sources
### Primary: DOJ Epstein Files
- **4,055 documents** (EFTA00000001 through EFTA00008528)
- **1.77M lines** of OCR text
- **157GB** raw data (PDFs, images, scans)
- Source: https://www.justice.gov/epstein
### Cross-Reference Datasets
- **PPP Loans**: SBA FOIA data (https://data.sba.gov/dataset/ppp-foia)
- **FEC Contributions**: Federal Election Commission (https://www.fec.gov/data/)
- **Federal Grants**: USASpending.gov (https://www.usaspending.gov/download_center/custom_award_data)
## Layer Classification
| Layer | Definition | Example |
|-------|------------|---------|
| **L0** | Jeffrey Epstein himself | — |
| **L1** | Direct associates (named in documents with Epstein) | Ghislaine Maxwell |
| **L2** | One degree removed (connected to L1 but not directly to Epstein) | — |
| **L3** | Two degrees removed | — |
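Layer classification is essentially a breadth-first search outward from the L0 node over the co-occurrence graph. A minimal in-memory sketch (the adjacency map is a stand-in; the real pipeline would run this traversal against Neo4j):

```typescript
// BFS layer assignment: each entity's layer is its shortest-path distance
// from the root (L0) node in the co-occurrence graph.
function classifyLayers(adj: Map<string, string[]>, root: string): Map<string, number> {
  const layer = new Map<string, number>([[root, 0]]);
  const queue: string[] = [root];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const neighbor of adj.get(current) ?? []) {
      if (!layer.has(neighbor)) {
        layer.set(neighbor, layer.get(current)! + 1);
        queue.push(neighbor);
      }
    }
  }
  return layer; // entities absent from the result are unconnected to the root
}
```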
## Getting Started
### Prerequisites
- Docker & Docker Compose
- Node.js 20+
- Go 1.21+
- PostgreSQL 16+ (or use Docker)
- Neo4j 5+ (or use Docker)
### Quick Start
```bash
# Clone the repo
git clone https://github.com/subculture-collective/epstein-db.git
cd epstein-db
# Start databases
docker-compose up -d
# Install dependencies
npm install
cd api && go mod download && cd ..
# Run extraction pipeline (requires OpenAI-compatible API)
cp .env.example .env
# Edit .env with your API keys
npm run extract
# Start the API server
(cd api && go run ./cmd/server) &
# Start the frontend
npm run dev
```
## Project Structure
```
epstein-db/
├── api/ # Go API server
│ ├── cmd/ # Entry points
│ ├── internal/ # Internal packages
│ │ ├── handlers/ # HTTP handlers
│ │ ├── db/ # Database access
│ │ ├── graph/ # Neo4j operations
│ │ └── search/ # Typesense operations
│ └── pkg/ # Public packages
├── extraction/ # TypeScript extraction pipeline
│ ├── src/
│ │ ├── ocr/ # OCR processing
│ │ ├── ner/ # Named Entity Recognition
│ │ ├── dedup/ # Entity deduplication
│ │ └── cross-ref/ # Cross-reference matching
│ └── scripts/ # Pipeline scripts
├── frontend/ # React frontend
│ ├── src/
│ │ ├── components/ # UI components
│ │ ├── pages/ # Route pages
│ │ ├── hooks/ # Custom hooks
│ │ └── api/ # API client
│ └── public/
├── agents/ # AI agents
│ └── pattern-finder/ # Connection discovery agent
├── data/ # Data directory (gitignored)
│ ├── raw/ # Symlink to DataSources
│ ├── processed/ # Extracted entities/relations
│ ├── crossref/ # PPP, FEC, grants data
│ └── exports/ # Generated exports
├── docker-compose.yml # Database services
├── schema/ # Database schemas
│ ├── postgres/ # SQL migrations
│ └── neo4j/ # Cypher constraints
└── docs/ # Documentation
├── ARCHITECTURE.md
├── DATA_MODEL.md
└── CONTRIBUTING.md
```
## Roadmap
### Phase 1: Foundation ✅
- [x] Repository setup
- [x] Database schema design
- [x] Docker compose for databases
- [x] Basic extraction pipeline
### Phase 2: Entity Extraction
- [ ] OCR text ingestion
- [ ] Named Entity Recognition (NER)
- [ ] Entity deduplication (LLM-assisted)
- [ ] Document-entity relationships
### Phase 3: Graph Construction
- [ ] Neo4j schema
- [ ] Co-occurrence relationship building
- [ ] Layer classification algorithm
- [ ] Graph API endpoints
### Phase 4: Cross-Reference
- [ ] PPP loan data ingestion
- [ ] FEC contribution data ingestion
- [ ] Federal grants data ingestion
- [ ] Fuzzy matching engine
### Phase 5: Frontend
- [ ] Search interface
- [ ] Network visualization (D3/Force-Graph)
- [ ] Document viewer
- [ ] Entity detail pages
### Phase 6: Pattern Agent
- [ ] Agent architecture design
- [ ] Connection hypothesis generation
- [ ] Validation pipeline
- [ ] Report generation
## Contributing
This is an open research project. Contributions welcome:
- Entity extraction improvements
- Fuzzy matching algorithms
- UI/UX improvements
- Additional cross-reference datasets
- Pattern detection strategies
## License
MIT License. The code is open source. The documents are public records.
## Disclaimer
This is an independent research project. We make no representations about the completeness or accuracy of the analysis. This tool surfaces connections — it does not assert guilt, criminality, or wrongdoing.


@@ -0,0 +1,113 @@
# Pattern Finder Agent
An AI agent specialized in discovering non-obvious connections, patterns, and relationships within the Epstein Files database.
## Purpose
While the extraction pipeline identifies explicit entities and relationships, the Pattern Finder looks for:
1. **Indirect Connections** — Entities that appear in similar contexts but are never directly linked
2. **Temporal Patterns** — Activities that cluster around specific dates or events
3. **Financial Flows** — Money movement patterns across entities
4. **Network Anomalies** — Unusually dense or sparse connection patterns
5. **Cross-Reference Insights** — What PPP/FEC/Grants matches reveal about entities
## How It Works
The agent runs periodically (or on-demand) and:
1. **Samples the Graph** — Pulls subgraphs around high-degree or interesting entities
2. **Generates Hypotheses** — Uses LLM to identify potential patterns
3. **Validates Hypotheses** — Checks evidence in the actual documents
4. **Reports Findings** — Stores validated patterns with evidence chains
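Condensed, these four steps form a sample → hypothesize → validate → report loop. A sketch with placeholder functions standing in for the modules described above:

```typescript
// Skeleton of the agent loop. Each parameter is a stand-in for one of the
// four modules: graph sampling, LLM hypothesis generation, evidence
// validation, and report storage.
async function runPatternFinder(
  sampleGraph: () => Promise<unknown>,
  generate: (graph: unknown) => Promise<string[]>,
  validate: (hypothesis: string) => Promise<boolean>,
  report: (hypothesis: string) => void,
): Promise<void> {
  const graph = await sampleGraph();
  for (const hypothesis of await generate(graph)) {
    // Only hypotheses that survive document-level validation are reported.
    if (await validate(hypothesis)) report(hypothesis);
  }
}
```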
## Agent Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Pattern Finder Agent │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. Sampling Module │
│ • Random walk from high-degree nodes │
│ • Temporal window sampling │
│ • Cross-reference focused sampling │
│ │
│ 2. Hypothesis Generator (LLM) │
│ • Pattern recognition prompts │
│ • Anomaly detection prompts │
│ • Connection inference prompts │
│ │
│ 3. Evidence Validator │
│ • Document retrieval │
│ • Citation extraction │
│ • Confidence scoring │
│ │
│ 4. Report Generator │
│ • Pattern summary │
│ • Evidence chain │
│ • Visualization data │
│ │
└─────────────────────────────────────────────────────────────────┘
```
## Pattern Types
### Financial Patterns
- Money flows between entities
- Unusual transaction timing
- Shell company connections
- Donation clustering
### Travel Patterns
- Co-location events
- Flight log correlations
- Property connections
- Event attendance
### Organizational Patterns
- Board memberships
- Foundation connections
- Employment relationships
- Legal representation
### Temporal Patterns
- Activity clustering around dates
- Gaps in documentation
- Correlated timelines
## Usage
```bash
# Run a pattern discovery session
npm run agent:pattern-finder
# Focus on specific entity
npm run agent:pattern-finder -- --entity "Ghislaine Maxwell"
# Focus on date range
npm run agent:pattern-finder -- --from "2005-01-01" --to "2010-12-31"
# Focus on pattern type
npm run agent:pattern-finder -- --type financial
```
## Output
Patterns are stored in the `pattern_findings` table with:
- Title and description
- Involved entities
- Evidence (documents, relationships)
- Confidence score
- Status (hypothesis, validated, rejected)
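Downstream consumers might rank stored findings like this. The `PatternFinding` shape mirrors the fields listed above and is purely illustrative:

```typescript
// Filter out rejected findings, drop low-confidence ones, and rank the rest.
interface PatternFinding {
  title: string;
  confidence: number; // 0–1
  status: 'hypothesis' | 'validated' | 'rejected';
}

function rankFindings(findings: PatternFinding[], minConfidence = 0.5): PatternFinding[] {
  return findings
    .filter(f => f.status !== 'rejected' && f.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
}
```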
## Integration with OpenClaw
This agent can be spawned as a sub-agent from OpenClaw:
```typescript
sessions_spawn({
task: "Analyze the network around Les Wexner for financial patterns",
label: "pattern-finder-wexner",
})
```


@@ -0,0 +1,315 @@
/**
* Pattern Finder Agent
*
* Discovers non-obvious connections and patterns in the Epstein Files database.
*/
import Anthropic from '@anthropic-ai/sdk';
import pg from 'pg';
const { Pool } = pg;
// ============================================================================
// Configuration
// ============================================================================
const config = {
DATABASE_URL: process.env.DATABASE_URL || 'postgresql://epstein:epstein_dev@localhost:5432/epstein',
ANTHROPIC_API_KEY: process.env.ANTHROPIC_API_KEY || '',
LLM_MODEL: process.env.LLM_MODEL || 'claude-sonnet-4-20250514',
};
const pool = new Pool({ connectionString: config.DATABASE_URL });
const anthropic = new Anthropic({ apiKey: config.ANTHROPIC_API_KEY });
// ============================================================================
// Types
// ============================================================================
interface Entity {
id: number;
canonicalName: string;
entityType: string;
layer: number;
documentCount: number;
connectionCount: number;
pppMatches: any[];
fecMatches: any[];
grantsMatches: any[];
}
interface Connection {
entity1: string;
entity2: string;
sharedDocs: number;
documentIds: string[];
}
interface PatternHypothesis {
title: string;
description: string;
patternType: string;
entityNames: string[];
evidence: string[];
confidence: number;
}
// ============================================================================
// Sampling Functions
// ============================================================================
async function getHighDegreeEntities(limit: number = 50): Promise<Entity[]> {
const result = await pool.query(`
SELECT
id, canonical_name, entity_type, layer,
document_count, connection_count,
ppp_matches, fec_matches, grants_matches
FROM entities
WHERE entity_type IN ('person', 'organization')
ORDER BY connection_count DESC
LIMIT $1
`, [limit]);
return result.rows.map(row => ({
id: row.id,
canonicalName: row.canonical_name,
entityType: row.entity_type,
layer: row.layer || 0,
documentCount: row.document_count || 0,
connectionCount: row.connection_count || 0,
pppMatches: row.ppp_matches || [],
fecMatches: row.fec_matches || [],
grantsMatches: row.grants_matches || [],
}));
}
async function getEntityConnections(entityId: number, limit: number = 100): Promise<Connection[]> {
const result = await pool.query(`
SELECT
e1.canonical_name AS entity1,
e2.canonical_name AS entity2,
COUNT(DISTINCT d.id) AS shared_docs,
array_agg(DISTINCT d.doc_id) AS document_ids
FROM document_entities de1
JOIN document_entities de2 ON de1.document_id = de2.document_id AND de1.entity_id != de2.entity_id
JOIN entities e1 ON de1.entity_id = e1.id
JOIN entities e2 ON de2.entity_id = e2.id
JOIN documents d ON de1.document_id = d.id
WHERE de1.entity_id = $1
GROUP BY e1.canonical_name, e2.canonical_name
ORDER BY shared_docs DESC
LIMIT $2
`, [entityId, limit]);
return result.rows.map(row => ({
entity1: row.entity1,
entity2: row.entity2,
sharedDocs: parseInt(row.shared_docs),
documentIds: row.document_ids,
}));
}
async function getEntitiesWithCrossRefMatches(): Promise<Entity[]> {
const result = await pool.query(`
SELECT
id, canonical_name, entity_type, layer,
document_count, connection_count,
ppp_matches, fec_matches, grants_matches
FROM entities
WHERE
(ppp_matches IS NOT NULL AND jsonb_array_length(ppp_matches) > 0)
OR (fec_matches IS NOT NULL AND jsonb_array_length(fec_matches) > 0)
OR (grants_matches IS NOT NULL AND jsonb_array_length(grants_matches) > 0)
ORDER BY connection_count DESC
LIMIT 100
`);
return result.rows.map(row => ({
id: row.id,
canonicalName: row.canonical_name,
entityType: row.entity_type,
layer: row.layer || 0,
documentCount: row.document_count || 0,
connectionCount: row.connection_count || 0,
pppMatches: row.ppp_matches || [],
fecMatches: row.fec_matches || [],
grantsMatches: row.grants_matches || [],
}));
}
// ============================================================================
// Pattern Detection
// ============================================================================
const PATTERN_SYSTEM_PROMPT = `You are an investigative analyst specializing in network analysis and pattern detection. You're analyzing data from the Jeffrey Epstein case documents.
Your task is to identify non-obvious patterns, connections, and anomalies that might warrant further investigation.
Focus on:
1. Financial patterns (money flows, unusual transactions, timing)
2. Organizational patterns (shared board memberships, foundations, legal representation)
3. Temporal patterns (activities clustering around dates, gaps in documentation)
4. Network anomalies (unusually dense connections, unexpected bridges between groups)
5. Cross-reference insights (what PPP loans, FEC contributions, or federal grants reveal)
Be specific and cite evidence. Generate hypotheses that can be validated with document review.
IMPORTANT: You are surfacing patterns for investigation, not asserting guilt or wrongdoing.`;
async function generatePatternHypotheses(
entities: Entity[],
connections: Connection[]
): Promise<PatternHypothesis[]> {
const entitySummaries = entities.map(e => ({
name: e.canonicalName,
type: e.entityType,
layer: e.layer,
docs: e.documentCount,
connections: e.connectionCount,
hasPPP: e.pppMatches.length > 0,
hasFEC: e.fecMatches.length > 0,
hasGrants: e.grantsMatches.length > 0,
}));
const connectionSummaries = connections.slice(0, 50).map(c => ({
pair: `${c.entity1} ↔ ${c.entity2}`,
sharedDocs: c.sharedDocs,
}));
const prompt = `Analyze this network data and identify potential patterns worth investigating.
ENTITIES (${entities.length} total, showing key attributes):
${JSON.stringify(entitySummaries, null, 2)}
TOP CONNECTIONS:
${JSON.stringify(connectionSummaries, null, 2)}
Generate 3-5 pattern hypotheses. For each, provide:
1. A specific, descriptive title
2. What the pattern suggests
3. Which entities are involved
4. What evidence supports this hypothesis
5. Confidence level (0-1)
Return JSON array:
[
{
"title": "Pattern Title",
"description": "What this pattern suggests and why it's notable",
"patternType": "financial|organizational|temporal|network|crossref",
"entityNames": ["Entity1", "Entity2"],
"evidence": ["Evidence point 1", "Evidence point 2"],
"confidence": 0.7
}
]
Return ONLY valid JSON.`;
const response = await anthropic.messages.create({
model: config.LLM_MODEL,
max_tokens: 4096,
system: PATTERN_SYSTEM_PROMPT,
messages: [{ role: 'user', content: prompt }],
});
const content = response.content[0];
if (content.type !== 'text') {
throw new Error('Unexpected response type');
}
const jsonMatch = content.text.match(/\[[\s\S]*\]/);
if (!jsonMatch) {
console.error('No JSON found:', content.text);
return [];
}
return JSON.parse(jsonMatch[0]);
}
// ============================================================================
// Save Patterns
// ============================================================================
async function savePattern(pattern: PatternHypothesis): Promise<number> {
// Get entity IDs
const entityResult = await pool.query(`
SELECT id FROM entities WHERE canonical_name = ANY($1)
`, [pattern.entityNames]);
const entityIds = entityResult.rows.map(r => r.id);
const result = await pool.query(`
INSERT INTO pattern_findings
(title, description, pattern_type, entity_ids, evidence, confidence, status)
VALUES ($1, $2, $3, $4, $5, $6, 'hypothesis')
RETURNING id
`, [
pattern.title,
pattern.description,
pattern.patternType,
entityIds,
JSON.stringify({
entityNames: pattern.entityNames,
evidencePoints: pattern.evidence,
}),
pattern.confidence,
]);
return result.rows[0].id;
}
// ============================================================================
// Main
// ============================================================================
async function main() {
console.log('🔎 Pattern Finder Agent starting...\n');
// Get high-degree entities
console.log('📊 Sampling high-degree entities...');
const highDegree = await getHighDegreeEntities(50);
console.log(` Found ${highDegree.length} high-degree entities`);
// Get entities with cross-reference matches
console.log('📊 Sampling entities with cross-reference matches...');
const crossRef = await getEntitiesWithCrossRefMatches();
console.log(` Found ${crossRef.length} entities with PPP/FEC/Grants matches`);
// Get connections for top entities
console.log('📊 Sampling connections...');
const allConnections: Connection[] = [];
for (const entity of highDegree.slice(0, 10)) {
const connections = await getEntityConnections(entity.id, 50);
allConnections.push(...connections);
}
console.log(` Found ${allConnections.length} connections`);
// Combine entities (deduplicate)
const allEntities = [...highDegree, ...crossRef];
const uniqueEntities = Array.from(
new Map(allEntities.map(e => [e.id, e])).values()
);
// Generate pattern hypotheses
console.log('\n🧠 Generating pattern hypotheses...');
const patterns = await generatePatternHypotheses(uniqueEntities, allConnections);
console.log(` Generated ${patterns.length} hypotheses`);
// Save patterns
console.log('\n💾 Saving patterns to database...');
for (const pattern of patterns) {
const id = await savePattern(pattern);
console.log(` ✓ Saved: ${pattern.title} (ID: ${id})`);
}
console.log('\n✅ Pattern Finder complete!');
console.log(` Patterns discovered: ${patterns.length}`);
await pool.end();
}
main().catch((error) => {
console.error('Fatal error:', error);
process.exit(1);
});

api/cmd/server/main.go Normal file

@@ -0,0 +1,105 @@
package main
import (
"context"
"log"
"os"
"os/signal"
"syscall"
"github.com/gofiber/fiber/v2"
"github.com/gofiber/fiber/v2/middleware/cors"
"github.com/gofiber/fiber/v2/middleware/logger"
"github.com/gofiber/fiber/v2/middleware/recover"
"github.com/joho/godotenv"
"github.com/subculture-collective/epstein-db/api/internal/db"
"github.com/subculture-collective/epstein-db/api/internal/handlers"
)
func main() {
// Load .env file
if err := godotenv.Load(); err != nil {
log.Println("No .env file found, using environment variables")
}
// Initialize database connection
if err := db.Initialize(context.Background()); err != nil {
log.Fatalf("Failed to initialize database: %v", err)
}
defer db.Close()
// Create Fiber app
app := fiber.New(fiber.Config{
AppName: "Epstein Files API",
})
// Middleware
app.Use(recover.New())
app.Use(logger.New())
app.Use(cors.New(cors.Config{
AllowOrigins: "*",
AllowMethods: "GET,POST,PUT,DELETE,OPTIONS",
AllowHeaders: "Origin, Content-Type, Accept, Authorization",
}))
// Routes
api := app.Group("/api")
// Stats
api.Get("/stats", handlers.GetStats)
// Entities
api.Get("/entities", handlers.SearchEntities)
api.Get("/entities/:id", handlers.GetEntity)
api.Get("/entities/:id/connections", handlers.GetEntityConnections)
api.Get("/entities/:id/documents", handlers.GetEntityDocuments)
// Documents
api.Get("/documents", handlers.ListDocuments)
api.Get("/documents/:id", handlers.GetDocument)
api.Get("/documents/:id/text", handlers.GetDocumentText)
api.Get("/documents/:id/entities", handlers.GetDocumentEntities)
// Graph/Network
api.Get("/network", handlers.GetNetwork)
api.Get("/network/layers", handlers.GetNetworkByLayer)
// Cross-references
api.Get("/crossref/ppp", handlers.SearchPPP)
api.Get("/crossref/fec", handlers.SearchFEC)
api.Get("/crossref/grants", handlers.SearchGrants)
// Patterns
api.Get("/patterns", handlers.ListPatterns)
api.Get("/patterns/:id", handlers.GetPattern)
// Search
api.Get("/search", handlers.FullTextSearch)
// Health check
app.Get("/health", func(c *fiber.Ctx) error {
return c.JSON(fiber.Map{"status": "ok"})
})
// Get port from environment
port := os.Getenv("PORT")
if port == "" {
port = "3001"
}
// Graceful shutdown
go func() {
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
<-sigChan
log.Println("Shutting down...")
app.Shutdown()
}()
// Start server
log.Printf("Starting server on port %s", port)
if err := app.Listen(":" + port); err != nil {
log.Fatalf("Server error: %v", err)
}
}

api/go.mod Normal file

@@ -0,0 +1,31 @@
module github.com/subculture-collective/epstein-db/api
go 1.21
require (
github.com/gofiber/fiber/v2 v2.52.4
github.com/jackc/pgx/v5 v5.5.5
github.com/neo4j/neo4j-go-driver/v5 v5.19.0
github.com/typesense/typesense-go v1.1.0
github.com/joho/godotenv v1.5.1
)
require (
github.com/andybalholm/brotli v1.1.0 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/jackc/pgpassfile v1.0.0 // indirect
github.com/jackc/pgservicefile v0.0.0-20231201235250-de7065d80cb9 // indirect
github.com/jackc/puddle/v2 v2.2.1 // indirect
github.com/klauspost/compress v1.17.8 // indirect
github.com/mattn/go-colorable v0.1.13 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mattn/go-runewidth v0.0.15 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/valyala/bytebufferpool v1.0.0 // indirect
github.com/valyala/fasthttp v1.52.0 // indirect
github.com/valyala/tcplisten v1.0.0 // indirect
golang.org/x/crypto v0.22.0 // indirect
golang.org/x/sync v0.7.0 // indirect
golang.org/x/sys v0.19.0 // indirect
golang.org/x/text v0.14.0 // indirect
)

api/internal/db/db.go Normal file

@@ -0,0 +1,35 @@
package db
import (
"context"
"os"
"github.com/jackc/pgx/v5/pgxpool"
)
var pool *pgxpool.Pool
func Initialize(ctx context.Context) error {
connString := os.Getenv("DATABASE_URL")
if connString == "" {
connString = "postgresql://epstein:epstein_dev@localhost:5432/epstein"
}
var err error
pool, err = pgxpool.New(ctx, connString)
if err != nil {
return err
}
return pool.Ping(ctx)
}
func Close() {
if pool != nil {
pool.Close()
}
}
func Pool() *pgxpool.Pool {
return pool
}


@@ -0,0 +1,202 @@
package handlers
import (
"context"
"strconv"
"github.com/gofiber/fiber/v2"
"github.com/subculture-collective/epstein-db/api/internal/db"
)
// SearchPPP searches PPP loan data
func SearchPPP(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
query := c.Query("q", "")
limitStr := c.Query("limit", "50")
limit, _ := strconv.Atoi(limitStr)
if limit > 200 {
limit = 200
}
rows, err := pool.Query(ctx, `
-- fuzzy matching below requires the pg_trgm extension (similarity(), % operator)
SELECT id, borrower_name, borrower_city, borrower_state,
loan_amount, forgiveness_amount, lender, date_approved,
similarity(borrower_name, $1) AS score
FROM ppp_loans
WHERE $1 = '' OR borrower_name % $1 OR borrower_name ILIKE '%' || $1 || '%'
ORDER BY
CASE WHEN $1 != '' THEN similarity(borrower_name, $1) ELSE 0 END DESC,
loan_amount DESC NULLS LAST
LIMIT $2
`, query, limit)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer rows.Close()
var results []fiber.Map
for rows.Next() {
var id int
var name string
var city, state, lender *string
var loanAmount, forgivenessAmount *float64
var dateApproved *string
var score float64
if err := rows.Scan(&id, &name, &city, &state, &loanAmount,
&forgivenessAmount, &lender, &dateApproved, &score); err != nil {
continue
}
results = append(results, fiber.Map{
"id": id,
"borrowerName": name,
"borrowerCity": city,
"borrowerState": state,
"loanAmount": loanAmount,
"forgivenessAmount": forgivenessAmount,
"lender": lender,
"dateApproved": dateApproved,
"matchScore": score,
})
}
return c.JSON(fiber.Map{
"results": results,
"count": len(results),
})
}
// SearchFEC searches FEC contribution data
func SearchFEC(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
query := c.Query("q", "")
candidate := c.Query("candidate", "")
limitStr := c.Query("limit", "50")
limit, _ := strconv.Atoi(limitStr)
if limit > 200 {
limit = 200
}
rows, err := pool.Query(ctx, `
SELECT id, contributor_name, contributor_city, contributor_state,
contributor_employer, contributor_occupation,
candidate_name, committee_name, amount, contribution_date,
similarity(contributor_name, $1) AS score
FROM fec_contributions
WHERE ($1 = '' OR contributor_name % $1 OR contributor_name ILIKE '%' || $1 || '%')
AND ($2 = '' OR candidate_name ILIKE '%' || $2 || '%')
ORDER BY
CASE WHEN $1 != '' THEN similarity(contributor_name, $1) ELSE 0 END DESC,
amount DESC NULLS LAST
LIMIT $3
`, query, candidate, limit)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer rows.Close()
var results []fiber.Map
for rows.Next() {
var id int
var name string
var city, state, employer, occupation, candidateName, committeeName *string
var amount *float64
var contributionDate *string
var score float64
if err := rows.Scan(&id, &name, &city, &state, &employer, &occupation,
&candidateName, &committeeName, &amount, &contributionDate, &score); err != nil {
continue
}
results = append(results, fiber.Map{
"id": id,
"contributorName": name,
"contributorCity": city,
"contributorState": state,
"employer": employer,
"occupation": occupation,
"candidateName": candidateName,
"committeeName": committeeName,
"amount": amount,
"contributionDate": contributionDate,
"matchScore": score,
})
}
return c.JSON(fiber.Map{
"results": results,
"count": len(results),
})
}
// SearchGrants searches federal grants data
func SearchGrants(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
query := c.Query("q", "")
agency := c.Query("agency", "")
limitStr := c.Query("limit", "50")
limit, _ := strconv.Atoi(limitStr)
if limit > 200 {
limit = 200
}
rows, err := pool.Query(ctx, `
SELECT id, recipient_name, recipient_city, recipient_state,
awarding_agency, funding_agency, award_amount, award_date,
description, cfda_title,
similarity(recipient_name, $1) AS score
FROM federal_grants
WHERE ($1 = '' OR recipient_name % $1 OR recipient_name ILIKE '%' || $1 || '%')
AND ($2 = '' OR awarding_agency ILIKE '%' || $2 || '%')
ORDER BY
CASE WHEN $1 != '' THEN similarity(recipient_name, $1) ELSE 0 END DESC,
award_amount DESC NULLS LAST
LIMIT $3
`, query, agency, limit)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer rows.Close()
var results []fiber.Map
for rows.Next() {
var id int
var name string
var city, state, awardingAgency, fundingAgency *string
var awardAmount *float64
var awardDate, description, cfdaTitle *string
var score float64
if err := rows.Scan(&id, &name, &city, &state, &awardingAgency, &fundingAgency,
&awardAmount, &awardDate, &description, &cfdaTitle, &score); err != nil {
continue
}
results = append(results, fiber.Map{
"id": id,
"recipientName": name,
"recipientCity": city,
"recipientState": state,
"awardingAgency": awardingAgency,
"fundingAgency": fundingAgency,
"awardAmount": awardAmount,
"awardDate": awardDate,
"description": description,
"cfdaTitle": cfdaTitle,
"matchScore": score,
})
}
return c.JSON(fiber.Map{
"results": results,
"count": len(results),
})
}


@@ -0,0 +1,238 @@
package handlers
import (
"context"
"strconv"
"github.com/gofiber/fiber/v2"
"github.com/subculture-collective/epstein-db/api/internal/db"
)
// ListDocuments returns a paginated list of documents
func ListDocuments(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
limitStr := c.Query("limit", "50")
limit, _ := strconv.Atoi(limitStr)
if limit > 200 {
limit = 200
}
offsetStr := c.Query("offset", "0")
offset, _ := strconv.Atoi(offsetStr)
docType := c.Query("type", "")
dataset := c.Query("dataset", "")
rows, err := pool.Query(ctx, `
SELECT id, doc_id, dataset_id, document_type, summary, date_earliest, date_latest
FROM documents
WHERE ($1 = '' OR document_type = $1)
AND ($2 = '' OR dataset_id = NULLIF($2, '')::int)
ORDER BY doc_id
LIMIT $3 OFFSET $4
`, docType, dataset, limit, offset)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer rows.Close()
var documents []fiber.Map
for rows.Next() {
var id, datasetID int
var docID string
var docType, summary *string
var dateEarliest, dateLatest *string
if err := rows.Scan(&id, &docID, &datasetID, &docType, &summary, &dateEarliest, &dateLatest); err != nil {
continue
}
documents = append(documents, fiber.Map{
"id": id,
"docId": docID,
"datasetId": datasetID,
"documentType": docType,
"summary": summary,
"dateEarliest": dateEarliest,
"dateLatest": dateLatest,
})
}
return c.JSON(fiber.Map{
"documents": documents,
"count": len(documents),
"offset": offset,
"limit": limit,
})
}
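Each list handler repeats the same query-string clamp inline. The pattern can be factored into a small helper; a minimal sketch (the `clampLimit` name is hypothetical, not part of the codebase):

```go
package main

import (
	"fmt"
	"strconv"
)

// clampLimit normalizes a user-supplied limit: non-numeric or non-positive
// input falls back to def, and anything above max is capped.
func clampLimit(raw string, def, max int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return def
	}
	if n > max {
		return max
	}
	return n
}

func main() {
	fmt.Println(clampLimit("abc", 50, 200)) // 50
	fmt.Println(clampLimit("999", 50, 200)) // 200
	fmt.Println(clampLimit("25", 50, 200))  // 25
}
```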
// GetDocument returns a single document by ID
func GetDocument(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
id, err := strconv.Atoi(c.Params("id"))
if err != nil {
return c.Status(400).JSON(fiber.Map{"error": "invalid id"})
}
var doc struct {
ID int `json:"id"`
DocID string `json:"docId"`
DatasetID int `json:"datasetId"`
DocumentType *string `json:"documentType"`
Summary *string `json:"summary"`
DetailedSummary *string `json:"detailedSummary"`
DateEarliest *string `json:"dateEarliest"`
DateLatest *string `json:"dateLatest"`
ContentTags []byte `json:"contentTags"`
PageCount *int `json:"pageCount"`
}
err = pool.QueryRow(ctx, `
SELECT id, doc_id, dataset_id, document_type, summary, detailed_summary,
date_earliest::text, date_latest::text, content_tags, page_count
FROM documents WHERE id = $1
`, id).Scan(
&doc.ID, &doc.DocID, &doc.DatasetID, &doc.DocumentType,
&doc.Summary, &doc.DetailedSummary, &doc.DateEarliest,
&doc.DateLatest, &doc.ContentTags, &doc.PageCount,
)
if err != nil {
return c.Status(404).JSON(fiber.Map{"error": "document not found"})
}
return c.JSON(doc)
}
// GetDocumentText returns the full text of a document
func GetDocumentText(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
id, err := strconv.Atoi(c.Params("id"))
if err != nil {
return c.Status(400).JSON(fiber.Map{"error": "invalid id"})
}
var text *string
err = pool.QueryRow(ctx, "SELECT full_text FROM documents WHERE id = $1", id).Scan(&text)
if err != nil {
return c.Status(404).JSON(fiber.Map{"error": "document not found"})
}
return c.JSON(fiber.Map{
"id": id,
"text": text,
})
}
// GetDocumentEntities returns entities mentioned in a document
func GetDocumentEntities(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
id, err := strconv.Atoi(c.Params("id"))
if err != nil {
return c.Status(400).JSON(fiber.Map{"error": "invalid id"})
}
rows, err := pool.Query(ctx, `
SELECT e.id, e.canonical_name, e.entity_type, e.layer, de.mention_count
FROM entities e
JOIN document_entities de ON e.id = de.entity_id
WHERE de.document_id = $1
ORDER BY de.mention_count DESC
`, id)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer rows.Close()
var entities []fiber.Map
for rows.Next() {
var entityID int
var name, etype string
var layer *int
var mentions int
if err := rows.Scan(&entityID, &name, &etype, &layer, &mentions); err != nil {
continue
}
entities = append(entities, fiber.Map{
"id": entityID,
"canonicalName": name,
"entityType": etype,
"layer": layer,
"mentionCount": mentions,
})
}
return c.JSON(fiber.Map{
"entities": entities,
"count": len(entities),
})
}
// FullTextSearch searches document text
func FullTextSearch(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
query := c.Query("q", "")
if query == "" {
return c.Status(400).JSON(fiber.Map{"error": "query required"})
}
limitStr := c.Query("limit", "20")
limit, _ := strconv.Atoi(limitStr)
if limit <= 0 {
limit = 20
}
if limit > 100 {
limit = 100
}
rows, err := pool.Query(ctx, `
SELECT id, doc_id, document_type, summary,
ts_rank(to_tsvector('english', full_text), plainto_tsquery('english', $1)) AS rank,
ts_headline('english', full_text, plainto_tsquery('english', $1),
'MaxWords=50, MinWords=20, StartSel=<mark>, StopSel=</mark>') AS snippet
FROM documents
WHERE to_tsvector('english', full_text) @@ plainto_tsquery('english', $1)
ORDER BY rank DESC
LIMIT $2
`, query, limit)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer rows.Close()
var results []fiber.Map
for rows.Next() {
var id int
var docID string
var docType, summary, snippet *string
var rank float64
if err := rows.Scan(&id, &docID, &docType, &summary, &rank, &snippet); err != nil {
continue
}
results = append(results, fiber.Map{
"id": id,
"docId": docID,
"documentType": docType,
"summary": summary,
"rank": rank,
"snippet": snippet,
})
}
return c.JSON(fiber.Map{
"results": results,
"count": len(results),
"query": query,
})
}


@@ -0,0 +1,250 @@
package handlers
import (
"context"
"strconv"
"github.com/gofiber/fiber/v2"
"github.com/subculture-collective/epstein-db/api/internal/db"
)
// GetStats returns database statistics
func GetStats(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
var stats struct {
Documents int64 `json:"documents"`
Entities int64 `json:"entities"`
Triples int64 `json:"triples"`
PPPLoans int64 `json:"pppLoans"`
FECRecords int64 `json:"fecRecords"`
Grants int64 `json:"grants"`
Patterns int64 `json:"patterns"`
}
pool.QueryRow(ctx, "SELECT COUNT(*) FROM documents").Scan(&stats.Documents)
pool.QueryRow(ctx, "SELECT COUNT(*) FROM entities").Scan(&stats.Entities)
pool.QueryRow(ctx, "SELECT COUNT(*) FROM triples").Scan(&stats.Triples)
pool.QueryRow(ctx, "SELECT COUNT(*) FROM ppp_loans").Scan(&stats.PPPLoans)
pool.QueryRow(ctx, "SELECT COUNT(*) FROM fec_contributions").Scan(&stats.FECRecords)
pool.QueryRow(ctx, "SELECT COUNT(*) FROM federal_grants").Scan(&stats.Grants)
pool.QueryRow(ctx, "SELECT COUNT(*) FROM pattern_findings").Scan(&stats.Patterns)
return c.JSON(stats)
}
// SearchEntities searches for entities by name
func SearchEntities(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
query := c.Query("q", "")
limitStr := c.Query("limit", "20")
limit, _ := strconv.Atoi(limitStr)
if limit <= 0 {
limit = 20
}
if limit > 100 {
limit = 100
}
entityType := c.Query("type", "")
layer := c.Query("layer", "")
sqlQuery := `
SELECT id, canonical_name, entity_type, layer, document_count, connection_count
FROM entities
WHERE ($1 = '' OR canonical_name ILIKE '%' || $1 || '%' OR canonical_name % $1)
AND ($2 = '' OR entity_type = NULLIF($2, '')::entity_type)
AND ($3 = '' OR layer = NULLIF($3, '')::int)
ORDER BY
CASE WHEN $1 != '' THEN similarity(canonical_name, $1) ELSE 0 END DESC,
document_count DESC
LIMIT $4
`
rows, err := pool.Query(ctx, sqlQuery, query, entityType, layer, limit)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer rows.Close()
var entities []fiber.Map
for rows.Next() {
var id int
var name, etype string
var layerVal, docCount, connCount *int
if err := rows.Scan(&id, &name, &etype, &layerVal, &docCount, &connCount); err != nil {
continue
}
entities = append(entities, fiber.Map{
"id": id,
"canonicalName": name,
"entityType": etype,
"layer": layerVal,
"documentCount": docCount,
"connectionCount": connCount,
})
}
return c.JSON(fiber.Map{
"entities": entities,
"count": len(entities),
})
}
// GetEntity returns a single entity by ID
func GetEntity(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
id, err := strconv.Atoi(c.Params("id"))
if err != nil {
return c.Status(400).JSON(fiber.Map{"error": "invalid id"})
}
var entity struct {
ID int `json:"id"`
CanonicalName string `json:"canonicalName"`
EntityType string `json:"entityType"`
Layer *int `json:"layer"`
Description *string `json:"description"`
DocumentCount *int `json:"documentCount"`
ConnectionCount *int `json:"connectionCount"`
Aliases []byte `json:"aliases"`
PPPMatches []byte `json:"pppMatches"`
FECMatches []byte `json:"fecMatches"`
GrantsMatches []byte `json:"grantsMatches"`
}
err = pool.QueryRow(ctx, `
SELECT id, canonical_name, entity_type, layer, description,
document_count, connection_count, aliases,
ppp_matches, fec_matches, grants_matches
FROM entities WHERE id = $1
`, id).Scan(
&entity.ID, &entity.CanonicalName, &entity.EntityType,
&entity.Layer, &entity.Description, &entity.DocumentCount,
&entity.ConnectionCount, &entity.Aliases,
&entity.PPPMatches, &entity.FECMatches, &entity.GrantsMatches,
)
if err != nil {
return c.Status(404).JSON(fiber.Map{"error": "entity not found"})
}
return c.JSON(entity)
}
// GetEntityConnections returns entities connected to a given entity
func GetEntityConnections(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
id, err := strconv.Atoi(c.Params("id"))
if err != nil {
return c.Status(400).JSON(fiber.Map{"error": "invalid id"})
}
limitStr := c.Query("limit", "50")
limit, _ := strconv.Atoi(limitStr)
if limit <= 0 {
limit = 50
}
if limit > 200 {
limit = 200
}
rows, err := pool.Query(ctx, `
SELECT
e2.id, e2.canonical_name, e2.entity_type, e2.layer,
COUNT(DISTINCT d.id) AS shared_docs
FROM document_entities de1
JOIN document_entities de2 ON de1.document_id = de2.document_id AND de1.entity_id != de2.entity_id
JOIN entities e2 ON de2.entity_id = e2.id
JOIN documents d ON de1.document_id = d.id
WHERE de1.entity_id = $1
GROUP BY e2.id, e2.canonical_name, e2.entity_type, e2.layer
ORDER BY shared_docs DESC
LIMIT $2
`, id, limit)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer rows.Close()
var connections []fiber.Map
for rows.Next() {
var connID int
var name, etype string
var layerVal *int
var sharedDocs int
if err := rows.Scan(&connID, &name, &etype, &layerVal, &sharedDocs); err != nil {
continue
}
connections = append(connections, fiber.Map{
"id": connID,
"canonicalName": name,
"entityType": etype,
"layer": layerVal,
"sharedDocs": sharedDocs,
})
}
return c.JSON(fiber.Map{
"connections": connections,
"count": len(connections),
})
}
// GetEntityDocuments returns documents mentioning an entity
func GetEntityDocuments(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
id, err := strconv.Atoi(c.Params("id"))
if err != nil {
return c.Status(400).JSON(fiber.Map{"error": "invalid id"})
}
limitStr := c.Query("limit", "50")
limit, _ := strconv.Atoi(limitStr)
if limit <= 0 {
limit = 50
}
if limit > 200 {
limit = 200
}
rows, err := pool.Query(ctx, `
SELECT d.id, d.doc_id, d.document_type, d.summary, d.date_earliest::text, d.date_latest::text
FROM documents d
JOIN document_entities de ON d.id = de.document_id
WHERE de.entity_id = $1
ORDER BY d.date_earliest DESC NULLS LAST
LIMIT $2
`, id, limit)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer rows.Close()
var documents []fiber.Map
for rows.Next() {
var docID int
var docIdStr string
var docType, summary *string
var dateEarliest, dateLatest *string
if err := rows.Scan(&docID, &docIdStr, &docType, &summary, &dateEarliest, &dateLatest); err != nil {
continue
}
documents = append(documents, fiber.Map{
"id": docID,
"docId": docIdStr,
"documentType": docType,
"summary": summary,
"dateEarliest": dateEarliest,
"dateLatest": dateLatest,
})
}
return c.JSON(fiber.Map{
"documents": documents,
"count": len(documents),
})
}


@@ -0,0 +1,282 @@
package handlers
import (
"context"
"strconv"
"github.com/gofiber/fiber/v2"
"github.com/subculture-collective/epstein-db/api/internal/db"
)
// GetNetwork returns the relationship network for visualization
func GetNetwork(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
limitStr := c.Query("limit", "1000")
limit, _ := strconv.Atoi(limitStr)
if limit <= 0 {
limit = 1000
}
if limit > 10000 {
limit = 10000
}
minConnections := c.Query("minConnections", "2")
minConn, _ := strconv.Atoi(minConnections)
// Get nodes (entities with sufficient connections)
nodeRows, err := pool.Query(ctx, `
SELECT id, canonical_name, entity_type, layer, document_count, connection_count
FROM entities
WHERE entity_type IN ('person', 'organization')
AND connection_count >= $1
ORDER BY connection_count DESC
LIMIT $2
`, minConn, limit)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer nodeRows.Close()
var nodes []fiber.Map
nodeIDs := make(map[int]bool)
for nodeRows.Next() {
var id int
var name, etype string
var layer, docCount, connCount *int
if err := nodeRows.Scan(&id, &name, &etype, &layer, &docCount, &connCount); err != nil {
continue
}
nodeIDs[id] = true
nodes = append(nodes, fiber.Map{
"id": id,
"canonicalName": name,
"entityType": etype,
"layer": layer,
"documentCount": docCount,
"connectionCount": connCount,
})
}
// Get edges (co-occurrence relationships)
edgeRows, err := pool.Query(ctx, `
SELECT
de1.entity_id AS source,
de2.entity_id AS target,
COUNT(DISTINCT de1.document_id) AS weight
FROM document_entities de1
JOIN document_entities de2 ON de1.document_id = de2.document_id
AND de1.entity_id < de2.entity_id
JOIN entities e1 ON de1.entity_id = e1.id
JOIN entities e2 ON de2.entity_id = e2.id
WHERE e1.entity_type IN ('person', 'organization')
AND e2.entity_type IN ('person', 'organization')
AND e1.connection_count >= $1
AND e2.connection_count >= $1
GROUP BY de1.entity_id, de2.entity_id
HAVING COUNT(DISTINCT de1.document_id) >= 2
ORDER BY weight DESC
LIMIT $2
`, minConn, limit*3)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer edgeRows.Close()
var edges []fiber.Map
for edgeRows.Next() {
var source, target, weight int
if err := edgeRows.Scan(&source, &target, &weight); err != nil {
continue
}
// Only include edges where both nodes are in our node set
if nodeIDs[source] && nodeIDs[target] {
edges = append(edges, fiber.Map{
"source": source,
"target": target,
"weight": weight,
})
}
}
return c.JSON(fiber.Map{
"nodes": nodes,
"edges": edges,
"stats": fiber.Map{
"nodeCount": len(nodes),
"edgeCount": len(edges),
},
})
}
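The edge query can return endpoints that fell outside the node LIMIT, so GetNetwork only keeps edges whose endpoints are both in the selected node set. That filtering step in isolation (types here are illustrative, not the handler's actual ones):

```go
package main

import "fmt"

type edge struct{ Source, Target, Weight int }

// filterEdges keeps only edges whose endpoints are both present in nodeIDs,
// mirroring the post-query filter in GetNetwork.
func filterEdges(nodeIDs map[int]bool, edges []edge) []edge {
	var kept []edge
	for _, e := range edges {
		if nodeIDs[e.Source] && nodeIDs[e.Target] {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	nodes := map[int]bool{1: true, 2: true, 3: true}
	edges := []edge{{1, 2, 5}, {2, 4, 3}, {1, 3, 2}}
	// edge {2,4} is dropped because 4 is not in the node set
	fmt.Println(len(filterEdges(nodes, edges))) // 2
}
```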
// GetNetworkByLayer returns entities organized by layer
func GetNetworkByLayer(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
var layers []fiber.Map
for layer := 0; layer <= 3; layer++ {
rows, err := pool.Query(ctx, `
SELECT id, canonical_name, entity_type, document_count, connection_count
FROM entities
WHERE layer = $1 AND entity_type IN ('person', 'organization')
ORDER BY connection_count DESC
LIMIT 100
`, layer)
if err != nil {
continue
}
var entities []fiber.Map
for rows.Next() {
var id int
var name, etype string
var docCount, connCount *int
if err := rows.Scan(&id, &name, &etype, &docCount, &connCount); err != nil {
continue
}
entities = append(entities, fiber.Map{
"id": id,
"canonicalName": name,
"entityType": etype,
"documentCount": docCount,
"connectionCount": connCount,
})
}
rows.Close()
layers = append(layers, fiber.Map{
"layer": layer,
"entities": entities,
"count": len(entities),
})
}
return c.JSON(fiber.Map{
"layers": layers,
})
}
// ListPatterns returns discovered patterns
func ListPatterns(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
status := c.Query("status", "")
patternType := c.Query("type", "")
rows, err := pool.Query(ctx, `
SELECT id, title, description, pattern_type, confidence, status, discovered_at::text
FROM pattern_findings
WHERE ($1 = '' OR status = $1)
AND ($2 = '' OR pattern_type = $2)
ORDER BY discovered_at DESC
LIMIT 100
`, status, patternType)
if err != nil {
return c.Status(500).JSON(fiber.Map{"error": err.Error()})
}
defer rows.Close()
var patterns []fiber.Map
for rows.Next() {
var id int
var title, description, ptype, status string
var confidence *float64
var discoveredAt string
if err := rows.Scan(&id, &title, &description, &ptype, &confidence, &status, &discoveredAt); err != nil {
continue
}
patterns = append(patterns, fiber.Map{
"id": id,
"title": title,
"description": description,
"patternType": ptype,
"confidence": confidence,
"status": status,
"discoveredAt": discoveredAt,
})
}
return c.JSON(fiber.Map{
"patterns": patterns,
"count": len(patterns),
})
}
// GetPattern returns a single pattern with full details
func GetPattern(c *fiber.Ctx) error {
ctx := context.Background()
pool := db.Pool()
id, err := strconv.Atoi(c.Params("id"))
if err != nil {
return c.Status(400).JSON(fiber.Map{"error": "invalid id"})
}
var pattern struct {
ID int `json:"id"`
Title string `json:"title"`
Description string `json:"description"`
PatternType string `json:"patternType"`
EntityIDs []int `json:"entityIds"`
Evidence []byte `json:"evidence"`
Confidence *float64 `json:"confidence"`
Status string `json:"status"`
Notes *string `json:"notes"`
DiscoveredAt string `json:"discoveredAt"`
DiscoveredBy string `json:"discoveredBy"`
}
err = pool.QueryRow(ctx, `
SELECT id, title, description, pattern_type, entity_ids, evidence,
confidence, status, notes, discovered_at::text, discovered_by
FROM pattern_findings WHERE id = $1
`, id).Scan(
&pattern.ID, &pattern.Title, &pattern.Description, &pattern.PatternType,
&pattern.EntityIDs, &pattern.Evidence, &pattern.Confidence,
&pattern.Status, &pattern.Notes, &pattern.DiscoveredAt, &pattern.DiscoveredBy,
)
if err != nil {
return c.Status(404).JSON(fiber.Map{"error": "pattern not found"})
}
// Get entity details
entityRows, err := pool.Query(ctx, `
SELECT id, canonical_name, entity_type, layer
FROM entities WHERE id = ANY($1)
`, pattern.EntityIDs)
if err == nil {
var entities []fiber.Map
for entityRows.Next() {
var eid int
var name, etype string
var layer *int
if err := entityRows.Scan(&eid, &name, &etype, &layer); err != nil {
continue
}
entities = append(entities, fiber.Map{
"id": eid,
"canonicalName": name,
"entityType": etype,
"layer": layer,
})
}
entityRows.Close()
return c.JSON(fiber.Map{
"pattern": pattern,
"entities": entities,
})
}
return c.JSON(pattern)
}

docker-compose.yml Normal file

@@ -0,0 +1,64 @@
services:
postgres:
image: postgres:16-alpine
container_name: epstein-db-postgres
restart: unless-stopped
environment:
POSTGRES_USER: epstein
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-epstein_dev}
POSTGRES_DB: epstein
ports:
- "5432:5432"
volumes:
- postgres_data:/var/lib/postgresql/data
- ./schema/postgres:/docker-entrypoint-initdb.d:ro
healthcheck:
test: ["CMD-SHELL", "pg_isready -U epstein -d epstein"]
interval: 10s
timeout: 5s
retries: 5
neo4j:
image: neo4j:5-community
container_name: epstein-db-neo4j
restart: unless-stopped
environment:
NEO4J_AUTH: neo4j/${NEO4J_PASSWORD:-neo4j_dev}
NEO4J_PLUGINS: '["apoc"]'
NEO4J_server_memory_heap_initial__size: 512m
NEO4J_server_memory_heap_max__size: 2G
ports:
- "7474:7474" # HTTP
- "7687:7687" # Bolt
volumes:
- neo4j_data:/data
- neo4j_logs:/logs
healthcheck:
test: ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:7474 || exit 1"]
interval: 10s
timeout: 10s
retries: 5
typesense:
image: typesense/typesense:27.1
container_name: epstein-db-typesense
restart: unless-stopped
environment:
TYPESENSE_DATA_DIR: /data
TYPESENSE_API_KEY: ${TYPESENSE_API_KEY:-typesense_dev}
TYPESENSE_ENABLE_CORS: "true"
ports:
- "8108:8108"
volumes:
- typesense_data:/data
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8108/health"]
interval: 10s
timeout: 5s
retries: 5
volumes:
postgres_data:
neo4j_data:
neo4j_logs:
typesense_data:

extraction/package.json Normal file

@@ -0,0 +1,39 @@
{
"name": "@epstein-db/extraction",
"version": "1.0.0",
"description": "Entity extraction pipeline for Epstein Files Database",
"type": "module",
"scripts": {
"build": "tsc",
"dev": "tsx watch src/index.ts",
"extract:documents": "tsx src/scripts/extract-documents.ts",
"extract:entities": "tsx src/scripts/extract-entities.ts",
"deduplicate": "tsx src/scripts/deduplicate.ts",
"load:crossref": "tsx src/scripts/load-crossref.ts",
"match:crossref": "tsx src/scripts/match-crossref.ts",
"calculate:layers": "tsx src/scripts/calculate-layers.ts",
"sync:neo4j": "tsx src/scripts/sync-neo4j.ts",
"pipeline": "npm run extract:documents && npm run extract:entities && npm run deduplicate && npm run calculate:layers && npm run sync:neo4j",
"typecheck": "tsc --noEmit"
},
"dependencies": {
"@anthropic-ai/sdk": "^0.24.0",
"@neondatabase/serverless": "^0.9.0",
"better-sqlite3": "^11.0.0",
"dotenv": "^16.4.5",
"drizzle-orm": "^0.30.0",
"neo4j-driver": "^5.19.0",
"openai": "^4.47.0",
"p-limit": "^5.0.0",
"pg": "^8.11.5",
"typesense": "^1.8.2",
"zod": "^3.23.0"
},
"devDependencies": {
"@types/better-sqlite3": "^7.6.10",
"@types/node": "^20.12.0",
"@types/pg": "^8.11.5",
"tsx": "^4.9.0",
"typescript": "^5.4.0"
}
}

extraction/src/config.ts Normal file

@@ -0,0 +1,33 @@
import { z } from 'zod';
import dotenv from 'dotenv';
dotenv.config();
const configSchema = z.object({
// Database
DATABASE_URL: z.string().default('postgresql://epstein:epstein_dev@localhost:5432/epstein'),
NEO4J_URI: z.string().default('bolt://localhost:7687'),
NEO4J_USER: z.string().default('neo4j'),
NEO4J_PASSWORD: z.string().default('neo4j_dev'),
TYPESENSE_HOST: z.string().default('localhost'),
TYPESENSE_PORT: z.coerce.number().default(8108),
TYPESENSE_API_KEY: z.string().default('typesense_dev'),
// LLM
OPENAI_API_KEY: z.string().optional(),
OPENAI_BASE_URL: z.string().optional(),
ANTHROPIC_API_KEY: z.string().optional(),
LLM_MODEL: z.string().default('claude-sonnet-4-20250514'),
// Extraction
DATA_DIR: z.string().default('../DataSources'),
BATCH_SIZE: z.coerce.number().default(10),
MAX_WORKERS: z.coerce.number().default(5),
// Rate limiting
REQUESTS_PER_MINUTE: z.coerce.number().default(50),
});
export type Config = z.infer<typeof configSchema>;
export const config = configSchema.parse(process.env);

extraction/src/db.ts Normal file

@@ -0,0 +1,248 @@
import pg from 'pg';
import { config } from './config.js';
const { Pool } = pg;
export const pool = new Pool({
connectionString: config.DATABASE_URL,
});
// Helper for transactions
export async function withTransaction<T>(
fn: (client: pg.PoolClient) => Promise<T>
): Promise<T> {
const client = await pool.connect();
try {
await client.query('BEGIN');
const result = await fn(client);
await client.query('COMMIT');
return result;
} catch (error) {
await client.query('ROLLBACK');
throw error;
} finally {
client.release();
}
}
// Document operations
export async function insertDocument(doc: {
docId: string;
datasetId: number;
filePath?: string;
fullText?: string;
pageCount?: number;
}): Promise<number> {
const result = await pool.query(
`INSERT INTO documents (doc_id, dataset_id, file_path, full_text, page_count)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (doc_id) DO UPDATE SET
full_text = COALESCE(EXCLUDED.full_text, documents.full_text),
updated_at = NOW()
RETURNING id`,
[doc.docId, doc.datasetId, doc.filePath, doc.fullText, doc.pageCount]
);
return result.rows[0].id;
}
export async function updateDocumentAnalysis(
docId: string,
analysis: {
summary: string;
detailedSummary: string;
documentType: string;
dateEarliest?: Date;
dateLatest?: Date;
contentTags: string[];
}
): Promise<void> {
await pool.query(
`UPDATE documents SET
summary = $2,
detailed_summary = $3,
document_type = $4,
date_earliest = $5,
date_latest = $6,
content_tags = $7,
analysis_status = 'complete',
analyzed_at = NOW(),
updated_at = NOW()
WHERE doc_id = $1`,
[
docId,
analysis.summary,
analysis.detailedSummary,
analysis.documentType,
analysis.dateEarliest,
analysis.dateLatest,
JSON.stringify(analysis.contentTags),
]
);
}
export async function getDocumentsPendingAnalysis(
limit: number = 100
): Promise<Array<{ id: number; docId: string; fullText: string }>> {
const result = await pool.query(
`SELECT id, doc_id, full_text FROM documents
WHERE analysis_status = 'pending' AND full_text IS NOT NULL
LIMIT $1`,
[limit]
);
return result.rows.map((row) => ({
id: row.id,
docId: row.doc_id,
fullText: row.full_text,
}));
}
// Entity operations
export async function upsertEntity(entity: {
canonicalName: string;
entityType: string;
aliases?: string[];
description?: string;
}): Promise<number> {
const result = await pool.query(
`INSERT INTO entities (canonical_name, entity_type, aliases, description)
VALUES ($1, $2::entity_type, $3, $4)
ON CONFLICT (canonical_name, entity_type) DO UPDATE SET
aliases = COALESCE(
entities.aliases || EXCLUDED.aliases,
entities.aliases,
EXCLUDED.aliases
),
updated_at = NOW()
RETURNING id`,
[
entity.canonicalName,
entity.entityType,
JSON.stringify(entity.aliases || []),
entity.description,
]
);
return result.rows[0].id;
}
export async function linkEntityToDocument(
entityId: number,
documentId: number,
mentionCount: number = 1,
contextSnippet?: string
): Promise<void> {
await pool.query(
`INSERT INTO document_entities (document_id, entity_id, mention_count, context_snippet)
VALUES ($1, $2, $3, $4)
ON CONFLICT (document_id, entity_id) DO UPDATE SET
mention_count = document_entities.mention_count + EXCLUDED.mention_count`,
[documentId, entityId, mentionCount, contextSnippet]
);
}
export async function insertTriple(triple: {
documentId: number;
subjectId: number;
predicate: string;
objectId: number;
locationId?: number;
timestamp?: Date;
explicitTopic?: string;
implicitTopic?: string;
tags?: string[];
sequenceOrder: number;
}): Promise<number> {
const result = await pool.query(
`INSERT INTO triples
(document_id, subject_id, predicate, object_id, location_id, timestamp, explicit_topic, implicit_topic, tags, sequence_order)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
RETURNING id`,
[
triple.documentId,
triple.subjectId,
triple.predicate,
triple.objectId,
triple.locationId,
triple.timestamp,
triple.explicitTopic,
triple.implicitTopic,
JSON.stringify(triple.tags || []),
triple.sequenceOrder,
]
);
return result.rows[0].id;
}
// Layer calculation
export async function calculateEntityLayers(): Promise<void> {
// Set Layer 0: Epstein himself, so he is never swept into Layer 2 below
await pool.query(`
UPDATE entities SET layer = 0, updated_at = NOW()
WHERE canonical_name = 'Jeffrey Epstein' AND entity_type = 'person'
`);
// Set Layer 1: entities that share documents with Epstein
await pool.query(`
WITH epstein AS (
SELECT id FROM entities WHERE canonical_name = 'Jeffrey Epstein' AND entity_type = 'person'
),
epstein_docs AS (
SELECT DISTINCT document_id FROM document_entities WHERE entity_id = (SELECT id FROM epstein)
),
layer1_entities AS (
SELECT DISTINCT entity_id FROM document_entities
WHERE document_id IN (SELECT document_id FROM epstein_docs)
AND entity_id != (SELECT id FROM epstein)
)
UPDATE entities SET layer = 1, updated_at = NOW()
WHERE id IN (SELECT entity_id FROM layer1_entities) AND layer IS NULL
`);
// Set Layer 2: entities that share documents with Layer 1 (but not with Epstein directly)
await pool.query(`
WITH layer1 AS (
SELECT id FROM entities WHERE layer = 1
),
layer1_docs AS (
SELECT DISTINCT document_id FROM document_entities WHERE entity_id IN (SELECT id FROM layer1)
),
layer2_candidates AS (
SELECT DISTINCT entity_id FROM document_entities
WHERE document_id IN (SELECT document_id FROM layer1_docs)
)
UPDATE entities SET layer = 2, updated_at = NOW()
WHERE id IN (SELECT entity_id FROM layer2_candidates) AND layer IS NULL
`);
// Set Layer 3: remaining entities
await pool.query(`
UPDATE entities SET layer = 3, updated_at = NOW() WHERE layer IS NULL
`);
}
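The SQL above is effectively a breadth-first expansion over document co-occurrence: layer 1 shares a document with the root, layer 2 shares a document with layer 1, everything else falls to layer 3. An in-memory sketch of the same logic, written in Go purely for illustration (the data shapes are hypothetical):

```go
package main

import "fmt"

// assignLayers mirrors calculateEntityLayers: entities co-occurring in a
// document with the root get layer 1, entities co-occurring with layer 1
// get layer 2; entities absent from the map default to layer 3 on read.
func assignLayers(docEntities map[string][]int, root int) map[int]int {
	layers := map[int]int{root: 0}
	assign := func(frontier map[int]bool, layer int) map[int]bool {
		next := map[int]bool{}
		for _, ents := range docEntities {
			hit := false
			for _, e := range ents {
				if frontier[e] {
					hit = true
					break
				}
			}
			if !hit {
				continue
			}
			for _, e := range ents {
				if _, seen := layers[e]; !seen {
					layers[e] = layer
					next[e] = true
				}
			}
		}
		return next
	}
	frontier := map[int]bool{root: true}
	frontier = assign(frontier, 1)
	assign(frontier, 2)
	return layers
}

func main() {
	docs := map[string][]int{
		"doc-a": {1, 2}, // root (1) co-occurs with 2 -> layer 1
		"doc-b": {2, 3}, // 3 co-occurs with layer-1 entity 2 -> layer 2
		"doc-c": {4},    // unconnected -> absent, i.e. layer 3
	}
	layers := assignLayers(docs, 1)
	fmt.Println(layers[2], layers[3]) // 1 2
}
```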
// Search
export async function searchEntities(
query: string,
limit: number = 20
): Promise<
Array<{
id: number;
canonicalName: string;
entityType: string;
layer: number;
documentCount: number;
}>
> {
const result = await pool.query(
`SELECT id, canonical_name, entity_type, layer, document_count
FROM entities
WHERE canonical_name ILIKE $1 OR canonical_name % $2
ORDER BY similarity(canonical_name, $2) DESC, document_count DESC
LIMIT $3`,
[`%${query}%`, query, limit]
);
return result.rows.map((row) => ({
id: row.id,
canonicalName: row.canonical_name,
entityType: row.entity_type,
layer: row.layer,
documentCount: row.document_count,
}));
}
export async function close(): Promise<void> {
await pool.end();
}


@@ -0,0 +1,208 @@
import { z } from 'zod';
import Anthropic from '@anthropic-ai/sdk';
import { config } from '../config.js';
// Initialize Anthropic client
const anthropic = new Anthropic({
apiKey: config.ANTHROPIC_API_KEY,
});
// ============================================================================
// SCHEMAS
// ============================================================================
export const EntitySchema = z.object({
name: z.string(),
type: z.enum(['person', 'organization', 'location', 'date', 'reference', 'financial']),
context: z.string().optional(),
});
export const TripleSchema = z.object({
subject: z.string(),
subjectType: z.enum(['person', 'organization', 'location']),
predicate: z.string(),
object: z.string(),
objectType: z.enum(['person', 'organization', 'location', 'date', 'reference', 'financial']),
location: z.string().optional(),
timestamp: z.string().optional(),
explicitTopic: z.string().optional(),
implicitTopic: z.string().optional(),
tags: z.array(z.string()).optional(),
});
export const DocumentAnalysisSchema = z.object({
summary: z.string(),
detailedSummary: z.string(),
documentType: z.string(),
dateEarliest: z.string().nullable(),
dateLatest: z.string().nullable(),
contentTags: z.array(z.string()),
entities: z.array(EntitySchema),
triples: z.array(TripleSchema),
});
export type Entity = z.infer<typeof EntitySchema>;
export type Triple = z.infer<typeof TripleSchema>;
export type DocumentAnalysis = z.infer<typeof DocumentAnalysisSchema>;
// ============================================================================
// EXTRACTION PROMPTS
// ============================================================================
const EXTRACTION_SYSTEM_PROMPT = `You are an expert document analyst specializing in legal documents, financial records, and correspondence. Your task is to extract structured information from documents related to the Jeffrey Epstein case.
Extract the following:
1. **Entities**: All people, organizations, locations, dates, document references, and financial amounts mentioned.
2. **Relationships (Triples)**: Subject-Predicate-Object relationships between entities.
3. **Document Analysis**: Summary, type classification, date range, and content tags.
Be thorough but precise. If information is unclear or partially redacted, note what you can determine. Focus on factual extraction, not interpretation.
IMPORTANT:
- Normalize names where possible (e.g., "J. Epstein" → "Jeffrey Epstein" if context confirms)
- Include context snippets for important entities
- Extract temporal information when available
- Tag relationships with relevant categories (legal, financial, travel, social, etc.)`;
const EXTRACTION_USER_PROMPT = (text: string) => `Analyze this document and extract structured information.
<document>
${text}
</document>
Respond with a JSON object matching this schema:
{
"summary": "One sentence summary of the document",
"detailedSummary": "A paragraph explaining the document's content and significance",
"documentType": "Type of document (e.g., deposition, email, financial record, flight log, etc.)",
"dateEarliest": "YYYY-MM-DD or null if no dates",
"dateLatest": "YYYY-MM-DD or null if no dates",
"contentTags": ["tag1", "tag2", ...],
"entities": [
{"name": "Full Name", "type": "person|organization|location|date|reference|financial", "context": "brief context"}
],
"triples": [
{
"subject": "Entity Name",
"subjectType": "person|organization|location",
"predicate": "action/relationship verb",
"object": "Entity Name",
"objectType": "person|organization|location|date|reference|financial",
"location": "where (optional)",
"timestamp": "YYYY-MM-DD (optional)",
"explicitTopic": "stated subject matter (optional)",
"implicitTopic": "inferred subject matter (optional)",
"tags": ["legal", "financial", "travel", etc.]
}
]
}
Return ONLY valid JSON, no markdown or explanation.`;
// ============================================================================
// EXTRACTION FUNCTION
// ============================================================================
export async function extractFromDocument(
docId: string,
text: string
): Promise<DocumentAnalysis> {
// Truncate very long documents
const maxChars = 100000;
const truncatedText = text.length > maxChars
? text.slice(0, maxChars) + '\n\n[TRUNCATED - document continues...]'
: text;
const response = await anthropic.messages.create({
model: config.LLM_MODEL,
max_tokens: 8192,
system: EXTRACTION_SYSTEM_PROMPT,
messages: [
{
role: 'user',
content: EXTRACTION_USER_PROMPT(truncatedText),
},
],
});
// Extract text content
const content = response.content[0];
if (content.type !== 'text') {
throw new Error(`Unexpected response type: ${content.type}`);
}
// Parse JSON
let parsed: unknown;
try {
// Try to extract JSON from the response (sometimes wrapped in markdown)
const jsonMatch = content.text.match(/\{[\s\S]*\}/);
if (!jsonMatch) {
throw new Error('No JSON found in response');
}
parsed = JSON.parse(jsonMatch[0]);
} catch (error) {
console.error(`Failed to parse JSON for ${docId}:`, content.text.slice(0, 500));
throw new Error(`JSON parse error: ${error}`);
}
// Validate against schema
const result = DocumentAnalysisSchema.parse(parsed);
return result;
}
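The response cleanup above leans on a greedy `/\{[\s\S]*\}/` match: everything from the first `{` to the last `}`, which tolerates markdown fences around the payload. The same idea sketched in Go for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// extractJSON pulls the outermost {...} span from an LLM response,
// mirroring the greedy regex used in extractFromDocument: first '{'
// through last '}', so surrounding fences and prose are ignored.
func extractJSON(s string) (string, bool) {
	start := strings.Index(s, "{")
	end := strings.LastIndex(s, "}")
	if start == -1 || end == -1 || end < start {
		return "", false
	}
	return s[start : end+1], true
}

func main() {
	raw := "Here is the result:\n```json\n{\"summary\": \"ok\"}\n```"
	out, ok := extractJSON(raw)
	fmt.Println(ok, out) // true {"summary": "ok"}
}
```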
// ============================================================================
// DEDUPLICATION
// ============================================================================
const DEDUP_SYSTEM_PROMPT = `You are an expert at identifying when different name variations refer to the same entity. Given a list of entity names, group them by the actual entity they refer to.
Consider:
- Name variations (J. Smith, John Smith, John Q. Smith)
- Nicknames and aliases
- Organizational name variations (LLC vs Inc)
- Typos and OCR errors
Be conservative - only merge entities when you're confident they're the same.`;
const DEDUP_USER_PROMPT = (entities: string[]) => `Group these entity names by the actual entity they refer to. Return a JSON object where keys are canonical names and values are arrays of aliases.
Entities:
${entities.map((e) => `- ${e}`).join('\n')}
Return JSON like:
{
"Jeffrey Epstein": ["J. Epstein", "Epstein", "Jeffrey E. Epstein"],
"Ghislaine Maxwell": ["G. Maxwell", "Maxwell"]
}
Return ONLY valid JSON.`;
export async function deduplicateEntities(
entities: string[]
): Promise<Record<string, string[]>> {
const response = await anthropic.messages.create({
model: config.LLM_MODEL,
max_tokens: 4096,
system: DEDUP_SYSTEM_PROMPT,
messages: [
{
role: 'user',
content: DEDUP_USER_PROMPT(entities),
},
],
});
const content = response.content[0];
if (content.type !== 'text') {
throw new Error(`Unexpected response type: ${content.type}`);
}
const jsonMatch = content.text.match(/\{[\s\S]*\}/);
if (!jsonMatch) {
throw new Error('No JSON found in dedup response');
}
return JSON.parse(jsonMatch[0]);
}
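One way the alias map returned by `deduplicateEntities` might be consumed downstream is as an alias-to-canonical lookup for normalizing extracted names. A minimal sketch (the `buildAliasLookup` helper is illustrative, not part of this file):

```typescript
// Invert { canonical: [aliases...] } into a case-insensitive
// alias -> canonical lookup table.
function buildAliasLookup(groups: Record<string, string[]>): Map<string, string> {
  const lookup = new Map<string, string>();
  for (const [canonical, aliases] of Object.entries(groups)) {
    // The canonical name maps to itself.
    lookup.set(canonical.toLowerCase(), canonical);
    for (const alias of aliases) {
      lookup.set(alias.toLowerCase(), canonical);
    }
  }
  return lookup;
}

const lookup = buildAliasLookup({
  'Jeffrey Epstein': ['J. Epstein', 'Epstein'],
});
```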

View File

@@ -0,0 +1,135 @@
/**
* Document Extraction Script
*
* Reads OCR text from the data sources and loads it into PostgreSQL.
* This is the first step in the pipeline.
*/
import fs from 'fs';
import path from 'path';
import readline from 'readline';
import { config } from '../config.js';
import { insertDocument, close } from '../db.js';
// Path to the combined text file
const DATA_DIR = path.resolve(config.DATA_DIR);
const COMBINED_TEXT_PATH = path.join(DATA_DIR, 'combined-all-epstein-files/COMBINED_ALL_EPSTEIN_FILES_djvu.txt');
// Document ID pattern: EFTA00000001
const DOC_ID_PATTERN = /^EFTA\d{8}$/;
interface DocumentChunk {
docId: string;
lines: string[];
}
async function* readDocuments(): AsyncGenerator<DocumentChunk> {
const fileStream = fs.createReadStream(COMBINED_TEXT_PATH);
const rl = readline.createInterface({
input: fileStream,
crlfDelay: Infinity,
});
let currentDoc: DocumentChunk | null = null;
for await (const line of rl) {
const trimmed = line.trim();
// Check if this is a new document ID
if (DOC_ID_PATTERN.test(trimmed)) {
// If we have a previous document, yield it
if (currentDoc && currentDoc.lines.length > 0) {
yield currentDoc;
}
// Start a new document
currentDoc = {
docId: trimmed,
lines: [],
};
} else if (currentDoc) {
// Add line to current document
if (trimmed.length > 0) {
currentDoc.lines.push(line);
}
}
}
// Yield the last document
if (currentDoc && currentDoc.lines.length > 0) {
yield currentDoc;
}
}
function getDatasetId(docId: string): number {
// Extract the numeric portion
const num = parseInt(docId.replace('EFTA', ''), 10);
// Map to dataset based on the metadata:
// DataSet 1: EFTA00000001-00003158
// DataSet 2: EFTA00003159-00003857
// DataSet 3: EFTA00003858-00005586
// DataSet 4: EFTA00005705-00008320
// DataSet 5: EFTA00008409-00008528
if (num <= 3158) return 1;
if (num <= 3857) return 2;
if (num <= 5586) return 3;
if (num <= 8320) return 4; // IDs in the 5587-5704 gap between datasets also land here
return 5;
}
async function main() {
console.log('📄 Starting document extraction...');
console.log(`Reading from: ${COMBINED_TEXT_PATH}`);
// Check if file exists
if (!fs.existsSync(COMBINED_TEXT_PATH)) {
console.error(`❌ File not found: ${COMBINED_TEXT_PATH}`);
console.error('Make sure the DataSources directory is properly set up.');
process.exit(1);
}
let count = 0;
let errors = 0;
const seenDocs = new Set<string>();
for await (const doc of readDocuments()) {
// Skip duplicate doc IDs (the OCR sometimes repeats)
if (seenDocs.has(doc.docId)) {
continue;
}
seenDocs.add(doc.docId);
try {
const fullText = doc.lines.join('\n');
const datasetId = getDatasetId(doc.docId);
await insertDocument({
docId: doc.docId,
datasetId,
fullText,
pageCount: 1, // We'll update this later with actual page counts
});
count++;
if (count % 100 === 0) {
console.log(` ✓ Processed ${count} documents...`);
}
} catch (error) {
console.error(`❌ Error processing ${doc.docId}:`, error);
errors++;
}
}
console.log(`\n✅ Document extraction complete!`);
console.log(` Total documents: ${count}`);
console.log(` Errors: ${errors}`);
await close();
}
main().catch((error) => {
console.error('Fatal error:', error);
process.exit(1);
});
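The streaming generator's splitting rule can be hard to eyeball. The same logic, as a minimal in-memory sketch (illustrative only; the script itself streams line-by-line with `readline`): a line matching the document ID pattern starts a new document, and other non-empty lines accumulate into the current one.

```typescript
const DOC_ID = /^EFTA\d{8}$/;

// In-memory equivalent of readDocuments(): split a flat line array
// into { docId, lines } chunks at each document-ID marker.
function splitDocuments(lines: string[]): Array<{ docId: string; lines: string[] }> {
  const docs: Array<{ docId: string; lines: string[] }> = [];
  let current: { docId: string; lines: string[] } | null = null;
  for (const line of lines) {
    const trimmed = line.trim();
    if (DOC_ID.test(trimmed)) {
      // A new ID closes out the previous (non-empty) document.
      if (current && current.lines.length > 0) docs.push(current);
      current = { docId: trimmed, lines: [] };
    } else if (current && trimmed.length > 0) {
      current.lines.push(line);
    }
  }
  if (current && current.lines.length > 0) docs.push(current);
  return docs;
}
```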

View File

@@ -0,0 +1,198 @@
/**
* Entity Extraction Script
*
* Processes documents through the LLM to extract entities and relationships.
* Uses rate limiting and batching for efficiency.
*/
import pLimit from 'p-limit';
import { config } from '../config.js';
import {
getDocumentsPendingAnalysis,
updateDocumentAnalysis,
upsertEntity,
linkEntityToDocument,
insertTriple,
pool,
close,
} from '../db.js';
import { extractFromDocument, type Entity, type Triple } from '../ner/extractor.js';
// Rate limiter
const limit = pLimit(config.MAX_WORKERS);
// Track progress
let processed = 0;
let errors = 0;
let totalEntities = 0;
let totalTriples = 0;
async function processDocument(doc: {
id: number;
docId: string;
fullText: string;
}): Promise<void> {
try {
console.log(` 📝 Processing ${doc.docId}...`);
// Mark as processing
await pool.query(
`UPDATE documents SET analysis_status = 'processing' WHERE id = $1`,
[doc.id]
);
// Extract entities and relationships
const analysis = await extractFromDocument(doc.docId, doc.fullText);
// Parse dates
const dateEarliest = analysis.dateEarliest
? new Date(analysis.dateEarliest)
: undefined;
const dateLatest = analysis.dateLatest
? new Date(analysis.dateLatest)
: undefined;
// Update document analysis
await updateDocumentAnalysis(doc.docId, {
summary: analysis.summary,
detailedSummary: analysis.detailedSummary,
documentType: analysis.documentType,
dateEarliest,
dateLatest,
contentTags: analysis.contentTags,
});
// Insert entities and get their IDs
const entityIdMap = new Map<string, number>();
for (const entity of analysis.entities) {
const entityId = await upsertEntity({
canonicalName: entity.name,
entityType: entity.type,
});
entityIdMap.set(entity.name.toLowerCase(), entityId);
// Link entity to document
await linkEntityToDocument(entityId, doc.id, 1, entity.context);
}
totalEntities += analysis.entities.length;
// Insert triples
for (let i = 0; i < analysis.triples.length; i++) {
const triple = analysis.triples[i];
// Get or create subject entity
let subjectId = entityIdMap.get(triple.subject.toLowerCase());
if (!subjectId) {
subjectId = await upsertEntity({
canonicalName: triple.subject,
entityType: triple.subjectType,
});
entityIdMap.set(triple.subject.toLowerCase(), subjectId);
}
// Get or create object entity
let objectId = entityIdMap.get(triple.object.toLowerCase());
if (!objectId) {
objectId = await upsertEntity({
canonicalName: triple.object,
entityType: triple.objectType,
});
entityIdMap.set(triple.object.toLowerCase(), objectId);
}
// Get location entity if present
let locationId: number | undefined;
if (triple.location) {
locationId = entityIdMap.get(triple.location.toLowerCase());
if (!locationId) {
locationId = await upsertEntity({
canonicalName: triple.location,
entityType: 'location',
});
entityIdMap.set(triple.location.toLowerCase(), locationId);
}
}
// Parse timestamp
const timestamp = triple.timestamp ? new Date(triple.timestamp) : undefined;
// Insert triple
await insertTriple({
documentId: doc.id,
subjectId,
predicate: triple.predicate,
objectId,
locationId,
timestamp,
explicitTopic: triple.explicitTopic,
implicitTopic: triple.implicitTopic,
tags: triple.tags,
sequenceOrder: i,
});
}
totalTriples += analysis.triples.length;
processed++;
console.log(
`${doc.docId}: ${analysis.entities.length} entities, ${analysis.triples.length} triples`
);
} catch (error) {
errors++;
console.error(`${doc.docId}: ${error}`);
// Mark as failed
await pool.query(
`UPDATE documents SET
analysis_status = 'failed',
error_message = $2,
updated_at = NOW()
WHERE id = $1`,
[doc.id, String(error)]
);
}
}
async function main() {
console.log('🔍 Starting entity extraction...');
console.log(` Model: ${config.LLM_MODEL}`);
console.log(` Workers: ${config.MAX_WORKERS}`);
console.log(` Batch size: ${config.BATCH_SIZE}\n`);
let hasMore = true;
while (hasMore) {
// Get batch of pending documents
const documents = await getDocumentsPendingAnalysis(config.BATCH_SIZE);
if (documents.length === 0) {
hasMore = false;
break;
}
console.log(`\n📦 Processing batch of ${documents.length} documents...`);
// Process in parallel with rate limiting
await Promise.all(
documents.map((doc) => limit(() => processDocument(doc)))
);
// Brief pause between batches
await new Promise((resolve) => setTimeout(resolve, 1000));
}
console.log(`\n✅ Entity extraction complete!`);
console.log(` Documents processed: ${processed}`);
console.log(` Entities extracted: ${totalEntities}`);
console.log(` Triples extracted: ${totalTriples}`);
console.log(` Errors: ${errors}`);
await close();
}
main().catch((error) => {
console.error('Fatal error:', error);
process.exit(1);
});
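The batch loop funnels every `processDocument` call through `pLimit(config.MAX_WORKERS)`, so at most that many LLM requests are in flight at once. For readers unfamiliar with the pattern, here is a minimal sketch of what such a limiter does (illustrative only; the script uses the p-limit package, not this code):

```typescript
// Returns a wrapper that allows at most `max` wrapped promises
// to run concurrently; excess callers wait in a FIFO queue.
function makeLimiter(max: number) {
  let active = 0;
  const queue: Array<() => void> = [];
  const release = () => {
    active--;
    queue.shift()?.(); // wake the next waiter, if any
  };
  return async function limit<T>(fn: () => Promise<T>): Promise<T> {
    if (active >= max) {
      await new Promise<void>((resolve) => queue.push(resolve));
    }
    active++;
    try {
      return await fn();
    } finally {
      release();
    }
  };
}
```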

View File

@@ -0,0 +1,236 @@
/**
* Cross-Reference Matching Script
*
* Matches extracted entities against PPP loans, FEC contributions, and federal grants.
 * Uses pg_trgm fuzzy matching with a configurable similarity threshold.
*/
import { pool, close } from '../db.js';
// Similarity threshold for matches (0-1)
const MATCH_THRESHOLD = 0.7;
interface Match {
entityId: number;
entityName: string;
source: 'ppp' | 'fec' | 'grants';
sourceId: number;
sourceName: string;
score: number;
}
async function findPPPMatches(): Promise<Match[]> {
console.log('🔍 Matching entities against PPP loans...');
const result = await pool.query(`
SELECT
e.id AS entity_id,
e.canonical_name AS entity_name,
p.id AS source_id,
p.borrower_name AS source_name,
similarity(e.canonical_name, p.borrower_name) AS score
FROM entities e
CROSS JOIN LATERAL (
SELECT id, borrower_name
FROM ppp_loans
WHERE
borrower_name % e.canonical_name
AND similarity(borrower_name, e.canonical_name) >= $1
ORDER BY similarity(borrower_name, e.canonical_name) DESC
LIMIT 5
) p
WHERE e.entity_type IN ('person', 'organization')
`, [MATCH_THRESHOLD]);
return result.rows.map((row) => ({
entityId: row.entity_id,
entityName: row.entity_name,
source: 'ppp' as const,
sourceId: row.source_id,
sourceName: row.source_name,
score: row.score,
}));
}
async function findFECMatches(): Promise<Match[]> {
console.log('🔍 Matching entities against FEC contributions...');
const result = await pool.query(`
SELECT
e.id AS entity_id,
e.canonical_name AS entity_name,
f.id AS source_id,
f.contributor_name AS source_name,
similarity(e.canonical_name, f.contributor_name) AS score
FROM entities e
CROSS JOIN LATERAL (
SELECT id, contributor_name
FROM fec_contributions
WHERE
contributor_name % e.canonical_name
AND similarity(contributor_name, e.canonical_name) >= $1
ORDER BY similarity(contributor_name, e.canonical_name) DESC
LIMIT 5
) f
WHERE e.entity_type = 'person'
`, [MATCH_THRESHOLD]);
return result.rows.map((row) => ({
entityId: row.entity_id,
entityName: row.entity_name,
source: 'fec' as const,
sourceId: row.source_id,
sourceName: row.source_name,
score: row.score,
}));
}
async function findGrantsMatches(): Promise<Match[]> {
console.log('🔍 Matching entities against federal grants...');
const result = await pool.query(`
SELECT
e.id AS entity_id,
e.canonical_name AS entity_name,
g.id AS source_id,
g.recipient_name AS source_name,
similarity(e.canonical_name, g.recipient_name) AS score
FROM entities e
CROSS JOIN LATERAL (
SELECT id, recipient_name
FROM federal_grants
WHERE
recipient_name % e.canonical_name
AND similarity(recipient_name, e.canonical_name) >= $1
ORDER BY similarity(recipient_name, e.canonical_name) DESC
LIMIT 5
) g
WHERE e.entity_type IN ('person', 'organization')
`, [MATCH_THRESHOLD]);
return result.rows.map((row) => ({
entityId: row.entity_id,
entityName: row.entity_name,
source: 'grants' as const,
sourceId: row.source_id,
sourceName: row.source_name,
score: row.score,
}));
}
async function saveMatches(matches: Match[]): Promise<void> {
  if (matches.length === 0) return;
  // Build a parameterized multi-row insert instead of interpolating values
  // directly into the SQL string.
  const params: unknown[] = [];
  const rows = matches.map((m, i) => {
    const base = i * 4;
    params.push(m.entityId, m.source, m.sourceId, m.score);
    return `($${base + 1}, $${base + 2}, $${base + 3}, $${base + 4}, 'fuzzy')`;
  });
  await pool.query(`
    INSERT INTO entity_crossref_matches (entity_id, source, source_id, match_score, match_method)
    VALUES ${rows.join(',\n')}
    ON CONFLICT DO NOTHING
  `, params);
}
async function updateEntityCrossRefSummary(): Promise<void> {
console.log('📊 Updating entity cross-reference summaries...');
// Update PPP matches
await pool.query(`
UPDATE entities e
SET ppp_matches = (
SELECT jsonb_agg(jsonb_build_object(
'id', p.id,
'borrower', p.borrower_name,
'amount', p.loan_amount,
'score', m.match_score
))
FROM entity_crossref_matches m
JOIN ppp_loans p ON m.source_id = p.id
WHERE m.entity_id = e.id AND m.source = 'ppp' AND NOT m.false_positive
)
WHERE EXISTS (
SELECT 1 FROM entity_crossref_matches m
WHERE m.entity_id = e.id AND m.source = 'ppp'
)
`);
// Update FEC matches
await pool.query(`
UPDATE entities e
SET fec_matches = (
SELECT jsonb_agg(jsonb_build_object(
'id', f.id,
'contributor', f.contributor_name,
'candidate', f.candidate_name,
'amount', f.amount,
'score', m.match_score
))
FROM entity_crossref_matches m
JOIN fec_contributions f ON m.source_id = f.id
WHERE m.entity_id = e.id AND m.source = 'fec' AND NOT m.false_positive
)
WHERE EXISTS (
SELECT 1 FROM entity_crossref_matches m
WHERE m.entity_id = e.id AND m.source = 'fec'
)
`);
// Update grants matches
await pool.query(`
UPDATE entities e
SET grants_matches = (
SELECT jsonb_agg(jsonb_build_object(
'id', g.id,
'recipient', g.recipient_name,
'agency', g.awarding_agency,
'amount', g.award_amount,
'score', m.match_score
))
FROM entity_crossref_matches m
JOIN federal_grants g ON m.source_id = g.id
WHERE m.entity_id = e.id AND m.source = 'grants' AND NOT m.false_positive
)
WHERE EXISTS (
SELECT 1 FROM entity_crossref_matches m
WHERE m.entity_id = e.id AND m.source = 'grants'
)
`);
}
async function main() {
console.log('🔗 Starting cross-reference matching...\n');
// Find all matches
const pppMatches = await findPPPMatches();
console.log(` Found ${pppMatches.length} PPP matches`);
const fecMatches = await findFECMatches();
console.log(` Found ${fecMatches.length} FEC matches`);
const grantsMatches = await findGrantsMatches();
console.log(` Found ${grantsMatches.length} grants matches`);
// Save matches
console.log('\n💾 Saving matches to database...');
await saveMatches(pppMatches);
await saveMatches(fecMatches);
await saveMatches(grantsMatches);
// Update entity summaries
await updateEntityCrossRefSummary();
const totalMatches = pppMatches.length + fecMatches.length + grantsMatches.length;
console.log(`\n✅ Cross-reference matching complete!`);
console.log(` Total matches: ${totalMatches}`);
console.log(` PPP: ${pppMatches.length}`);
console.log(` FEC: ${fecMatches.length}`);
console.log(` Grants: ${grantsMatches.length}`);
await close();
}
main().catch((error) => {
console.error('Fatal error:', error);
process.exit(1);
});
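The queries above lean on Postgres's pg_trgm extension: `%` is its similarity operator and `similarity()` its score function. As a rough illustration of the underlying idea (a sketch, not pg_trgm's exact algorithm, whose padding and normalization differ), a trigram-overlap score can be computed like this:

```typescript
// Collect the 3-character windows of a padded, lowercased string.
function trigrams(s: string): Set<string> {
  const padded = `  ${s.toLowerCase()} `;
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= padded.length; i++) {
    grams.add(padded.slice(i, i + 3));
  }
  return grams;
}

// Jaccard overlap of the two trigram sets: shared / union, in [0, 1].
function trigramSimilarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

This is why `MATCH_THRESHOLD = 0.7` is fairly strict: small spelling variants still share most trigrams, while unrelated names share almost none.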

20
extraction/tsconfig.json Normal file
View File

@@ -0,0 +1,20 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "NodeNext",
"moduleResolution": "NodeNext",
"lib": ["ES2022"],
"outDir": "./dist",
"rootDir": "./src",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist"]
}

14
frontend/index.html Normal file
View File

@@ -0,0 +1,14 @@
<!DOCTYPE html>
<html lang="en" class="dark">
<head>
<meta charset="UTF-8" />
<link rel="icon" type="image/svg+xml" href="/favicon.svg" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="description" content="Searchable database and network analysis tool for the DOJ Epstein Files release" />
<title>Epstein Files Database</title>
</head>
<body class="bg-background text-white antialiased">
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

38
frontend/package.json Normal file
View File

@@ -0,0 +1,38 @@
{
"name": "@epstein-db/frontend",
"version": "1.0.0",
"private": true,
"type": "module",
"scripts": {
"dev": "vite",
"build": "tsc && vite build",
"preview": "vite preview",
"lint": "eslint . --ext ts,tsx --report-unused-disable-directives --max-warnings 0"
},
"dependencies": {
"@tanstack/react-query": "^5.32.0",
"clsx": "^2.1.0",
"d3": "^7.9.0",
"lucide-react": "^0.372.0",
"react": "^18.2.0",
"react-dom": "^18.2.0",
"react-force-graph-2d": "^1.25.5",
"react-router-dom": "^6.22.0",
"tailwind-merge": "^2.2.2"
},
"devDependencies": {
"@types/d3": "^7.4.3",
"@types/node": "^20.12.0",
"@types/react": "^18.2.79",
"@types/react-dom": "^18.2.25",
"@vitejs/plugin-react": "^4.2.0",
"autoprefixer": "^10.4.19",
"eslint": "^8.57.0",
"eslint-plugin-react-hooks": "^4.6.0",
"eslint-plugin-react-refresh": "^0.4.6",
"postcss": "^8.4.38",
"tailwindcss": "^3.4.3",
"typescript": "^5.4.0",
"vite": "^5.2.0"
}
}

29
frontend/src/App.tsx Normal file
View File

@@ -0,0 +1,29 @@
import { Routes, Route } from 'react-router-dom'
import { Layout } from './components/Layout'
import { HomePage } from './pages/HomePage'
import { NetworkPage } from './pages/NetworkPage'
import { EntitiesPage } from './pages/EntitiesPage'
import { EntityDetailPage } from './pages/EntityDetailPage'
import { DocumentsPage } from './pages/DocumentsPage'
import { DocumentDetailPage } from './pages/DocumentDetailPage'
import { SearchPage } from './pages/SearchPage'
import { PatternsPage } from './pages/PatternsPage'
import { CrossRefPage } from './pages/CrossRefPage'
export default function App() {
return (
<Layout>
<Routes>
<Route path="/" element={<HomePage />} />
<Route path="/network" element={<NetworkPage />} />
<Route path="/entities" element={<EntitiesPage />} />
<Route path="/entities/:id" element={<EntityDetailPage />} />
<Route path="/documents" element={<DocumentsPage />} />
<Route path="/documents/:id" element={<DocumentDetailPage />} />
<Route path="/search" element={<SearchPage />} />
<Route path="/patterns" element={<PatternsPage />} />
<Route path="/crossref" element={<CrossRefPage />} />
</Routes>
</Layout>
)
}

277
frontend/src/api/index.ts Normal file
View File

@@ -0,0 +1,277 @@
const API_BASE = '/api'
export interface Stats {
documents: number
entities: number
triples: number
pppLoans: number
fecRecords: number
grants: number
patterns: number
}
export interface Entity {
id: number
canonicalName: string
entityType: string
layer: number | null
description?: string
documentCount: number
connectionCount: number
aliases?: string[]
pppMatches?: any[]
fecMatches?: any[]
grantsMatches?: any[]
}
export interface Document {
id: number
docId: string
datasetId: number
documentType?: string
summary?: string
detailedSummary?: string
dateEarliest?: string
dateLatest?: string
contentTags?: string[]
pageCount?: number
}
export interface Connection {
id: number
canonicalName: string
entityType: string
layer: number | null
sharedDocs: number
}
export interface NetworkData {
nodes: Array<{
id: number
canonicalName: string
entityType: string
layer: number | null
documentCount: number
connectionCount: number
}>
edges: Array<{
source: number
target: number
weight: number
}>
stats: {
nodeCount: number
edgeCount: number
}
}
export interface Pattern {
id: number
title: string
description: string
patternType: string
confidence: number | null
status: string
discoveredAt: string
}
export interface SearchResult {
id: number
docId: string
documentType?: string
summary?: string
rank: number
snippet?: string
}
// Stats
export async function getStats(): Promise<Stats> {
const res = await fetch(`${API_BASE}/stats`)
if (!res.ok) throw new Error('Failed to fetch stats')
return res.json()
}
// Entities
export async function searchEntities(params: {
q?: string
type?: string
layer?: string
limit?: number
}): Promise<{ entities: Entity[]; count: number }> {
const searchParams = new URLSearchParams()
if (params.q) searchParams.set('q', params.q)
if (params.type) searchParams.set('type', params.type)
if (params.layer) searchParams.set('layer', params.layer)
if (params.limit) searchParams.set('limit', params.limit.toString())
const res = await fetch(`${API_BASE}/entities?${searchParams}`)
if (!res.ok) throw new Error('Failed to search entities')
return res.json()
}
export async function getEntity(id: number): Promise<Entity> {
const res = await fetch(`${API_BASE}/entities/${id}`)
if (!res.ok) throw new Error('Failed to fetch entity')
return res.json()
}
export async function getEntityConnections(
id: number,
limit?: number
): Promise<{ connections: Connection[]; count: number }> {
const params = limit ? `?limit=${limit}` : ''
const res = await fetch(`${API_BASE}/entities/${id}/connections${params}`)
if (!res.ok) throw new Error('Failed to fetch connections')
return res.json()
}
export async function getEntityDocuments(
id: number,
limit?: number
): Promise<{ documents: Document[]; count: number }> {
const params = limit ? `?limit=${limit}` : ''
const res = await fetch(`${API_BASE}/entities/${id}/documents${params}`)
if (!res.ok) throw new Error('Failed to fetch documents')
return res.json()
}
// Documents
export async function listDocuments(params: {
type?: string
dataset?: string
limit?: number
offset?: number
}): Promise<{ documents: Document[]; count: number; offset: number; limit: number }> {
const searchParams = new URLSearchParams()
if (params.type) searchParams.set('type', params.type)
if (params.dataset) searchParams.set('dataset', params.dataset)
if (params.limit) searchParams.set('limit', params.limit.toString())
if (params.offset) searchParams.set('offset', params.offset.toString())
const res = await fetch(`${API_BASE}/documents?${searchParams}`)
if (!res.ok) throw new Error('Failed to list documents')
return res.json()
}
export async function getDocument(id: number): Promise<Document> {
const res = await fetch(`${API_BASE}/documents/${id}`)
if (!res.ok) throw new Error('Failed to fetch document')
return res.json()
}
export async function getDocumentText(id: number): Promise<{ id: number; text: string }> {
const res = await fetch(`${API_BASE}/documents/${id}/text`)
if (!res.ok) throw new Error('Failed to fetch document text')
return res.json()
}
export async function getDocumentEntities(
id: number
): Promise<{ entities: Array<Entity & { mentionCount: number }>; count: number }> {
const res = await fetch(`${API_BASE}/documents/${id}/entities`)
if (!res.ok) throw new Error('Failed to fetch document entities')
return res.json()
}
// Network
export async function getNetwork(params?: {
limit?: number
minConnections?: number
}): Promise<NetworkData> {
const searchParams = new URLSearchParams()
if (params?.limit) searchParams.set('limit', params.limit.toString())
if (params?.minConnections) searchParams.set('minConnections', params.minConnections.toString())
const res = await fetch(`${API_BASE}/network?${searchParams}`)
if (!res.ok) throw new Error('Failed to fetch network')
return res.json()
}
export async function getNetworkByLayer(): Promise<{
layers: Array<{
layer: number
entities: Entity[]
count: number
}>
}> {
const res = await fetch(`${API_BASE}/network/layers`)
if (!res.ok) throw new Error('Failed to fetch network layers')
return res.json()
}
// Patterns
export async function listPatterns(params?: {
status?: string
type?: string
}): Promise<{ patterns: Pattern[]; count: number }> {
const searchParams = new URLSearchParams()
if (params?.status) searchParams.set('status', params.status)
if (params?.type) searchParams.set('type', params.type)
const res = await fetch(`${API_BASE}/patterns?${searchParams}`)
if (!res.ok) throw new Error('Failed to list patterns')
return res.json()
}
export async function getPattern(id: number): Promise<{
pattern: Pattern & { entityIds: number[]; evidence: any; notes?: string }
entities: Entity[]
}> {
const res = await fetch(`${API_BASE}/patterns/${id}`)
if (!res.ok) throw new Error('Failed to fetch pattern')
return res.json()
}
// Search
export async function fullTextSearch(
query: string,
limit?: number
): Promise<{ results: SearchResult[]; count: number; query: string }> {
const params = new URLSearchParams({ q: query })
if (limit) params.set('limit', limit.toString())
const res = await fetch(`${API_BASE}/search?${params}`)
if (!res.ok) throw new Error('Failed to search')
return res.json()
}
// Cross-reference
export async function searchPPP(
query: string,
limit?: number
): Promise<{ results: any[]; count: number }> {
const params = new URLSearchParams({ q: query })
if (limit) params.set('limit', limit.toString())
const res = await fetch(`${API_BASE}/crossref/ppp?${params}`)
if (!res.ok) throw new Error('Failed to search PPP')
return res.json()
}
export async function searchFEC(
query: string,
candidate?: string,
limit?: number
): Promise<{ results: any[]; count: number }> {
const params = new URLSearchParams({ q: query })
if (candidate) params.set('candidate', candidate)
if (limit) params.set('limit', limit.toString())
const res = await fetch(`${API_BASE}/crossref/fec?${params}`)
if (!res.ok) throw new Error('Failed to search FEC')
return res.json()
}
export async function searchGrants(
query: string,
agency?: string,
limit?: number
): Promise<{ results: any[]; count: number }> {
const params = new URLSearchParams({ q: query })
if (agency) params.set('agency', agency)
if (limit) params.set('limit', limit.toString())
const res = await fetch(`${API_BASE}/crossref/grants?${params}`)
if (!res.ok) throw new Error('Failed to search grants')
return res.json()
}
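Each endpoint above hand-builds a `URLSearchParams` from optional fields. A small helper capturing the repeated pattern (hypothetical; `buildQuery` is not part of this file) would look like:

```typescript
// Build a query string from only the defined parameters;
// returns '' when nothing is set.
function buildQuery(params: Record<string, string | number | undefined>): string {
  const sp = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) sp.set(key, String(value));
  }
  const qs = sp.toString();
  return qs ? `?${qs}` : '';
}
```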

View File

@@ -0,0 +1,80 @@
import { ReactNode } from 'react'
import { Link, useLocation } from 'react-router-dom'
import {
Search,
Network,
Users,
FileText,
Lightbulb,
Link2,
Home
} from 'lucide-react'
import { clsx } from 'clsx'
interface LayoutProps {
children: ReactNode
}
const navItems = [
{ path: '/', icon: Home, label: 'Home' },
{ path: '/network', icon: Network, label: 'Network' },
{ path: '/entities', icon: Users, label: 'Entities' },
{ path: '/documents', icon: FileText, label: 'Documents' },
{ path: '/search', icon: Search, label: 'Search' },
{ path: '/patterns', icon: Lightbulb, label: 'Patterns' },
{ path: '/crossref', icon: Link2, label: 'Cross-Ref' },
]
export function Layout({ children }: LayoutProps) {
const location = useLocation()
return (
<div className="min-h-screen flex">
{/* Sidebar */}
<nav className="w-64 bg-surface border-r border-border flex flex-col">
{/* Logo */}
<div className="p-4 border-b border-border">
<Link to="/" className="flex items-center gap-2">
<div className="w-8 h-8 bg-red-600 rounded-lg flex items-center justify-center">
<span className="text-white font-bold text-sm">EF</span>
</div>
<div>
<h1 className="font-semibold text-white">Epstein Files</h1>
<p className="text-xs text-gray-500">Database</p>
</div>
</Link>
</div>
{/* Navigation */}
<div className="flex-1 p-2">
{navItems.map(({ path, icon: Icon, label }) => (
<Link
key={path}
to={path}
className={clsx(
'flex items-center gap-3 px-3 py-2 rounded-lg mb-1 transition-colors',
location.pathname === path
? 'bg-blue-600/20 text-blue-400'
: 'text-gray-400 hover:bg-surface-hover hover:text-gray-200'
)}
>
<Icon size={18} />
<span>{label}</span>
</Link>
))}
</div>
{/* Footer */}
<div className="p-4 border-t border-border text-xs text-gray-500">
<p>4,055 documents</p>
<p className="mt-1">DOJ Release Dec 2025</p>
</div>
</nav>
{/* Main Content */}
<main className="flex-1 overflow-auto">
{children}
</main>
</div>
)
}

88
frontend/src/index.css Normal file
View File

@@ -0,0 +1,88 @@
@tailwind base;
@tailwind components;
@tailwind utilities;
@layer base {
body {
@apply bg-background text-gray-100;
}
/* Custom scrollbar */
::-webkit-scrollbar {
width: 8px;
height: 8px;
}
::-webkit-scrollbar-track {
@apply bg-surface;
}
::-webkit-scrollbar-thumb {
@apply bg-border rounded-full;
}
::-webkit-scrollbar-thumb:hover {
@apply bg-gray-600;
}
}
@layer components {
.card {
@apply bg-surface border border-border rounded-lg;
}
.btn {
@apply px-4 py-2 rounded-lg font-medium transition-colors;
}
.btn-primary {
@apply bg-blue-600 hover:bg-blue-700 text-white;
}
.btn-secondary {
@apply bg-surface border border-border hover:bg-surface-hover text-gray-200;
}
.input {
@apply bg-surface border border-border rounded-lg px-4 py-2 text-white placeholder-gray-500 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:border-transparent;
}
/* Layer badges */
.layer-badge {
@apply inline-flex items-center px-2 py-0.5 rounded-full text-xs font-medium;
}
.layer-0 {
@apply bg-red-500/20 text-red-400 border border-red-500/30;
}
.layer-1 {
@apply bg-orange-500/20 text-orange-400 border border-orange-500/30;
}
.layer-2 {
@apply bg-yellow-500/20 text-yellow-400 border border-yellow-500/30;
}
.layer-3 {
@apply bg-green-500/20 text-green-400 border border-green-500/30;
}
/* Entity type badges */
.entity-person {
@apply bg-blue-500/20 text-blue-400;
}
.entity-organization {
@apply bg-purple-500/20 text-purple-400;
}
.entity-location {
@apply bg-teal-500/20 text-teal-400;
}
}
/* Force graph styling */
.force-graph-container {
background: #0a0a0a;
}

25
frontend/src/main.tsx Normal file
View File

@@ -0,0 +1,25 @@
import React from 'react'
import ReactDOM from 'react-dom/client'
import { QueryClient, QueryClientProvider } from '@tanstack/react-query'
import { BrowserRouter } from 'react-router-dom'
import App from './App'
import './index.css'
const queryClient = new QueryClient({
defaultOptions: {
queries: {
staleTime: 1000 * 60 * 5, // 5 minutes
retry: 1,
},
},
})
ReactDOM.createRoot(document.getElementById('root')!).render(
<React.StrictMode>
<QueryClientProvider client={queryClient}>
<BrowserRouter>
<App />
</BrowserRouter>
</QueryClientProvider>
</React.StrictMode>,
)

View File

@@ -0,0 +1,187 @@
import { useQuery } from '@tanstack/react-query'
import { Link } from 'react-router-dom'
import { getStats, getNetworkByLayer } from '@/api'
import { Users, FileText, Network, Lightbulb, DollarSign, Vote, Building } from 'lucide-react'
export function HomePage() {
const { data: stats, isLoading: statsLoading } = useQuery({
queryKey: ['stats'],
queryFn: getStats,
})
const { data: layersData, isLoading: layersLoading } = useQuery({
queryKey: ['network-layers'],
queryFn: getNetworkByLayer,
})
return (
<div className="p-6">
{/* Header */}
<div className="mb-8">
<h1 className="text-3xl font-bold text-white mb-2">Epstein Files Database</h1>
<p className="text-gray-400">
Searchable database and network analysis tool for the DOJ Epstein Files release
</p>
</div>
{/* Stats Grid */}
<div className="grid grid-cols-2 md:grid-cols-4 gap-4 mb-8">
<StatCard
icon={FileText}
label="Documents"
value={stats?.documents ?? 0}
loading={statsLoading}
/>
<StatCard
icon={Users}
label="Entities"
value={stats?.entities ?? 0}
loading={statsLoading}
/>
<StatCard
icon={Network}
label="Relationships"
value={stats?.triples ?? 0}
loading={statsLoading}
/>
<StatCard
icon={Lightbulb}
label="Patterns"
value={stats?.patterns ?? 0}
loading={statsLoading}
/>
</div>
{/* Cross-Reference Stats */}
<div className="card p-4 mb-8">
<h2 className="text-lg font-semibold mb-4">Cross-Reference Data</h2>
<div className="grid grid-cols-3 gap-4">
<div className="flex items-center gap-3">
<div className="p-2 bg-green-500/20 rounded-lg">
<DollarSign className="text-green-400" size={20} />
</div>
<div>
<p className="text-sm text-gray-400">PPP Loans</p>
<p className="font-semibold">{stats?.pppLoans?.toLocaleString() ?? '—'}</p>
</div>
</div>
<div className="flex items-center gap-3">
<div className="p-2 bg-blue-500/20 rounded-lg">
<Vote className="text-blue-400" size={20} />
</div>
<div>
<p className="text-sm text-gray-400">FEC Records</p>
<p className="font-semibold">{stats?.fecRecords?.toLocaleString() ?? '—'}</p>
</div>
</div>
<div className="flex items-center gap-3">
<div className="p-2 bg-purple-500/20 rounded-lg">
<Building className="text-purple-400" size={20} />
</div>
<div>
<p className="text-sm text-gray-400">Federal Grants</p>
<p className="font-semibold">{stats?.grants?.toLocaleString() ?? '—'}</p>
</div>
</div>
</div>
</div>
{/* Layer Overview */}
<div className="card p-4 mb-8">
<h2 className="text-lg font-semibold mb-4">Network Layers</h2>
<div className="space-y-4">
{[0, 1, 2, 3].map((layer) => {
const layerData = layersData?.layers?.find((l) => l.layer === layer)
return (
<div key={layer} className="flex items-center gap-4">
<span className={`layer-badge layer-${layer}`}>L{layer}</span>
<div className="flex-1">
<div className="flex justify-between mb-1">
<span className="text-sm text-gray-300">
{layer === 0 && 'Jeffrey Epstein'}
{layer === 1 && 'Direct Associates'}
{layer === 2 && 'One Degree Removed'}
{layer === 3 && 'Two Degrees Removed'}
</span>
<span className="text-sm text-gray-500">
{layerData?.count ?? 0} entities
</span>
</div>
<div className="h-2 bg-surface rounded-full overflow-hidden">
<div
className={`h-full ${
layer === 0 ? 'bg-red-500' :
layer === 1 ? 'bg-orange-500' :
layer === 2 ? 'bg-yellow-500' :
'bg-green-500'
}`}
style={{
width: `${Math.min(100, (layerData?.count ?? 0) / 10)}%`
}}
/>
</div>
</div>
</div>
)
})}
</div>
</div>
{/* Quick Actions */}
<div className="grid grid-cols-2 md:grid-cols-3 gap-4">
<Link to="/network" className="card p-4 hover:bg-surface-hover transition-colors">
<Network className="text-blue-400 mb-2" size={24} />
<h3 className="font-semibold mb-1">Explore Network</h3>
<p className="text-sm text-gray-400">Interactive visualization of entity connections</p>
</Link>
<Link to="/search" className="card p-4 hover:bg-surface-hover transition-colors">
<FileText className="text-green-400 mb-2" size={24} />
<h3 className="font-semibold mb-1">Search Documents</h3>
<p className="text-sm text-gray-400">Full-text search across all documents</p>
</Link>
<Link to="/patterns" className="card p-4 hover:bg-surface-hover transition-colors">
<Lightbulb className="text-yellow-400 mb-2" size={24} />
<h3 className="font-semibold mb-1">View Patterns</h3>
<p className="text-sm text-gray-400">AI-discovered connections and insights</p>
</Link>
</div>
{/* Disclaimer */}
<div className="mt-8 p-4 bg-yellow-500/10 border border-yellow-500/30 rounded-lg">
<h3 className="font-semibold text-yellow-400 mb-1">Disclaimer</h3>
<p className="text-sm text-gray-300">
This is an independent research tool. It surfaces connections from public documents;
it does not assert guilt, criminality, or wrongdoing. Always verify claims against primary sources.
</p>
</div>
</div>
)
}
function StatCard({
icon: Icon,
label,
value,
loading,
}: {
icon: (props: { className?: string; size?: number }) => JSX.Element
label: string
value: number
loading: boolean
}) {
return (
<div className="card p-4">
<div className="flex items-center gap-3">
<div className="p-2 bg-surface-hover rounded-lg">
<Icon className="text-gray-400" size={20} />
</div>
<div>
<p className="text-sm text-gray-400">{label}</p>
<p className="text-xl font-semibold">
{loading ? '—' : value.toLocaleString()}
</p>
</div>
</div>
</div>
)
}
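`HomePage` consumes `getStats` and `getNetworkByLayer` from `@/api`, whose definitions are outside this diff. A hypothetical sketch of the response shapes the component assumes, plus the layer-bar width rule extracted as a pure function (these names are illustrative assumptions, not the actual `@/api` exports):

```typescript
// Assumed shape of the getStats response, inferred from the fields
// HomePage reads (stats?.documents, stats?.pppLoans, etc.).
interface Stats {
  documents: number
  entities: number
  triples: number
  patterns: number
  pppLoans?: number
  fecRecords?: number
  grants?: number
}

// Assumed shape of one element of layersData.layers.
interface LayerBucket {
  layer: number
  count: number
}

// Mirrors the inline width calculation in the layer bars:
// one entity = 0.1% of the bar, capped at 100% (1000+ entities fill it).
function layerBarWidth(count: number): number {
  return Math.min(100, count / 10)
}
```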


@@ -0,0 +1,30 @@
/** @type {import('tailwindcss').Config} */
export default {
content: [
"./index.html",
"./src/**/*.{js,ts,jsx,tsx}",
],
theme: {
extend: {
colors: {
// Dark theme optimized for document analysis
background: '#0a0a0a',
surface: '#141414',
'surface-hover': '#1a1a1a',
border: '#262626',
// Layer colors
'layer-0': '#ef4444', // Epstein - red
'layer-1': '#f97316', // Direct - orange
'layer-2': '#eab308', // One removed - yellow
'layer-3': '#22c55e', // Two removed - green
// Entity type colors
'entity-person': '#3b82f6',
'entity-org': '#8b5cf6',
'entity-location': '#14b8a6',
},
},
},
plugins: [],
}

frontend/vite.config.ts

@@ -0,0 +1,21 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import path from 'path'
export default defineConfig({
plugins: [react()],
resolve: {
alias: {
'@': path.resolve(__dirname, './src'),
},
},
server: {
port: 3000,
proxy: {
'/api': {
target: 'http://localhost:3001',
changeOrigin: true,
},
},
},
})


@@ -0,0 +1,105 @@
// Neo4j Cypher constraints and initial setup
// Run these after Neo4j starts
// ============================================================================
// CONSTRAINTS
// ============================================================================
// Entity uniqueness
CREATE CONSTRAINT entity_unique IF NOT EXISTS
FOR (e:Entity) REQUIRE (e.canonicalName, e.type) IS UNIQUE;
// Document uniqueness
CREATE CONSTRAINT document_unique IF NOT EXISTS
FOR (d:Document) REQUIRE d.docId IS UNIQUE;
// ============================================================================
// INDEXES
// ============================================================================
// Entity indexes
CREATE INDEX entity_name IF NOT EXISTS FOR (e:Entity) ON (e.canonicalName);
CREATE INDEX entity_type IF NOT EXISTS FOR (e:Entity) ON (e.type);
CREATE INDEX entity_layer IF NOT EXISTS FOR (e:Entity) ON (e.layer);
// Full-text search index on entity names
CREATE FULLTEXT INDEX entity_search IF NOT EXISTS FOR (e:Entity) ON EACH [e.canonicalName, e.aliases];
// Document indexes
CREATE INDEX document_docid IF NOT EXISTS FOR (d:Document) ON (d.docId);
CREATE INDEX document_type IF NOT EXISTS FOR (d:Document) ON (d.documentType);
// ============================================================================
// ENTITY TYPES (Labels)
// ============================================================================
// We use labels for entity types:
// - :Person
// - :Organization
// - :Location
// - :Entity (base label, all entities have this)
// ============================================================================
// RELATIONSHIP TYPES
// ============================================================================
// - MENTIONED_IN: Entity -> Document (entity appears in document)
// - CONNECTED_TO: Entity -> Entity (co-occurrence relationship)
// - HAS_RELATIONSHIP: Entity -> Entity with action property (from triples)
// - CROSSREF_MATCH: Entity -> CrossRefRecord (PPP, FEC, Grants)
// ============================================================================
// INITIAL DATA
// ============================================================================
// Create Jeffrey Epstein as the root node
MERGE (e:Entity:Person {canonicalName: 'Jeffrey Epstein', type: 'person'})
SET e.layer = 0,
e.description = 'American financier and convicted sex offender',
e.aliases = ['Jeffrey E. Epstein', 'J. Epstein', 'Epstein', 'JE'],
e.createdAt = datetime();
// ============================================================================
// HELPER PROCEDURES
// ============================================================================
// Calculate layer for an entity based on shortest path to Epstein
// Usage: CALL calculateLayer($entityName) YIELD layer
// This requires the APOC plugin to be installed
// CALL apoc.custom.asProcedure(
// 'calculateLayer',
// '
// MATCH (epstein:Entity {canonicalName: "Jeffrey Epstein"})
// MATCH (target:Entity {canonicalName: $entityName})
// MATCH path = shortestPath((epstein)-[:CONNECTED_TO*]-(target))
// RETURN length(path) AS layer
// ',
// 'read',
// [['layer', 'INTEGER']],
// [['entityName', 'STRING']]
// );
// ============================================================================
// EXAMPLE QUERIES
// ============================================================================
// Find all Layer 1 entities (direct connections to Epstein)
// MATCH (epstein:Entity {canonicalName: 'Jeffrey Epstein'})-[:CONNECTED_TO]-(layer1:Entity)
// RETURN layer1.canonicalName, layer1.type;
// Find shared connections between two entities
// MATCH (a:Entity {canonicalName: $name1})-[:CONNECTED_TO]-(shared:Entity)-[:CONNECTED_TO]-(b:Entity {canonicalName: $name2})
// RETURN shared.canonicalName, shared.type;
// Find documents where two entities appear together
// MATCH (a:Entity {canonicalName: $name1})-[:MENTIONED_IN]->(d:Document)<-[:MENTIONED_IN]-(b:Entity {canonicalName: $name2})
// RETURN d.docId, d.summary;
// Get entity's network up to N hops
// MATCH path = (e:Entity {canonicalName: $name})-[:CONNECTED_TO*1..3]-(connected:Entity)
// RETURN path;
// Find money flows (entities connected through financial documents)
// MATCH (a:Entity)-[:MENTIONED_IN]->(d:Document {documentType: 'financial'})<-[:MENTIONED_IN]-(b:Entity)
// WHERE a <> b
// RETURN a.canonicalName, b.canonicalName, count(d) AS sharedFinancialDocs
// ORDER BY sharedFinancialDocs DESC;
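The commented-out `calculateLayer` helper above defines an entity's layer as the length of its shortest `CONNECTED_TO` path back to the root node. The same idea, sketched as a plain breadth-first search over an in-memory adjacency list (the `Graph` shape here is an illustrative assumption, not part of the schema):

```typescript
// Adjacency list keyed by canonicalName. CONNECTED_TO is undirected,
// so both directions are expected to be present.
type Graph = Map<string, string[]>

// BFS layer assignment: layer = shortest-path distance from the root.
// Entities unreachable from the root are absent from the result.
function calculateLayers(graph: Graph, root = 'Jeffrey Epstein'): Map<string, number> {
  const layers = new Map<string, number>([[root, 0]])
  const queue = [root]
  while (queue.length > 0) {
    const current = queue.shift()!
    const layer = layers.get(current)!
    for (const neighbor of graph.get(current) ?? []) {
      if (!layers.has(neighbor)) {
        layers.set(neighbor, layer + 1)
        queue.push(neighbor)
      }
    }
  }
  return layers
}
```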


@@ -0,0 +1,403 @@
-- Epstein Files Database Schema
-- PostgreSQL 16+
-- Enable required extensions
CREATE EXTENSION IF NOT EXISTS pg_trgm; -- Fuzzy text matching
CREATE EXTENSION IF NOT EXISTS btree_gin; -- GIN indexes for JSONB
CREATE EXTENSION IF NOT EXISTS unaccent; -- Accent-insensitive search
-- ============================================================================
-- DOCUMENTS
-- ============================================================================
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
doc_id TEXT UNIQUE NOT NULL, -- EFTA00000001
dataset_id INTEGER NOT NULL, -- Which dataset (1-5)
file_path TEXT, -- Original file path
-- Content
full_text TEXT, -- OCR text
page_count INTEGER,
-- AI Analysis
summary TEXT, -- One sentence summary
detailed_summary TEXT, -- Paragraph summary
document_type TEXT, -- Deposition, email, financial record, etc.
-- Temporal
date_earliest DATE, -- Earliest date mentioned
date_latest DATE, -- Latest date mentioned
-- Metadata
content_tags JSONB DEFAULT '[]', -- AI-extracted tags
analysis_status TEXT DEFAULT 'pending', -- pending, processing, complete, failed
error_message TEXT,
-- Timestamps
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
analyzed_at TIMESTAMPTZ
);
CREATE INDEX idx_documents_doc_id ON documents(doc_id);
CREATE INDEX idx_documents_dataset ON documents(dataset_id);
CREATE INDEX idx_documents_type ON documents(document_type);
CREATE INDEX idx_documents_status ON documents(analysis_status);
CREATE INDEX idx_documents_dates ON documents(date_earliest, date_latest);
CREATE INDEX idx_documents_fulltext ON documents USING gin(to_tsvector('english', full_text));
CREATE INDEX idx_documents_tags ON documents USING gin(content_tags);
-- ============================================================================
-- ENTITIES
-- ============================================================================
-- Entity types enum
CREATE TYPE entity_type AS ENUM (
'person',
'organization',
'location',
'date',
'reference', -- Document references, case numbers, etc.
'financial', -- Dollar amounts, account numbers
'unknown'
);
CREATE TABLE entities (
id SERIAL PRIMARY KEY,
canonical_name TEXT NOT NULL, -- Deduplicated canonical form
entity_type entity_type NOT NULL,
-- Classification
layer INTEGER, -- 0=Epstein, 1=direct, 2=one removed, 3=two removed
-- Metadata
aliases JSONB DEFAULT '[]', -- Alternative spellings/names
attributes JSONB DEFAULT '{}', -- Type-specific attributes
description TEXT, -- AI-generated description
-- Cross-reference matches
ppp_matches JSONB DEFAULT '[]', -- Matched PPP loan records
fec_matches JSONB DEFAULT '[]', -- Matched FEC contributions
grants_matches JSONB DEFAULT '[]', -- Matched federal grants
-- Stats
document_count INTEGER DEFAULT 0, -- Number of documents mentioning entity
connection_count INTEGER DEFAULT 0, -- Number of connections to other entities
-- Timestamps
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(canonical_name, entity_type)
);
CREATE INDEX idx_entities_name ON entities(canonical_name);
CREATE INDEX idx_entities_name_trgm ON entities USING gin(canonical_name gin_trgm_ops);
CREATE INDEX idx_entities_type ON entities(entity_type);
CREATE INDEX idx_entities_layer ON entities(layer);
CREATE INDEX idx_entities_aliases ON entities USING gin(aliases);
-- ============================================================================
-- ENTITY ALIASES
-- ============================================================================
CREATE TABLE entity_aliases (
id SERIAL PRIMARY KEY,
original_name TEXT NOT NULL,
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
confidence REAL DEFAULT 1.0, -- Confidence of alias match
source TEXT DEFAULT 'extraction', -- extraction, llm_dedup, manual
reasoning TEXT, -- Why this was matched
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_aliases_original ON entity_aliases(original_name);
CREATE INDEX idx_aliases_original_trgm ON entity_aliases USING gin(original_name gin_trgm_ops);
CREATE INDEX idx_aliases_entity ON entity_aliases(entity_id);
-- ============================================================================
-- DOCUMENT-ENTITY RELATIONSHIPS
-- ============================================================================
CREATE TABLE document_entities (
id SERIAL PRIMARY KEY,
document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
-- Context
mention_count INTEGER DEFAULT 1, -- How many times mentioned
first_mention INTEGER, -- Character offset of first mention
context_snippet TEXT, -- Surrounding text
-- Metadata
extraction_confidence REAL DEFAULT 1.0,
UNIQUE(document_id, entity_id)
);
CREATE INDEX idx_doc_entities_doc ON document_entities(document_id);
CREATE INDEX idx_doc_entities_entity ON document_entities(entity_id);
-- ============================================================================
-- RDF TRIPLES (Relationships)
-- ============================================================================
CREATE TABLE triples (
id SERIAL PRIMARY KEY,
document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
-- Subject-Predicate-Object
subject_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
predicate TEXT NOT NULL, -- Action/verb
object_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
-- Context
location_id INTEGER REFERENCES entities(id) ON DELETE SET NULL,
timestamp DATE,
-- Metadata
explicit_topic TEXT, -- Stated subject matter
implicit_topic TEXT, -- Inferred subject matter
tags JSONB DEFAULT '[]',
confidence REAL DEFAULT 1.0,
sequence_order INTEGER, -- Order within document
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_triples_document ON triples(document_id);
CREATE INDEX idx_triples_subject ON triples(subject_id);
CREATE INDEX idx_triples_object ON triples(object_id);
CREATE INDEX idx_triples_predicate ON triples(predicate);
CREATE INDEX idx_triples_timestamp ON triples(timestamp);
CREATE INDEX idx_triples_tags ON triples USING gin(tags);
-- ============================================================================
-- CROSS-REFERENCE TABLES
-- ============================================================================
-- PPP Loans
CREATE TABLE ppp_loans (
id SERIAL PRIMARY KEY,
loan_number TEXT UNIQUE,
borrower_name TEXT NOT NULL,
borrower_address TEXT,
borrower_city TEXT,
borrower_state TEXT,
borrower_zip TEXT,
loan_amount NUMERIC(15,2),
loan_status TEXT,
forgiveness_amount NUMERIC(15,2),
lender TEXT,
naics_code TEXT,
business_type TEXT,
jobs_retained INTEGER,
date_approved DATE,
-- Matching metadata
normalized_name TEXT, -- For fuzzy matching
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_ppp_name ON ppp_loans(borrower_name);
CREATE INDEX idx_ppp_name_trgm ON ppp_loans USING gin(borrower_name gin_trgm_ops);
CREATE INDEX idx_ppp_normalized ON ppp_loans USING gin(normalized_name gin_trgm_ops);
-- FEC Contributions
CREATE TABLE fec_contributions (
id SERIAL PRIMARY KEY,
fec_id TEXT,
contributor_name TEXT NOT NULL,
contributor_city TEXT,
contributor_state TEXT,
contributor_zip TEXT,
contributor_employer TEXT,
contributor_occupation TEXT,
committee_id TEXT,
committee_name TEXT,
candidate_id TEXT,
candidate_name TEXT,
amount NUMERIC(12,2),
contribution_date DATE,
contribution_type TEXT,
-- Matching metadata
normalized_name TEXT,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_fec_contributor ON fec_contributions(contributor_name);
CREATE INDEX idx_fec_contributor_trgm ON fec_contributions USING gin(contributor_name gin_trgm_ops);
CREATE INDEX idx_fec_normalized ON fec_contributions USING gin(normalized_name gin_trgm_ops);
CREATE INDEX idx_fec_candidate ON fec_contributions(candidate_name);
CREATE INDEX idx_fec_committee ON fec_contributions(committee_name);
-- Federal Grants
CREATE TABLE federal_grants (
id SERIAL PRIMARY KEY,
award_id TEXT,
recipient_name TEXT NOT NULL,
recipient_city TEXT,
recipient_state TEXT,
recipient_zip TEXT,
awarding_agency TEXT,
funding_agency TEXT,
award_amount NUMERIC(15,2),
award_date DATE,
description TEXT,
cfda_number TEXT,
cfda_title TEXT,
-- Matching metadata
normalized_name TEXT,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_grants_recipient ON federal_grants(recipient_name);
CREATE INDEX idx_grants_recipient_trgm ON federal_grants USING gin(recipient_name gin_trgm_ops);
CREATE INDEX idx_grants_normalized ON federal_grants USING gin(normalized_name gin_trgm_ops);
-- ============================================================================
-- ENTITY CROSS-REFERENCE MATCHES
-- ============================================================================
CREATE TYPE match_source AS ENUM ('ppp', 'fec', 'grants');
CREATE TABLE entity_crossref_matches (
id SERIAL PRIMARY KEY,
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
source match_source NOT NULL,
source_id INTEGER NOT NULL, -- ID in the source table
-- Match quality
match_score REAL NOT NULL, -- 0-1 similarity score
match_method TEXT, -- exact, fuzzy, soundex, etc.
verified BOOLEAN DEFAULT FALSE, -- Human-verified match
false_positive BOOLEAN DEFAULT FALSE, -- Confirmed not a match
created_at TIMESTAMPTZ DEFAULT NOW(),
verified_at TIMESTAMPTZ,
verified_by TEXT
);
CREATE INDEX idx_crossref_entity ON entity_crossref_matches(entity_id);
CREATE INDEX idx_crossref_source ON entity_crossref_matches(source, source_id);
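`match_score` is a 0-1 similarity value, and the `gin_trgm_ops` indexes above suggest pg_trgm-style fuzzy matching. One plausible way to preview such scores offline is a TypeScript approximation of trigram similarity (intersection over union of 3-grams of space-padded words); this is a sketch that only approximates pg_trgm's exact padding rules, not a faithful reimplementation:

```typescript
// Extract pg_trgm-style trigrams: lowercase, split on whitespace,
// pad each word with two leading spaces and one trailing space.
function trigrams(s: string): Set<string> {
  const grams = new Set<string>()
  for (const word of s.toLowerCase().split(/\s+/).filter(Boolean)) {
    const padded = `  ${word} `
    for (let i = 0; i + 3 <= padded.length; i++) {
      grams.add(padded.slice(i, i + 3))
    }
  }
  return grams
}

// Jaccard similarity over trigram sets, in [0, 1].
function similarity(a: string, b: string): number {
  const ta = trigrams(a)
  const tb = trigrams(b)
  let shared = 0
  for (const g of ta) if (tb.has(g)) shared++
  const union = ta.size + tb.size - shared
  return union === 0 ? 0 : shared / union
}
```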
-- ============================================================================
-- PATTERN FINDINGS
-- ============================================================================
CREATE TABLE pattern_findings (
id SERIAL PRIMARY KEY,
-- The pattern
title TEXT NOT NULL,
description TEXT NOT NULL,
pattern_type TEXT, -- financial_flow, travel_pattern, organizational_link, etc.
-- Involved entities
entity_ids INTEGER[] NOT NULL,
-- Evidence
evidence JSONB NOT NULL, -- Supporting documents, connections, etc.
confidence REAL,
-- Status
status TEXT DEFAULT 'hypothesis', -- hypothesis, validated, rejected
notes TEXT,
-- Timestamps
discovered_at TIMESTAMPTZ DEFAULT NOW(),
discovered_by TEXT DEFAULT 'pattern_agent',
validated_at TIMESTAMPTZ,
validated_by TEXT
);
CREATE INDEX idx_patterns_type ON pattern_findings(pattern_type);
CREATE INDEX idx_patterns_status ON pattern_findings(status);
CREATE INDEX idx_patterns_entities ON pattern_findings USING gin(entity_ids);
-- ============================================================================
-- VIEWS
-- ============================================================================
-- Entity connections view
CREATE VIEW entity_connections AS
SELECT
e1.id AS entity1_id,
e1.canonical_name AS entity1_name,
e1.entity_type AS entity1_type,
e2.id AS entity2_id,
e2.canonical_name AS entity2_name,
e2.entity_type AS entity2_type,
COUNT(DISTINCT d.id) AS shared_documents,
array_agg(DISTINCT d.doc_id) AS document_ids
FROM document_entities de1
JOIN document_entities de2 ON de1.document_id = de2.document_id AND de1.entity_id < de2.entity_id
JOIN entities e1 ON de1.entity_id = e1.id
JOIN entities e2 ON de2.entity_id = e2.id
JOIN documents d ON de1.document_id = d.id
GROUP BY e1.id, e1.canonical_name, e1.entity_type, e2.id, e2.canonical_name, e2.entity_type;
-- ============================================================================
-- FUNCTIONS
-- ============================================================================
-- Normalize name for fuzzy matching
CREATE OR REPLACE FUNCTION normalize_name(name TEXT) RETURNS TEXT AS $$
BEGIN
RETURN lower(
regexp_replace(
regexp_replace(
unaccent(name),
'[^a-zA-Z0-9 ]', '', 'g'
),
'\s+', ' ', 'g'
)
);
END;
$$ LANGUAGE plpgsql IMMUTABLE;
-- Update entity stats
CREATE OR REPLACE FUNCTION update_entity_stats() RETURNS TRIGGER AS $$
DECLARE
affected_entity INTEGER;
BEGIN
-- NEW is not assigned in DELETE triggers (referencing it raises an error),
-- so pick the affected row based on TG_OP instead of COALESCE(NEW, OLD)
IF TG_OP = 'DELETE' THEN
affected_entity := OLD.entity_id;
ELSE
affected_entity := NEW.entity_id;
END IF;
-- Update document count and connection count
UPDATE entities e
SET document_count = (
SELECT COUNT(DISTINCT document_id)
FROM document_entities
WHERE entity_id = e.id
),
connection_count = (
SELECT COUNT(*)
FROM entity_connections
WHERE entity1_id = e.id OR entity2_id = e.id
),
updated_at = NOW()
WHERE e.id = affected_entity;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_update_entity_stats
AFTER INSERT OR UPDATE OR DELETE ON document_entities
FOR EACH ROW EXECUTE FUNCTION update_entity_stats();
-- ============================================================================
-- INITIAL DATA
-- ============================================================================
-- Insert Jeffrey Epstein as Layer 0
INSERT INTO entities (canonical_name, entity_type, layer, description, aliases)
VALUES (
'Jeffrey Epstein',
'person',
0,
'American financier and convicted sex offender',
'["Jeffrey E. Epstein", "J. Epstein", "Epstein", "JE"]'::jsonb
) ON CONFLICT DO NOTHING;
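The `normalized_name` columns are populated from the `normalize_name()` SQL function above. For pre-computing the same key in the TypeScript pipeline before rows are inserted, a rough client-side mirror can be used; note that Unicode NFD decomposition stands in for `unaccent()` (results can differ on some inputs), and this version additionally trims the result:

```typescript
// Approximate TypeScript counterpart of the SQL normalize_name():
// strip accents, lowercase, drop punctuation, collapse whitespace.
function normalizeName(name: string): string {
  return name
    .normalize('NFD')                 // decompose accented characters
    .replace(/[\u0300-\u036f]/g, '')  // strip combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, '')       // drop punctuation
    .replace(/\s+/g, ' ')             // collapse whitespace
    .trim()                           // extra step vs. the SQL version
}
```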