Files

Copilot 5b5e57e095 Create CLI tool and SDK for programmatic access (#103 )

* Initial plan

* feat: Implement CLI tool with init, upload, and verify commands

Co-authored-by: PatrickFanella <61631520+PatrickFanella@users.noreply.github.com>

* fix: Resolve linting issues and add CLI-specific eslint config

Co-authored-by: PatrickFanella <61631520+PatrickFanella@users.noreply.github.com>

* refactor: Address code review feedback - improve validation, memory efficiency, and type safety

Co-authored-by: PatrickFanella <61631520+PatrickFanella@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: PatrickFanella <61631520+PatrickFanella@users.noreply.github.com>

2025-10-31 10:33:58 -05:00

7.0 KiB

Raw Permalink Blame History

Database Indexing Strategy

This document outlines the indexing strategy for the Internet ID database schema to ensure optimal query performance at scale.

Overview

The database schema uses PostgreSQL with Prisma ORM. Indexes are strategically placed to optimize common query patterns while avoiding index bloat.

Index Categories

1. Primary Keys and Unique Constraints

All models have primary keys (@id) that are automatically indexed:

User.id (cuid)
Content.id (cuid)
PlatformBinding.id (cuid)
Verification.id (cuid)
Account.id (cuid)
Session.id (cuid)

Unique constraints (automatically indexed):

User.address, User.email
Content.contentHash
PlatformBinding.[platform, platformId] (composite unique)
Account.[provider, providerAccountId] (composite unique)
Session.sessionToken
VerificationToken.[identifier, token] (composite unique)
VerificationToken.token

2. Foreign Key Indexes

Foreign keys improve JOIN performance and enforce referential integrity:

Content model:

@@index([creatorId]) - Optimizes queries filtering by user (Content.creator relation)
@@index([creatorAddress]) - Optimizes queries by blockchain address

PlatformBinding model:

@@index([contentId]) - Optimizes queries for bindings by content

Verification model:

@@index([contentId]) - Optimizes queries for verifications by content

Account model:

@@index([userId]) - Optimizes queries for accounts by user
@@index([userId, provider]) - Composite index for efficient user + provider lookups

Session model:

@@index([userId]) - Optimizes queries for sessions by user

3. Filter/Sort Indexes

These indexes optimize WHERE clauses and ORDER BY operations:

Time-based sorting (createdAt):

User.@@index([createdAt])
Content.@@index([createdAt])
PlatformBinding.@@index([createdAt])
Verification.@@index([createdAt])

Status filtering:

Verification.@@index([status]) - Filters by verification status (OK, WARN, FAIL)

Session expiration:

Session.@@index([expires]) - Optimizes cleanup queries for expired sessions

4. Lookup Field Indexes

Content lookup:

Verification.@@index([contentHash]) - Fast lookup of verifications by content hash

Platform filtering:

PlatformBinding.@@index([platform]) - Query all bindings for a specific platform (e.g., all YouTube bindings)

Username lookup:

Account.@@index([username]) - Fast lookup by platform username

5. Composite Indexes

Composite indexes optimize queries with multiple filters:

Verification queries:

@@index([contentHash, createdAt]) - Optimizes "get verifications for content X, ordered by time"
@@index([status, createdAt]) - Optimizes "get failed verifications, ordered by time"

Account queries:

@@index([userId, provider]) - Optimizes "get user's account for provider X"

Query Optimization Guidelines

Critical Query Patterns

List content by recency:

prisma.content.findMany({
  orderBy: { createdAt: "desc" },
  include: { bindings: true },
});

Uses: Content.createdAt index
Related binding queries use: PlatformBinding.contentId index

Get content by hash:
```
prisma.content.findUnique({
  where: { contentHash: hash },
});
```
- Uses: Content.contentHash unique constraint (automatically indexed)

List verifications for content:

prisma.verification.findMany({
  where: { contentHash: hash },
  orderBy: { createdAt: "desc" },
});

Uses: Verification.[contentHash, createdAt] composite index

Filter verifications by status:

prisma.verification.findMany({
  where: { status: "FAIL" },
  orderBy: { createdAt: "desc" },
});

Uses: Verification.[status, createdAt] composite index

Get user accounts by provider:
```
prisma.account.findFirst({
  where: { userId, provider },
});
```
- Uses: Account.[userId, provider] composite index

Lookup platform binding:

prisma.platformBinding.upsert({
  where: { platform_platformId: { platform, platformId } },
});

Uses: PlatformBinding.[platform, platformId] unique constraint (automatically indexed)

Performance Recommendations

Always use indexes for:
- WHERE clauses on large tables
- ORDER BY operations
- JOIN operations (foreign keys)
- GROUP BY operations
Avoid:
- Full table scans on tables with > 1000 rows
- Non-indexed WHERE clauses on high-cardinality columns
- Leading wildcards in LIKE queries (e.g., %search)
Monitor and optimize:
- Use EXPLAIN ANALYZE to verify index usage
- Monitor slow query logs in production
- Add partial indexes for specific use cases if needed

Example: Using EXPLAIN ANALYZE

To verify index usage in PostgreSQL:

EXPLAIN ANALYZE SELECT * FROM "Verification"
WHERE "contentHash" = '0x123...'
ORDER BY "createdAt" DESC;

Expected output should show:

Index Scan using Verification_contentHash_createdAt_idx
No "Seq Scan" on large tables

Index Maintenance

Creating New Indexes

When adding new indexes:

Analyze query patterns in production
Test with EXPLAIN ANALYZE first
Create migration: npx prisma migrate dev --name add_index_name
Monitor performance impact

Partial Indexes (Future Consideration)

For common filter combinations, consider partial indexes:

// Example: Index only failed verifications
@@index([createdAt], where: status = 'FAIL')

PostgreSQL supports partial indexes, but Prisma's support is limited. Use raw SQL migrations if needed.

Index Size Monitoring

Monitor index sizes to prevent bloat:

SELECT schemaname, tablename, indexname,
       pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Migration Strategy

For Existing Databases

Add indexes during low-traffic periods
Use CREATE INDEX CONCURRENTLY in PostgreSQL (prevents table locking)
Monitor query performance before/after

Schema Changes

Always run migrations with:

npm run db:migrate

This ensures:

Both clients (API and Web) are updated
Migration history is tracked
Indexes are created properly

Issue #12: Database schema optimization with proper indexes
Issue #13: Monitor slow query log in production
Issue #10: Optimization roadmap
Issue #5: Schema consolidation

Performance Targets

With proper indexing, the system should handle:

100k+ content registrations
1M+ verifications
Sub-100ms query response times for indexed queries
Efficient pagination with offset/cursor-based approaches

Future Optimizations

Partial indexes for common status filters
Covering indexes to avoid table lookups
Index-only scans for frequently accessed columns
Materialized views for complex aggregations
Query result caching at the application layer

7.0 KiB Raw Permalink Blame History