Files
discord-spywatcher/POSTGRESQL.md
Copilot 2aa4be44f7 [WIP] Create contributing guidelines for open source contributors (#170)
* Initial plan

* docs: add comprehensive contributing guidelines and templates

Co-authored-by: onnwee <211922112+onnwee@users.noreply.github.com>

* docs: update README and SECURITY with better formatting and links

Co-authored-by: onnwee <211922112+onnwee@users.noreply.github.com>

* docs: finalize contributing guidelines and formatting

Co-authored-by: onnwee <211922112+onnwee@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: onnwee <211922112+onnwee@users.noreply.github.com>
2025-11-04 15:38:59 -06:00

13 KiB

PostgreSQL Setup and Management Guide

This guide covers the PostgreSQL configuration, features, and management for Discord SpyWatcher.

Table of Contents

Overview

Discord SpyWatcher uses PostgreSQL 15+ as its production database, providing:

  • Advanced data types (JSONB, arrays, UUIDs, timestamps with timezone)
  • Full-text search capabilities
  • Better concurrency and performance
  • Production-ready reliability
  • Advanced indexing strategies

PostgreSQL-Specific Features

Advanced Data Types

JSONB Fields

All event models include a metadata field using JSONB for flexible, queryable JSON storage:

// Store flexible metadata
await db.presenceEvent.create({
    data: {
        userId: '123',
        username: 'user',
        clients: ['desktop', 'mobile'],
        metadata: {
            status: 'online',
            activities: ['gaming'],
            customField: 'value',
        },
    },
});

// Query JSONB data
const events = await db.presenceEvent.findMany({
    where: {
        metadata: {
            path: ['status'],
            equals: 'online',
        },
    },
});

Array Fields

Comma-separated strings have been converted to native PostgreSQL arrays:

// PresenceEvent.clients - array of client types
clients: ['desktop', 'web', 'mobile'];

// RoleChangeEvent.addedRoles - array of role IDs
addedRoles: ['123456789', '987654321'];

UUID Primary Keys

Event models use UUIDs for better distribution and security:

// Auto-generated UUID
id: '550e8400-e29b-41d4-a716-446655440000';

Timezone-Aware Timestamps

All timestamp fields use TIMESTAMPTZ for proper timezone handling:

createdAt: DateTime @default(now()) @db.Timestamptz

The MessageEvent model supports full-text search on the content field using PostgreSQL's native text search capabilities.

Setup

Run the setup script to add the full-text search index:

DB_PASSWORD=yourpassword ./scripts/setup-fulltext-search.sh

This creates:

  • A generated content_search tsvector column
  • A GIN index for efficient text searches

Usage

// Raw SQL query for full-text search
const results = await db.$queryRaw`
  SELECT *, ts_rank(content_search, to_tsquery('english', ${query})) AS rank
  FROM "MessageEvent"
  WHERE content_search @@ to_tsquery('english', ${query})
  ORDER BY rank DESC
  LIMIT 20;
`;

// Search for phrases
const results = await db.$queryRaw`
  SELECT * FROM "MessageEvent"
  WHERE content_search @@ plainto_tsquery('english', ${'hello world'})
`;

// Complex search with operators
const results = await db.$queryRaw`
  SELECT * FROM "MessageEvent"
  WHERE content_search @@ to_tsquery('english', ${'cat & dog | bird'})
`;

Search Operators

  • & - AND (both terms must be present)
  • | - OR (either term can be present)
  • ! - NOT (term must not be present)
  • <-> - Phrase search (terms must be adjacent)

Example: 'cat & dog' finds messages with both "cat" and "dog"

Performance Features

Optimized Indexes

The schema includes strategic indexes for common queries:

// Event models have indexes on:
- userId (for user-specific queries)
- guildId (for guild-specific queries)
- channelId (for channel-specific queries)
- createdAt (for time-based queries)

// User model has indexes on:
- discordId (for Discord integration)
- email (for user lookup)

// Security models have indexes on:
- ipAddress (for IP-based queries)
- action (for action-based queries)
- timestamp/createdAt (for time-based queries)

Composite Indexes

Some models use composite indexes for complex queries:

@@index([userId, createdAt]) // User activity over time
@@index([guildId, createdAt]) // Guild activity over time

Connection Configuration

Connection String Format

DATABASE_URL="postgresql://username:password@host:port/database?schema=public&connection_limit=10&pool_timeout=20&connect_timeout=10"

Connection Pooling Parameters

Parameter Recommended Value Description
connection_limit 10-50 Maximum number of connections in the pool
pool_timeout 20 Seconds to wait for an available connection
connect_timeout 10 Seconds to wait for initial connection
sslmode require (prod) SSL/TLS encryption mode

Example Configurations

Development

DATABASE_URL="postgresql://spywatcher:password@localhost:5432/spywatcher?connection_limit=10&pool_timeout=20"

Production

DATABASE_URL="postgresql://spywatcher:securepassword@db.example.com:5432/spywatcher?sslmode=require&connection_limit=50&pool_timeout=20&connect_timeout=10"

Docker

DATABASE_URL="postgresql://spywatcher:${DB_PASSWORD}@postgres:5432/spywatcher?connection_limit=20&pool_timeout=20"

Database Management

Migrations

Create a Migration

cd backend
npx prisma migrate dev --name add_new_feature

Apply Migrations (Production)

npx prisma migrate deploy

Reset Database (Development Only)

npx prisma migrate reset

Data Migration from SQLite

Use the provided migration script:

# Dry run first
cd backend
DRY_RUN=true \
  SQLITE_DATABASE_URL="file:./prisma/dev.db" \
  DATABASE_URL="postgresql://spywatcher:password@localhost:5432/spywatcher" \
  npm run db:migrate:dry

# Actual migration
SQLITE_DATABASE_URL="file:./prisma/dev.db" \
  DATABASE_URL="postgresql://spywatcher:password@localhost:5432/spywatcher" \
  npm run db:migrate

Seeding Data

cd backend
npm run prisma:seed

Performance Optimization

PostgreSQL Configuration

The Docker Compose configuration includes optimized PostgreSQL settings:

command:
  - "postgres"
  - "-c" "max_connections=100"
  - "-c" "shared_buffers=256MB"
  - "-c" "effective_cache_size=1GB"
  - "-c" "maintenance_work_mem=64MB"
  - "-c" "checkpoint_completion_target=0.9"
  - "-c" "wal_buffers=16MB"
  - "-c" "default_statistics_target=100"
  - "-c" "random_page_cost=1.1"
  - "-c" "effective_io_concurrency=200"
  - "-c" "work_mem=4MB"
  - "-c" "min_wal_size=1GB"
  - "-c" "max_wal_size=4GB"

Query Optimization

Use Indexes Effectively

// Good - uses index
const events = await db.presenceEvent.findMany({
    where: { userId: '123' },
    orderBy: { createdAt: 'desc' },
});

// Bad - no index on username alone
const events = await db.presenceEvent.findMany({
    where: { username: 'john' },
});

Batch Operations

// Use createMany for bulk inserts
await db.presenceEvent.createMany({
    data: events,
    skipDuplicates: true,
});

Pagination

// Efficient pagination with cursor
const events = await db.presenceEvent.findMany({
    take: 20,
    skip: 1,
    cursor: { id: lastId },
    orderBy: { createdAt: 'desc' },
});

Monitoring Queries

Enable query logging in development:

const db = new PrismaClient({
    log: ['query', 'error', 'warn'],
});

Backup and Recovery

Automated Backups

Use the provided backup script:

# Manual backup
DB_PASSWORD=yourpassword npm run db:backup

# Scheduled backup (cron)
0 2 * * * DB_PASSWORD=yourpassword /path/to/scripts/backup.sh

Backup Features

  • Compressed backups (gzip)
  • 30-day retention by default
  • Optional S3 upload
  • Automatic cleanup

Restore from Backup

DB_PASSWORD=yourpassword npm run db:restore /path/to/backup.sql.gz

Point-in-Time Recovery

For production, enable WAL archiving:

-- Enable continuous archiving
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET archive_mode = ON;
ALTER SYSTEM SET archive_command = 'cp %p /path/to/archive/%f';

Monitoring

Database Metrics

Connection Count

SELECT count(*), state
FROM pg_stat_activity
WHERE datname = 'spywatcher'
GROUP BY state;

Database Size

SELECT pg_size_pretty(pg_database_size('spywatcher'));

Table Sizes

SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

Index Usage

SELECT
  schemaname || '.' || tablename AS table,
  indexname,
  idx_scan AS scans,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan DESC;

Slow Queries

SELECT
  pid,
  now() - query_start AS duration,
  state,
  query
FROM pg_stat_activity
WHERE state != 'idle'
  AND datname = 'spywatcher'
  AND now() - query_start > interval '1 second'
ORDER BY duration DESC;

Maintenance Script

Run regular maintenance:

DB_PASSWORD=yourpassword npm run db:maintenance

This performs:

  • VACUUM ANALYZE (cleanup and optimization)
  • Statistics updates
  • Bloat detection
  • Unused index detection
  • Connection monitoring
  • Long-running query detection

Performance Monitoring Tools

pg_stat_statements

Enable query statistics:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- View top queries by total time
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Troubleshooting

Connection Issues

Problem: Cannot connect to database

Error: P1001: Can't reach database server at `postgres:5432`

Solutions:

  • Check if PostgreSQL is running: docker-compose ps
  • Verify DATABASE_URL is correct
  • Check network connectivity
  • Ensure PostgreSQL container is healthy

Problem: Too many connections

Error: FATAL: remaining connection slots are reserved

Solutions:

  • Increase max_connections in PostgreSQL configuration
  • Reduce connection_limit in DATABASE_URL
  • Check for connection leaks in application code
  • Use connection pooling (PgBouncer)

Performance Issues

Problem: Slow queries

Solutions:

  • Enable query logging to identify slow queries
  • Add indexes for commonly queried fields
  • Use EXPLAIN ANALYZE to understand query plans
  • Consider materialized views for complex queries
  • Optimize WHERE clauses to use indexes

Problem: High memory usage

Solutions:

  • Reduce shared_buffers if too high
  • Adjust work_mem for complex queries
  • Run VACUUM to reclaim space
  • Check for table bloat

Migration Issues

Problem: Migration fails with schema mismatch

Solutions:

# Check migration status
npx prisma migrate status

# Mark specific migration as applied
npx prisma migrate resolve --applied "migration_name"

# Reset and reapply (development only)
npx prisma migrate reset

Problem: Type errors after migration

Solutions:

# Regenerate Prisma Client
npx prisma generate

# Rebuild application
npm run build

Data Issues

Problem: UUID vs Integer ID conflicts

Solution: Use the migration script to handle ID conversion:

npm run db:migrate

Problem: Array field errors

Solution: Ensure comma-separated strings are converted to arrays:

// Old: clients: "desktop,mobile"
// New: clients: ["desktop", "mobile"]

Best Practices

  1. Connection Management

    • Always use connection pooling
    • Close connections when done
    • Set appropriate pool limits
  2. Indexing

    • Index foreign keys
    • Index frequently queried fields
    • Monitor index usage
    • Remove unused indexes
  3. Query Optimization

    • Use prepared statements
    • Avoid N+1 queries
    • Use batch operations
    • Implement pagination
  4. Security

    • Use SSL/TLS in production
    • Rotate credentials regularly
    • Limit user permissions
    • Enable audit logging
  5. Backup and Recovery

    • Automate backups
    • Test restore procedures
    • Store backups securely
    • Document recovery process
  6. Monitoring

    • Track query performance
    • Monitor connection usage
    • Watch for slow queries
    • Set up alerts

Additional Resources

Support

For issues or questions: