Observability and Monitoring

This document describes the observability stack for Internet-ID, including structured logging, metrics collection, and monitoring setup.

Overview

Internet-ID implements a comprehensive observability baseline to support incident response, performance monitoring, and system health tracking:

  • Structured Logging: JSON-formatted logs with correlation IDs using Pino
  • Metrics Export: Prometheus-compatible metrics using prom-client
  • Health Checks: Detailed service health endpoints
  • Request Tracing: Automatic correlation ID generation for request tracking

Quick Start

Local Development

  1. Start the API server:

    npm run start:api
    
  2. Access the observability endpoints:

    • Health check: http://localhost:3001/api/health
    • Prometheus metrics: http://localhost:3001/api/metrics
    • Metrics as JSON: http://localhost:3001/api/metrics/json

  3. View logs: Logs are automatically printed to stdout with pretty formatting in development mode.

Structured Logging

Overview

The logging service uses Pino, a high-performance JSON logger for Node.js. All logs include:

  • Timestamp: ISO 8601 format
  • Log level: trace, debug, info, warn, error, fatal
  • Service name: internet-id-api
  • Environment: development, production, etc.
  • Correlation ID: Unique ID per request for tracing
  • Context: Additional structured data
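
As a sketch, these base fields might be wired up via Pino's options (an assumed configuration, not necessarily the project's exact logger.service.ts):

import pino from "pino";

const logger = pino({
  level: process.env.LOG_LEVEL ?? "info",
  base: {
    service: "internet-id-api",                         // Service name
    environment: process.env.NODE_ENV ?? "development", // Environment
  },
  timestamp: pino.stdTimeFunctions.isoTime, // ISO 8601 timestamps
});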

Configuration

Configure logging via environment variables in .env:

# Log level (trace, debug, info, warn, error, fatal)
# Default: info
LOG_LEVEL=info

# Application environment
NODE_ENV=production

Log Levels

  • trace: Very verbose debugging (e.g., function entry/exit)
  • debug: Detailed debugging information
  • info: General informational messages (default)
  • warn: Warning messages that don't prevent operation
  • error: Error messages for handled exceptions
  • fatal: Critical errors that cause service termination

Usage in Code

import { logger } from "./services/logger.service";

// Simple log message
logger.info("User registered successfully");

// Log with context
logger.info("File uploaded", {
  userId: "123",
  filename: "video.mp4",
  size: 1024000,
});

// Log errors
try {
  // ... some operation
} catch (error) {
  logger.error("Failed to process file", error, {
    userId: "123",
    operation: "upload",
  });
}

// Create child logger with persistent context
const childLogger = logger.child({
  module: "verification",
  userId: "123",
});
childLogger.info("Starting verification");

Request Correlation

Every HTTP request automatically gets a correlation ID that appears in all logs for that request:

{
  "level": "info",
  "time": "2025-10-31T03:17:28.870Z",
  "correlationId": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "msg": "Incoming request",
  "method": "POST",
  "url": "/api/register",
  "userAgent": "Mozilla/5.0...",
  "ip": "192.168.1.1"
}

Access the correlation ID in request handlers:

app.post("/api/example", (req, res) => {
  const correlationId = req.correlationId;
  req.log.info("Processing request"); // Uses request-specific logger
  // ...
});
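
A hedged sketch of the middleware that makes this possible (the header name and wiring are assumptions; the Express Request augmentation for correlationId and log is assumed to be declared elsewhere in the project):

import { randomUUID } from "crypto";
import type { NextFunction, Request, Response } from "express";
import { logger } from "./services/logger.service";

export function correlationMiddleware(req: Request, res: Response, next: NextFunction) {
  // Reuse an incoming X-Correlation-ID header when present, otherwise generate one.
  const incoming = req.headers["x-correlation-id"];
  const correlationId = typeof incoming === "string" ? incoming : randomUUID();
  req.correlationId = correlationId;
  // Attach a request-scoped child logger so every log line carries the ID.
  req.log = logger.child({ correlationId });
  res.setHeader("X-Correlation-ID", correlationId);
  next();
}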

Sensitive Data Redaction

The logger automatically redacts sensitive fields from logs:

  • *.password
  • *.secret
  • *.token
  • *.apiKey
  • *.privateKey
  • req.headers.authorization
  • req.headers['x-api-key']

These fields are completely removed from log output.
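
A minimal sketch of how this could be configured in Pino (paths taken from the list above; remove: true drops fields entirely instead of masking them with "[Redacted]"):

import pino from "pino";

const logger = pino({
  redact: {
    paths: [
      "*.password",
      "*.secret",
      "*.token",
      "*.apiKey",
      "*.privateKey",
      "req.headers.authorization",
      "req.headers['x-api-key']",
    ],
    remove: true, // remove redacted fields rather than replacing them
  },
});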

Metrics

Overview

Metrics are exposed in Prometheus format at /api/metrics for scraping by monitoring systems. The service tracks:

  • HTTP request latency and counts
  • Active connections
  • Cache performance (hits/misses)
  • Verification outcomes
  • IPFS upload performance
  • Database query performance

Available Metrics

HTTP Metrics

# Request duration histogram (seconds)
http_request_duration_seconds{method="POST",route="/api/register",status_code="200"}

# Request count
http_requests_total{method="POST",route="/api/register",status_code="200"}

# Active connections
active_connections

Application Metrics

# Verification outcomes
verification_total{outcome="success",platform="youtube"}
verification_duration_seconds{outcome="success",platform="youtube"}

# IPFS uploads
ipfs_uploads_total{provider="pinata",status="success"}
ipfs_upload_duration_seconds{provider="pinata"}

# Cache performance
cache_hits_total{cache_type="redis"}
cache_misses_total{cache_type="redis"}

# Database queries
db_query_duration_seconds{operation="findMany",table="Content"}
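
As a hedged sketch, metrics like these might be defined and recorded with prom-client as follows (names and labels mirror the listing above; the actual service code may differ):

import { Counter, Histogram } from "prom-client";

const verificationTotal = new Counter({
  name: "verification_total",
  help: "Verification outcomes by platform",
  labelNames: ["outcome", "platform"] as const,
});

const verificationDuration = new Histogram({
  name: "verification_duration_seconds",
  help: "Verification duration in seconds",
  labelNames: ["outcome", "platform"] as const,
});

// Recording a single verification:
const end = verificationDuration.startTimer({ platform: "youtube" });
// ... run the verification ...
verificationTotal.inc({ outcome: "success", platform: "youtube" });
end({ outcome: "success" }); // observes elapsed seconds with the final labels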

Default Metrics

Node.js process metrics are automatically collected:

  • process_cpu_user_seconds_total
  • process_cpu_system_seconds_total
  • process_resident_memory_bytes
  • process_heap_bytes
  • nodejs_eventloop_lag_seconds
  • nodejs_gc_duration_seconds
  • And more...
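
These come from prom-client's default collector, typically enabled with a single call:

import { collectDefaultMetrics } from "prom-client";

// Registers process CPU/memory, heap, event-loop lag, and GC metrics
// on the default registry.
collectDefaultMetrics();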

Accessing Metrics

Prometheus format (for scraping):

curl http://localhost:3001/api/metrics

JSON format (for debugging):

curl http://localhost:3001/api/metrics/json
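
A minimal sketch of how these two routes might be served with prom-client (paths from this document; the handler wiring is assumed):

import { register } from "prom-client";

// Prometheus text format, for scraping.
app.get("/api/metrics", async (_req, res) => {
  res.set("Content-Type", register.contentType);
  res.send(await register.metrics());
});

// JSON representation, for debugging.
app.get("/api/metrics/json", async (_req, res) => {
  res.json(await register.getMetricsAsJSON());
});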

Prometheus Configuration

To scrape metrics with Prometheus, add this job to your prometheus.yml:

scrape_configs:
  - job_name: "internet-id-api"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:3001"]
    metrics_path: "/api/metrics"

For production deployments with multiple instances, use service discovery:

scrape_configs:
  - job_name: "internet-id-api"
    scrape_interval: 15s
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: internet-id-api

Health Checks

Endpoint

GET /api/health

Returns detailed health status of all service components:

{
  "status": "ok",
  "timestamp": "2025-10-31T03:17:28.870Z",
  "uptime": 3600.5,
  "services": {
    "database": {
      "status": "healthy"
    },
    "cache": {
      "status": "healthy",
      "enabled": true
    },
    "blockchain": {
      "status": "healthy",
      "blockNumber": 12345678
    }
  }
}

Status Codes

  • 200 OK: All services healthy
  • 503 Service Unavailable: One or more services unhealthy or degraded

Service Status Values

  • healthy: Service operating normally
  • degraded: Service operational but with issues (e.g., cache unavailable)
  • unhealthy: Service not operational
  • disabled: Service intentionally disabled
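
A hedged sketch of how the endpoint might map these statuses to the response codes above (checkDatabase, checkCache, and checkBlockchain are assumed helpers, not actual project APIs):

app.get("/api/health", async (_req, res) => {
  const services = {
    database: await checkDatabase(),
    cache: await checkCache(),
    blockchain: await checkBlockchain(),
  };
  // Healthy and intentionally disabled services are both acceptable.
  const ok = Object.values(services).every(
    (s) => s.status === "healthy" || s.status === "disabled"
  );
  res.status(ok ? 200 : 503).json({
    status: ok ? "ok" : "degraded",
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    services,
  });
});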

Using Health Checks

Kubernetes liveness probe:

livenessProbe:
  httpGet:
    path: /api/health
    port: 3001
  initialDelaySeconds: 30
  periodSeconds: 10

Docker healthcheck:

HEALTHCHECK --interval=30s --timeout=3s --start-period=40s \
  CMD curl -f http://localhost:3001/api/health || exit 1

Log Shipping

Production Log Destinations

For production deployments, ship logs to a centralized logging service. Configuration examples:

Logtail (BetterStack)

# .env
LOGTAIL_SOURCE_TOKEN=your_logtail_source_token

To integrate Logtail, install the transport:

npm install @logtail/pino

Update logger.service.ts to add Logtail transport when token is present.
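
A hedged sketch of what that change might look like (transport usage per @logtail/pino's documented pattern; verify against the installed version):

import pino from "pino";

// Enable the Logtail transport only when a source token is configured;
// otherwise fall back to plain stdout logging.
const transport = process.env.LOGTAIL_SOURCE_TOKEN
  ? pino.transport({
      target: "@logtail/pino",
      options: { sourceToken: process.env.LOGTAIL_SOURCE_TOKEN },
    })
  : undefined;

export const logger = pino({ level: process.env.LOG_LEVEL ?? "info" }, transport);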

Datadog

# .env
DATADOG_API_KEY=your_datadog_api_key
DATADOG_APP_KEY=your_datadog_app_key
DATADOG_SITE=datadoghq.com  # or datadoghq.eu for EU

To integrate Datadog, install the transport:

npm install pino-datadog

ELK Stack (Elasticsearch)

# .env
ELASTICSEARCH_URL=https://your-elasticsearch-host:9200
ELASTICSEARCH_USERNAME=your_username
ELASTICSEARCH_PASSWORD=your_password
ELASTICSEARCH_INDEX=internet-id-logs

To integrate Elasticsearch, use Filebeat or Logstash to collect logs from stdout/files.

File-based Logging

For file-based logging with rotation:

npm install pino-roll
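
A sketch of a daily-rotation transport using pino-roll (option names follow pino-roll's documentation; treat this as an assumption to verify against the installed version):

import pino from "pino";

const transport = pino.transport({
  target: "pino-roll",
  options: {
    file: "./logs/internet-id", // base file name; pino-roll appends rotation suffixes
    frequency: "daily",         // rotate once per day
    mkdir: true,                // create the logs directory if missing
  },
});

export const logger = pino(transport);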

Or use OS-level log rotation with rsyslog/logrotate.

Docker/Kubernetes Logging

When running in containers, simply log to stdout (default). Container orchestration platforms automatically collect logs:

Docker Compose:

services:
  api:
    image: internet-id-api
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

Kubernetes: Logs are automatically collected by the cluster logging system (Fluentd, Fluent Bit, etc.).

Monitoring Dashboards

Prometheus + Grafana

  1. Set up Prometheus to scrape metrics (see configuration above)

  2. Install Grafana and add Prometheus as a data source

  3. Create a dashboard with the following panels (see Example Grafana Dashboard JSON below for a template):

Request Rate & Latency:

# Request rate
rate(http_requests_total[5m])

# P95 latency
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Error rate
rate(http_requests_total{status_code=~"5.."}[5m])

Application Metrics:

# Cache hit rate
rate(cache_hits_total[5m]) / (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m]))

# Verification success rate
rate(verification_total{outcome="success"}[5m]) / rate(verification_total[5m])

# Active connections
active_connections

System Metrics:

# CPU usage
rate(process_cpu_user_seconds_total[5m])

# Memory usage
process_resident_memory_bytes

# Event loop lag (a gauge, so average it rather than taking a rate)
avg_over_time(nodejs_eventloop_lag_seconds[5m])

Example Grafana Dashboard JSON

See ops/monitoring/grafana-dashboard.json (to be created) for a complete dashboard template.

Alerting

Prometheus Alerting Rules

Example alert rules for prometheus/alerts.yml:

groups:
  - name: internet_id_api
    interval: 30s
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }}"

      # Service unavailable
      - alert: ServiceDown
        expr: up{job="internet-id-api"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Internet-ID API is down"

      # High latency
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95, 
            rate(http_request_duration_seconds_bucket[5m])
          ) > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High API latency"
          description: "P95 latency is {{ $value }}s"

      # Low cache hit rate
      - alert: LowCacheHitRate
        expr: |
          rate(cache_hits_total[5m]) / 
          (rate(cache_hits_total[5m]) + rate(cache_misses_total[5m])) < 0.5
        for: 15m
        labels:
          severity: info
        annotations:
          summary: "Cache hit rate is low"
          description: "Hit rate: {{ $value | humanizePercentage }}"

Best Practices

Logging Best Practices

  1. Use structured logging: Always log with context objects, not string concatenation

    // Good
    logger.info("User registered", { userId, email });
    
    // Bad
    logger.info(`User ${userId} registered with email ${email}`);
    
  2. Choose appropriate log levels: Don't log everything at info level

  3. Include correlation IDs: Use the request logger (req.log) to maintain correlation

  4. Don't log sensitive data: Even with redaction, be careful with PII and secrets

  5. Add context, not just messages: Logs should be queryable and filterable

Metrics Best Practices

  1. Use labels wisely: Don't use unbounded values (like user IDs) as labels; see the sketch after this list

  2. Keep cardinality low: Limit the number of unique label combinations

  3. Prefer histograms over summaries: Histograms are aggregatable across instances

  4. Use seconds for durations: Prometheus convention

  5. Name metrics clearly: Follow Prometheus naming conventions

    • _total suffix for counters
    • _seconds suffix for durations
    • _bytes suffix for sizes
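
A hedged illustration of points 1–2, using a hypothetical httpRequests counter:

import { Counter } from "prom-client";

// Hypothetical counter for illustration only.
const httpRequests = new Counter({
  name: "http_requests_total",
  help: "HTTP request count",
  labelNames: ["method", "route", "status_code"] as const,
});

// Bad: raw paths embed unbounded IDs, so every user creates a new series.
// httpRequests.inc({ method: "GET", route: `/users/${userId}`, status_code: "200" });

// Good: normalize routes to their patterns so cardinality stays bounded.
httpRequests.inc({ method: "GET", route: "/users/:id", status_code: "200" });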

Monitoring Best Practices

  1. Monitor the golden signals: Latency, Traffic, Errors, Saturation (Google SRE)

  2. Set meaningful alerts: Avoid alert fatigue with actionable alerts only

  3. Document your alerts: Include runbooks for each alert

  4. Test your alerts: Verify alerts fire under expected conditions

  5. Monitor business metrics: Track verification rates, registrations, etc.

Troubleshooting

Logs not appearing

Check log level:

echo $LOG_LEVEL  # Should be info or a more verbose level (debug, trace)

Check NODE_ENV:

echo $NODE_ENV  # Pretty logs only in development

Enable debug logging temporarily:

LOG_LEVEL=debug npm run start:api

Metrics not available

Verify endpoint responds:

curl http://localhost:3001/api/metrics

Check Prometheus scrape status: Visit http://localhost:9090/targets in the Prometheus UI

View metrics in JSON for debugging:

curl http://localhost:3001/api/metrics/json | jq

High memory usage

Check for metrics cardinality explosion:

# Count unique metric series
curl -s http://localhost:3001/api/metrics | grep -c '^[a-z]'

If this number is very high (>10,000), you may have too many label combinations.

Performance impact

Logging: Pino is extremely fast (minimal overhead)

  • Use async logging in production for even better performance
  • Avoid logging in tight loops

Metrics: Minimal overhead for most metrics

  • Histograms are more expensive than counters/gauges
  • Keep label cardinality low

References