Database Indexing

Overview

Database indexing is the primary mechanism for query performance optimization. Proper index strategy ensures queries complete in milliseconds rather than seconds, but over-indexing degrades write performance and wastes storage. In the full-lifecycle pipeline, the @database-engineer agent generates index recommendations during Phase 5 based on API query patterns identified in Phase 4.

Key Concepts

PostgreSQL Index Types

Index Type	Use Case	Operators	Example
B-tree (default)	Equality, range, sorting	`=`, `<`, `>`, `BETWEEN`, `ORDER BY`	`CREATE INDEX idx_users_email ON users (email)`
Hash	Equality only	`=`	Rarely used (B-tree handles equality too)
GIN	Array, JSONB, full-text	`@>`, `?`, `@@`, `&&`	`CREATE INDEX idx_data ON items USING GIN (metadata)`
GiST	Geometric, range, proximity	`<<`, `>>`, `@>`, `<@`	`CREATE INDEX idx_geo ON locations USING GIST (coords)`
BRIN	Large tables with natural ordering	`<`, `>`, `=`	`CREATE INDEX idx_ts ON events USING BRIN (created_at)`

Index Anatomy

┌──────────────────────────────────────────────────┐
│                B-tree Index                       │
├──────────────────────────────────────────────────┤
│                                                   │
│  Root Node:        [  M  ]                        │
│                   /       \                       │
│  Internal:    [ D  H ]   [ R  V ]                │
│              / | \      / | \                     │
│  Leaf:    [A-C][D-G]  [R-T][V-Z]  → table rows   │
│                                                   │
│  Leaf nodes are doubly-linked for range scans     │
└──────────────────────────────────────────────────┘

Index Strategies

Strategy	Pattern	When to Use
Single-column	`CREATE INDEX ON t (col)`	Simple equality/range on one column
Composite	`CREATE INDEX ON t (a, b, c)`	Multi-column WHERE/ORDER BY (leftmost prefix rule)
Partial	`CREATE INDEX ON t (col) WHERE active = true`	Query only touches a subset of rows
Covering	`CREATE INDEX ON t (a) INCLUDE (b, c)`	Index-only scans (avoid heap access)
Expression	`CREATE INDEX ON t (LOWER(email))`	Queries use function on column
Unique	`CREATE UNIQUE INDEX ON t (email)`	Enforce uniqueness + fast lookup

Best Practices

Index every foreign key — JOINs on un-indexed FKs cause full table scans
Use composite indexes wisely — Column order matters: high-cardinality first for equality, range column last
Use partial indexes for hot data — WHERE is_active = true when most rows are inactive
Use CONCURRENTLY in production — CREATE INDEX CONCURRENTLY avoids table locks
Monitor index usage — Drop unused indexes (pg_stat_user_indexes.idx_scan = 0)
Keep indexes under 5 per table for write-heavy tables — Each index adds write overhead
Use EXPLAIN ANALYZE — Always verify index usage before and after

Code Examples

✅ Good: Strategic Index Design

-- Orders table with well-designed indexes
CREATE TABLE orders (
    id              BIGSERIAL PRIMARY KEY,
    customer_id     BIGINT NOT NULL REFERENCES customers(id),
    status          VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    total_amount    NUMERIC(19,4) NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- FK index (mandatory)
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Partial index: only active orders queried frequently
CREATE INDEX idx_orders_pending ON orders (customer_id, created_at)
    WHERE status = 'PENDING';

-- Covering index for dashboard query (index-only scan)
CREATE INDEX idx_orders_status_amount ON orders (status) 
    INCLUDE (total_amount, created_at);

-- EXPLAIN ANALYZE verification
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT customer_id, total_amount, created_at
FROM orders
WHERE status = 'PENDING' AND customer_id = 42;
-- Expected: Index Scan using idx_orders_pending

❌ Bad: Index Anti-Patterns

-- Redundant index (composite already covers single-column)
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
-- First index is redundant — composite covers single-column queries

-- Index on low-cardinality boolean (not selective)
CREATE INDEX idx_orders_is_deleted ON orders (is_deleted);
-- Only 2 values — use partial index instead

-- Too many indexes on write-heavy table
CREATE INDEX idx1 ON orders (col1);
CREATE INDEX idx2 ON orders (col2);
CREATE INDEX idx3 ON orders (col3);
CREATE INDEX idx4 ON orders (col4);
CREATE INDEX idx5 ON orders (col5);
CREATE INDEX idx6 ON orders (col6);
-- 6 indexes = 6x write amplification

Anti-Patterns

Index everything — Over-indexing kills write performance and wastes storage
Missing FK indexes — Implicit sequential scans on JOINs
Wrong column order in composite — Equality columns first, range column last
Not using CONCURRENTLY — Table locks during index creation cause downtime
Never checking index usage — Unused indexes waste space and slow writes
Indexing low-cardinality columns — Boolean/status columns with 2-3 values (use partial)

Testing Strategies

EXPLAIN ANALYZE in tests — Assert expected index usage in integration tests
Index-only scan verification — Check Heap Fetches: 0 for covering indexes
Performance benchmarks — Measure query time with and without index on realistic data
Write performance impact — Benchmark INSERT/UPDATE throughput with index count changes
pg_stat_user_indexes — Monitor idx_scan counts in staging/production