Skip to content
Home / Skills / Database / Indexing
DA

Indexing

Database performance v1.0.0

Database Indexing

Overview

Database indexing is the primary mechanism for query performance optimization. Proper index strategy ensures queries complete in milliseconds rather than seconds, but over-indexing degrades write performance and wastes storage. In the full-lifecycle pipeline, the @database-engineer agent generates index recommendations during Phase 5 based on API query patterns identified in Phase 4.


Key Concepts

PostgreSQL Index Types

Index TypeUse CaseOperatorsExample
B-tree (default)Equality, range, sorting=, <, >, BETWEEN, ORDER BYCREATE INDEX idx_users_email ON users (email)
HashEquality only=Rarely used (B-tree handles equality too)
GINArray, JSONB, full-text@>, ?, @@, &&CREATE INDEX idx_data ON items USING GIN (metadata)
GiSTGeometric, range, proximity<<, >>, @>, <@CREATE INDEX idx_geo ON locations USING GIST (coords)
BRINLarge tables with natural ordering<, >, =CREATE INDEX idx_ts ON events USING BRIN (created_at)

Index Anatomy

┌──────────────────────────────────────────────────┐
│                B-tree Index                       │
├──────────────────────────────────────────────────┤
│                                                   │
│  Root Node:        [  M  ]                        │
│                   /       \                       │
│  Internal:    [ D  H ]   [ R  V ]                │
│              / | \      / | \                     │
│  Leaf:    [A-C][D-G]  [R-T][V-Z]  → table rows   │
│                                                   │
│  Leaf nodes are doubly-linked for range scans     │
└──────────────────────────────────────────────────┘

Index Strategies

StrategyPatternWhen to Use
Single-columnCREATE INDEX ON t (col)Simple equality/range on one column
CompositeCREATE INDEX ON t (a, b, c)Multi-column WHERE/ORDER BY (leftmost prefix rule)
PartialCREATE INDEX ON t (col) WHERE active = trueQuery only touches a subset of rows
CoveringCREATE INDEX ON t (a) INCLUDE (b, c)Index-only scans (avoid heap access)
ExpressionCREATE INDEX ON t (LOWER(email))Queries use function on column
UniqueCREATE UNIQUE INDEX ON t (email)Enforce uniqueness + fast lookup

Best Practices

  1. Index every foreign key — JOINs on un-indexed FKs cause full table scans
  2. Use composite indexes wisely — Column order matters: high-cardinality first for equality, range column last
  3. Use partial indexes for hot dataWHERE is_active = true when most rows are inactive
  4. Use CONCURRENTLY in productionCREATE INDEX CONCURRENTLY avoids table locks
  5. Monitor index usage — Drop unused indexes (pg_stat_user_indexes.idx_scan = 0)
  6. Keep indexes under 5 per table for write-heavy tables — Each index adds write overhead
  7. Use EXPLAIN ANALYZE — Always verify index usage before and after

Code Examples

✅ Good: Strategic Index Design

-- Orders table with well-designed indexes
CREATE TABLE orders (
    id              BIGSERIAL PRIMARY KEY,
    customer_id     BIGINT NOT NULL REFERENCES customers(id),
    status          VARCHAR(20) NOT NULL DEFAULT 'PENDING',
    total_amount    NUMERIC(19,4) NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- FK index (mandatory)
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- Partial index: only active orders queried frequently
CREATE INDEX idx_orders_pending ON orders (customer_id, created_at)
    WHERE status = 'PENDING';

-- Covering index for dashboard query (index-only scan)
CREATE INDEX idx_orders_status_amount ON orders (status) 
    INCLUDE (total_amount, created_at);

-- EXPLAIN ANALYZE verification
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT customer_id, total_amount, created_at
FROM orders
WHERE status = 'PENDING' AND customer_id = 42;
-- Expected: Index Scan using idx_orders_pending

❌ Bad: Index Anti-Patterns

-- Redundant index (composite already covers single-column)
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
-- First index is redundant — composite covers single-column queries

-- Index on low-cardinality boolean (not selective)
CREATE INDEX idx_orders_is_deleted ON orders (is_deleted);
-- Only 2 values — use partial index instead

-- Too many indexes on write-heavy table
CREATE INDEX idx1 ON orders (col1);
CREATE INDEX idx2 ON orders (col2);
CREATE INDEX idx3 ON orders (col3);
CREATE INDEX idx4 ON orders (col4);
CREATE INDEX idx5 ON orders (col5);
CREATE INDEX idx6 ON orders (col6);
-- 6 indexes = 6x write amplification

Anti-Patterns

  1. Index everything — Over-indexing kills write performance and wastes storage
  2. Missing FK indexes — Implicit sequential scans on JOINs
  3. Wrong column order in composite — Equality columns first, range column last
  4. Not using CONCURRENTLY — Table locks during index creation cause downtime
  5. Never checking index usage — Unused indexes waste space and slow writes
  6. Indexing low-cardinality columns — Boolean/status columns with 2-3 values (use partial)

Testing Strategies

  • EXPLAIN ANALYZE in tests — Assert expected index usage in integration tests
  • Index-only scan verification — Check Heap Fetches: 0 for covering indexes
  • Performance benchmarks — Measure query time with and without index on realistic data
  • Write performance impact — Benchmark INSERT/UPDATE throughput with index count changes
  • pg_stat_user_indexes — Monitor idx_scan counts in staging/production

References