Query Optimization

Overview

Query optimization ensures database operations execute efficiently at scale. It encompasses understanding the PostgreSQL query planner, eliminating N+1 queries through JPA fetch strategies, leveraging connection pooling with HikariCP, and using advanced SQL constructs like CTEs and materialized views. In the full-lifecycle pipeline, the @database-engineer agent generates optimized query patterns in Phase 5 that the @backend-java agent implements in Phase 6.

Key Concepts

PostgreSQL Query Planner

┌───────────────────────────────────────────────────┐
│             Query Execution Pipeline               │
├───────────────────────────────────────────────────┤
│                                                    │
│  SQL Query                                         │
│      ↓                                             │
│  Parser → Query Tree                               │
│      ↓                                             │
│  Rewriter → Optimized Tree (views, rules)          │
│      ↓                                             │
│  Planner → Multiple Plans → Cost Estimation        │
│      ↓                                             │
│  Executor → Chosen Plan → Result Set               │
│                                                    │
│  EXPLAIN shows the plan                            │
│  EXPLAIN ANALYZE shows plan + actual execution     │
└───────────────────────────────────────────────────┘

EXPLAIN ANALYZE Reading Guide

Field	What It Tells You
Seq Scan	Full table scan — consider adding index
Index Scan	Using an index — good
Index Only Scan	Using covering index — best
Bitmap Heap Scan	Multiple index results combined — acceptable
Nested Loop	Row-by-row join — fast for small outer set
Hash Join	Hash table join — good for larger sets
Sort	Explicit sort — check if index can eliminate
actual time	Real execution time in ms (first row..last row)
rows	Actual vs estimated rows (big diff = stale stats)
Buffers: shared hit	Pages found in cache
Buffers: shared read	Pages read from disk

N+1 Problem and Solutions

Pattern	Queries	Solution
N+1 (bad)	1 + N	Lazy loading triggers N extra queries
JOIN FETCH	1	`@Query("SELECT o FROM Order o JOIN FETCH o.items")`
EntityGraph	1	`@EntityGraph(attributePaths = {"items"})`
Batch fetching	1 + ceil(N/batch)	`@BatchSize(size = 25)`
Projection	1	`interface OrderSummary { String getStatus(); }`

Best Practices

Always use EXPLAIN ANALYZE — Never guess; measure actual query performance
Fix N+1 queries first — Biggest performance win in JPA applications
Use projections for read queries — Don’t load entire entities when you need 3 fields
Configure HikariCP properly — maximumPoolSize = (2 * CPU cores) + disk spindles
Use pagination — Never SELECT * without LIMIT; use keyset pagination for large datasets
Run ANALYZE regularly — Stale statistics cause bad query plans
**Avoid SELECT *** — Specify columns; reduces I/O and enables index-only scans
Use batch operations — saveAll() with spring.jpa.properties.hibernate.jdbc.batch_size=25

Code Examples

✅ Good: Optimized JPA Query

// Repository with optimized queries
@Repository
public interface OrderRepository extends JpaRepository<Order, Long> {
    
    // JOIN FETCH eliminates N+1
    @Query("""
        SELECT o FROM Order o
        JOIN FETCH o.items i
        JOIN FETCH o.customer c
        WHERE o.status = :status
        AND o.createdAt > :since
        ORDER BY o.createdAt DESC
        """)
    List<Order> findRecentByStatus(
        @Param("status") OrderStatus status,
        @Param("since") Instant since
    );
    
    // Projection for read-only dashboard (no entity overhead)
    @Query("""
        SELECT o.id as id, o.status as status,
               o.totalAmount as totalAmount,
               c.displayName as customerName
        FROM Order o JOIN o.customer c
        WHERE o.createdAt BETWEEN :start AND :end
        """)
    Page<OrderSummary> findOrderSummaries(
        @Param("start") Instant start,
        @Param("end") Instant end,
        Pageable pageable
    );
    
    // Keyset pagination (cursor-based, no OFFSET)
    @Query("""
        SELECT o FROM Order o
        WHERE o.createdAt < :cursor
        ORDER BY o.createdAt DESC
        """)
    List<Order> findNextPage(
        @Param("cursor") Instant cursor,
        Pageable pageable
    );
}

✅ Good: HikariCP Configuration

# application.yml
spring:
  datasource:
    hikari:
      maximum-pool-size: 10          # (2 * CPU) + spindles
      minimum-idle: 5
      idle-timeout: 300000           # 5 minutes
      max-lifetime: 1800000          # 30 minutes
      connection-timeout: 30000      # 30 seconds
      leak-detection-threshold: 60000 # 1 minute
  jpa:
    properties:
      hibernate:
        jdbc:
          batch_size: 25             # Batch inserts/updates
          fetch_size: 50
        order_inserts: true          # Group inserts by entity
        order_updates: true
        default_batch_fetch_size: 25 # Batch lazy loading
    open-in-view: false              # CRITICAL: disable OSIV

❌ Bad: Common Query Mistakes

// N+1: Loading orders then accessing items in loop
List<Order> orders = orderRepository.findAll();
for (Order order : orders) {
    // Each call triggers a SELECT on order_items
    int itemCount = order.getItems().size();
}

// Loading full entities for a count
long count = orderRepository.findAll().stream().count();
// Should be: orderRepository.count();

// OFFSET pagination on large dataset (scans and discards rows)
Page<Order> page = orderRepository.findAll(PageRequest.of(1000, 20));
// Keyset/cursor pagination is O(1), OFFSET is O(n)

✅ Good: Advanced SQL Patterns

-- CTE for readable complex queries
WITH monthly_totals AS (
    SELECT customer_id,
           DATE_TRUNC('month', created_at) AS month,
           SUM(total_amount) AS total
    FROM orders
    WHERE created_at >= NOW() - INTERVAL '12 months'
    GROUP BY customer_id, DATE_TRUNC('month', created_at)
),
ranked AS (
    SELECT customer_id, month, total,
           RANK() OVER (PARTITION BY month ORDER BY total DESC) AS rank
    FROM monthly_totals
)
SELECT * FROM ranked WHERE rank <= 10;

-- Materialized view for expensive aggregations
CREATE MATERIALIZED VIEW mv_order_stats AS
SELECT customer_id,
       COUNT(*) AS order_count,
       SUM(total_amount) AS lifetime_value,
       MAX(created_at) AS last_order_at
FROM orders
GROUP BY customer_id;

CREATE UNIQUE INDEX idx_mv_order_stats_customer 
    ON mv_order_stats (customer_id);

-- Refresh on schedule (not real-time)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_stats;

Anti-Patterns

N+1 queries — Lazy loading in loops without fetch strategy
**SELECT *** — Loads unnecessary columns, prevents index-only scans
OFFSET pagination — Performance degrades linearly; use keyset/cursor
OSIV (Open Session In View) — Lazy loading in controllers causes unpredictable queries
Missing connection pool limits — Unbounded connections exhaust DB
Implicit transactions — Long-running transactions hold locks

Testing Strategies

Query count assertions — Use datasource-proxy to assert exact SQL count per operation
EXPLAIN plan assertions — Verify index usage in integration tests
Load testing — k6/Gatling with production-like data volumes
Connection pool monitoring — Assert no connection leaks in test teardown
Slow query log — Enable log_min_duration_statement = 100 in test PostgreSQL

Query Optimization

Query Optimization

Overview

Key Concepts

PostgreSQL Query Planner

EXPLAIN ANALYZE Reading Guide

N+1 Problem and Solutions

Best Practices

Code Examples

✅ Good: Optimized JPA Query

✅ Good: HikariCP Configuration

❌ Bad: Common Query Mistakes

✅ Good: Advanced SQL Patterns

Anti-Patterns

Testing Strategies

References

Related Skills