Skip to content
Home / Skills / Database / Query Optimization
DA

Query Optimization

Database performance v1.0.0

Query Optimization

Overview

Query optimization ensures database operations execute efficiently at scale. It encompasses understanding the PostgreSQL query planner, eliminating N+1 queries through JPA fetch strategies, leveraging connection pooling with HikariCP, and using advanced SQL constructs like CTEs and materialized views. In the full-lifecycle pipeline, the @database-engineer agent generates optimized query patterns in Phase 5 that the @backend-java agent implements in Phase 6.


Key Concepts

PostgreSQL Query Planner

┌───────────────────────────────────────────────────┐
│             Query Execution Pipeline               │
├───────────────────────────────────────────────────┤
│                                                    │
│  SQL Query                                         │
│      ↓                                             │
│  Parser → Query Tree                               │
│      ↓                                             │
│  Rewriter → Optimized Tree (views, rules)          │
│      ↓                                             │
│  Planner → Multiple Plans → Cost Estimation        │
│      ↓                                             │
│  Executor → Chosen Plan → Result Set               │
│                                                    │
│  EXPLAIN shows the plan                            │
│  EXPLAIN ANALYZE shows plan + actual execution     │
└───────────────────────────────────────────────────┘

EXPLAIN ANALYZE Reading Guide

FieldWhat It Tells You
Seq ScanFull table scan — consider adding index
Index ScanUsing an index — good
Index Only ScanUsing covering index — best
Bitmap Heap ScanMultiple index results combined — acceptable
Nested LoopRow-by-row join — fast for small outer set
Hash JoinHash table join — good for larger sets
SortExplicit sort — check if index can eliminate
actual timeReal execution time in ms (first row..last row)
rowsActual vs estimated rows (big diff = stale stats)
Buffers: shared hitPages found in cache
Buffers: shared readPages read from disk

N+1 Problem and Solutions

PatternQueriesSolution
N+1 (bad)1 + NLazy loading triggers N extra queries
JOIN FETCH1@Query("SELECT o FROM Order o JOIN FETCH o.items")
EntityGraph1@EntityGraph(attributePaths = {"items"})
Batch fetching1 + ceil(N/batch)@BatchSize(size = 25)
Projection1interface OrderSummary { String getStatus(); }

Best Practices

  1. Always use EXPLAIN ANALYZE — Never guess; measure actual query performance
  2. Fix N+1 queries first — Biggest performance win in JPA applications
  3. Use projections for read queries — Don’t load entire entities when you need 3 fields
  4. Configure HikariCP properlymaximumPoolSize = (2 * CPU cores) + disk spindles
  5. Use pagination — Never SELECT * without LIMIT; use keyset pagination for large datasets
  6. Run ANALYZE regularly — Stale statistics cause bad query plans
  7. **Avoid SELECT *** — Specify columns; reduces I/O and enables index-only scans
  8. Use batch operationssaveAll() with spring.jpa.properties.hibernate.jdbc.batch_size=25

Code Examples

✅ Good: Optimized JPA Query

// Repository with optimized queries
@Repository
public interface OrderRepository extends JpaRepository<Order, Long> {
    
    // JOIN FETCH eliminates N+1
    @Query("""
        SELECT o FROM Order o
        JOIN FETCH o.items i
        JOIN FETCH o.customer c
        WHERE o.status = :status
        AND o.createdAt > :since
        ORDER BY o.createdAt DESC
        """)
    List<Order> findRecentByStatus(
        @Param("status") OrderStatus status,
        @Param("since") Instant since
    );
    
    // Projection for read-only dashboard (no entity overhead)
    @Query("""
        SELECT o.id as id, o.status as status,
               o.totalAmount as totalAmount,
               c.displayName as customerName
        FROM Order o JOIN o.customer c
        WHERE o.createdAt BETWEEN :start AND :end
        """)
    Page<OrderSummary> findOrderSummaries(
        @Param("start") Instant start,
        @Param("end") Instant end,
        Pageable pageable
    );
    
    // Keyset pagination (cursor-based, no OFFSET)
    @Query("""
        SELECT o FROM Order o
        WHERE o.createdAt < :cursor
        ORDER BY o.createdAt DESC
        """)
    List<Order> findNextPage(
        @Param("cursor") Instant cursor,
        Pageable pageable
    );
}

✅ Good: HikariCP Configuration

# application.yml
spring:
  datasource:
    hikari:
      maximum-pool-size: 10          # (2 * CPU) + spindles
      minimum-idle: 5
      idle-timeout: 300000           # 5 minutes
      max-lifetime: 1800000          # 30 minutes
      connection-timeout: 30000      # 30 seconds
      leak-detection-threshold: 60000 # 1 minute
  jpa:
    properties:
      hibernate:
        jdbc:
          batch_size: 25             # Batch inserts/updates
          fetch_size: 50
        order_inserts: true          # Group inserts by entity
        order_updates: true
        default_batch_fetch_size: 25 # Batch lazy loading
    open-in-view: false              # CRITICAL: disable OSIV

❌ Bad: Common Query Mistakes

// N+1: Loading orders then accessing items in loop
List<Order> orders = orderRepository.findAll();
for (Order order : orders) {
    // Each call triggers a SELECT on order_items
    int itemCount = order.getItems().size();
}

// Loading full entities for a count
long count = orderRepository.findAll().stream().count();
// Should be: orderRepository.count();

// OFFSET pagination on large dataset (scans and discards rows)
Page<Order> page = orderRepository.findAll(PageRequest.of(1000, 20));
// Keyset/cursor pagination is O(1), OFFSET is O(n)

✅ Good: Advanced SQL Patterns

-- CTE for readable complex queries
WITH monthly_totals AS (
    SELECT customer_id,
           DATE_TRUNC('month', created_at) AS month,
           SUM(total_amount) AS total
    FROM orders
    WHERE created_at >= NOW() - INTERVAL '12 months'
    GROUP BY customer_id, DATE_TRUNC('month', created_at)
),
ranked AS (
    SELECT customer_id, month, total,
           RANK() OVER (PARTITION BY month ORDER BY total DESC) AS rank
    FROM monthly_totals
)
SELECT * FROM ranked WHERE rank <= 10;

-- Materialized view for expensive aggregations
CREATE MATERIALIZED VIEW mv_order_stats AS
SELECT customer_id,
       COUNT(*) AS order_count,
       SUM(total_amount) AS lifetime_value,
       MAX(created_at) AS last_order_at
FROM orders
GROUP BY customer_id;

CREATE UNIQUE INDEX idx_mv_order_stats_customer 
    ON mv_order_stats (customer_id);

-- Refresh on schedule (not real-time)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_stats;

Anti-Patterns

  1. N+1 queries — Lazy loading in loops without fetch strategy
  2. **SELECT *** — Loads unnecessary columns, prevents index-only scans
  3. OFFSET pagination — Performance degrades linearly; use keyset/cursor
  4. OSIV (Open Session In View) — Lazy loading in controllers causes unpredictable queries
  5. Missing connection pool limits — Unbounded connections exhaust DB
  6. Implicit transactions — Long-running transactions hold locks

Testing Strategies

  • Query count assertions — Use datasource-proxy to assert exact SQL count per operation
  • EXPLAIN plan assertions — Verify index usage in integration tests
  • Load testing — k6/Gatling with production-like data volumes
  • Connection pool monitoring — Assert no connection leaks in test teardown
  • Slow query log — Enable log_min_duration_statement = 100 in test PostgreSQL

References