DA
Query Optimization
Database performance v1.0.0
Query Optimization
Overview
Query optimization ensures database operations execute efficiently at scale. It encompasses understanding the PostgreSQL query planner, eliminating N+1 queries through JPA fetch strategies, leveraging connection pooling with HikariCP, and using advanced SQL constructs like CTEs and materialized views. In the full-lifecycle pipeline, the @database-engineer agent generates optimized query patterns in Phase 5 that the @backend-java agent implements in Phase 6.
Key Concepts
PostgreSQL Query Planner
┌───────────────────────────────────────────────────┐
│ Query Execution Pipeline │
├───────────────────────────────────────────────────┤
│ │
│ SQL Query │
│ ↓ │
│ Parser → Query Tree │
│ ↓ │
│ Rewriter → Optimized Tree (views, rules) │
│ ↓ │
│ Planner → Multiple Plans → Cost Estimation │
│ ↓ │
│ Executor → Chosen Plan → Result Set │
│ │
│ EXPLAIN shows the plan │
│ EXPLAIN ANALYZE shows plan + actual execution │
└───────────────────────────────────────────────────┘
EXPLAIN ANALYZE Reading Guide
| Field | What It Tells You |
|---|---|
| Seq Scan | Full table scan — consider adding index |
| Index Scan | Using an index — good |
| Index Only Scan | Using covering index — best |
| Bitmap Heap Scan | Multiple index results combined — acceptable |
| Nested Loop | Row-by-row join — fast for small outer set |
| Hash Join | Hash table join — good for larger sets |
| Sort | Explicit sort — check if index can eliminate |
| actual time | Real execution time in ms (first row..last row) |
| rows | Actual vs estimated rows (big diff = stale stats) |
| Buffers: shared hit | Pages found in cache |
| Buffers: shared read | Pages read from disk |
N+1 Problem and Solutions
| Pattern | Queries | Solution |
|---|---|---|
| N+1 (bad) | 1 + N | Lazy loading triggers N extra queries |
| JOIN FETCH | 1 | @Query("SELECT o FROM Order o JOIN FETCH o.items") |
| EntityGraph | 1 | @EntityGraph(attributePaths = {"items"}) |
| Batch fetching | 1 + ceil(N/batch) | @BatchSize(size = 25) |
| Projection | 1 | interface OrderSummary { String getStatus(); } |
Best Practices
- Always use EXPLAIN ANALYZE — Never guess; measure actual query performance
- Fix N+1 queries first — Biggest performance win in JPA applications
- Use projections for read queries — Don’t load entire entities when you need 3 fields
- Configure HikariCP properly —
maximumPoolSize = (2 * CPU cores) + disk spindles - Use pagination — Never
SELECT *without LIMIT; use keyset pagination for large datasets - Run ANALYZE regularly — Stale statistics cause bad query plans
- **Avoid SELECT *** — Specify columns; reduces I/O and enables index-only scans
- Use batch operations —
saveAll()withspring.jpa.properties.hibernate.jdbc.batch_size=25
Code Examples
✅ Good: Optimized JPA Query
// Repository with optimized queries
@Repository
public interface OrderRepository extends JpaRepository<Order, Long> {
// JOIN FETCH eliminates N+1
@Query("""
SELECT o FROM Order o
JOIN FETCH o.items i
JOIN FETCH o.customer c
WHERE o.status = :status
AND o.createdAt > :since
ORDER BY o.createdAt DESC
""")
List<Order> findRecentByStatus(
@Param("status") OrderStatus status,
@Param("since") Instant since
);
// Projection for read-only dashboard (no entity overhead)
@Query("""
SELECT o.id as id, o.status as status,
o.totalAmount as totalAmount,
c.displayName as customerName
FROM Order o JOIN o.customer c
WHERE o.createdAt BETWEEN :start AND :end
""")
Page<OrderSummary> findOrderSummaries(
@Param("start") Instant start,
@Param("end") Instant end,
Pageable pageable
);
// Keyset pagination (cursor-based, no OFFSET)
@Query("""
SELECT o FROM Order o
WHERE o.createdAt < :cursor
ORDER BY o.createdAt DESC
""")
List<Order> findNextPage(
@Param("cursor") Instant cursor,
Pageable pageable
);
}
✅ Good: HikariCP Configuration
# application.yml
spring:
datasource:
hikari:
maximum-pool-size: 10 # (2 * CPU) + spindles
minimum-idle: 5
idle-timeout: 300000 # 5 minutes
max-lifetime: 1800000 # 30 minutes
connection-timeout: 30000 # 30 seconds
leak-detection-threshold: 60000 # 1 minute
jpa:
properties:
hibernate:
jdbc:
batch_size: 25 # Batch inserts/updates
fetch_size: 50
order_inserts: true # Group inserts by entity
order_updates: true
default_batch_fetch_size: 25 # Batch lazy loading
open-in-view: false # CRITICAL: disable OSIV
❌ Bad: Common Query Mistakes
// N+1: Loading orders then accessing items in loop
List<Order> orders = orderRepository.findAll();
for (Order order : orders) {
// Each call triggers a SELECT on order_items
int itemCount = order.getItems().size();
}
// Loading full entities for a count
long count = orderRepository.findAll().stream().count();
// Should be: orderRepository.count();
// OFFSET pagination on large dataset (scans and discards rows)
Page<Order> page = orderRepository.findAll(PageRequest.of(1000, 20));
// Keyset/cursor pagination is O(1), OFFSET is O(n)
✅ Good: Advanced SQL Patterns
-- CTE for readable complex queries
WITH monthly_totals AS (
SELECT customer_id,
DATE_TRUNC('month', created_at) AS month,
SUM(total_amount) AS total
FROM orders
WHERE created_at >= NOW() - INTERVAL '12 months'
GROUP BY customer_id, DATE_TRUNC('month', created_at)
),
ranked AS (
SELECT customer_id, month, total,
RANK() OVER (PARTITION BY month ORDER BY total DESC) AS rank
FROM monthly_totals
)
SELECT * FROM ranked WHERE rank <= 10;
-- Materialized view for expensive aggregations
CREATE MATERIALIZED VIEW mv_order_stats AS
SELECT customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS lifetime_value,
MAX(created_at) AS last_order_at
FROM orders
GROUP BY customer_id;
CREATE UNIQUE INDEX idx_mv_order_stats_customer
ON mv_order_stats (customer_id);
-- Refresh on schedule (not real-time)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_order_stats;
Anti-Patterns
- N+1 queries — Lazy loading in loops without fetch strategy
- **SELECT *** — Loads unnecessary columns, prevents index-only scans
- OFFSET pagination — Performance degrades linearly; use keyset/cursor
- OSIV (Open Session In View) — Lazy loading in controllers causes unpredictable queries
- Missing connection pool limits — Unbounded connections exhaust DB
- Implicit transactions — Long-running transactions hold locks
Testing Strategies
- Query count assertions — Use datasource-proxy to assert exact SQL count per operation
- EXPLAIN plan assertions — Verify index usage in integration tests
- Load testing — k6/Gatling with production-like data volumes
- Connection pool monitoring — Assert no connection leaks in test teardown
- Slow query log — Enable
log_min_duration_statement = 100in test PostgreSQL