Schema Design

Overview

Database schema design transforms domain models and API contracts into relational database structures. Good schema design balances data integrity (normalization) with query performance (strategic denormalization), using appropriate data types, constraints, and relationships. In the full-lifecycle pipeline, schema design in Phase 5 takes API resource models from Phase 4 and produces the data foundation for Phase 6 backend implementation.

Key Concepts

Domain-to-Schema Mapping

┌─────────────────────────────────────────────────────────────┐
│              Domain Model → Schema Mapping                   │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Domain Entity    →   Database Table                         │
│  Value Object     →   Embedded columns or lookup table       │
│  Aggregate Root   →   Primary table with owned children      │
│  Association      →   Foreign key or join table              │
│  Enum             →   VARCHAR + CHECK or reference table     │
│  Collection       →   Child table with FK (1:N)              │
│  Bi-directional   →   FK on owning side only                │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Normalization Guide

Normal Form	Rule	Example Violation
1NF	Atomic values, no repeating groups	`tags: "java,spring"` → split to join table
2NF	No partial dependencies on composite PK	Order item stores product name → separate
3NF	No transitive dependencies	Order stores customer city → use FK to customer
BCNF	Every determinant is a candidate key	Rare — apply only if 3NF has anomalies

Relationship Patterns

Relationship	Schema Pattern	JPA Annotation
1:1	FK with UNIQUE on child	`@OneToOne`
1:N	FK on the “many” side	`@OneToMany` / `@ManyToOne`
M:N	Join table with two FKs	`@ManyToMany` (or explicit join entity)
Self-referential	FK to same table	`@ManyToOne` on same entity
Polymorphic	Single table with discriminator OR table-per-type	`@Inheritance`

Best Practices

Start with 3NF — Normalize first, denormalize only when query patterns demand it
Use BIGSERIAL for internal PKs — 8 bytes, sequential, index-friendly; UUID for distributed/exposed IDs
Always include audit columns — created_at, updated_at, created_by, version
Use TIMESTAMPTZ, never TIMESTAMP — Always store timezone-aware timestamps
Use NUMERIC(19,4) for money — Never FLOAT or DOUBLE for financial data
Add CHECK constraints — Enforce business rules at database level
Prefer VARCHAR(n) over TEXT for bounded fields — Explicit limits catch bugs
Design for soft delete only when audit trail is required — Otherwise, hard delete with GDPR compliance

Code Examples

✅ Good: Production Schema

-- Audit function (reusable)
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ language 'plpgsql';

-- Users table
CREATE TABLE users (
    id              BIGSERIAL PRIMARY KEY,
    email           VARCHAR(255) NOT NULL,
    display_name    VARCHAR(100) NOT NULL,
    role            VARCHAR(50) NOT NULL DEFAULT 'CUSTOMER',
    is_active       BOOLEAN NOT NULL DEFAULT true,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    version         INTEGER NOT NULL DEFAULT 0,
    
    CONSTRAINT uq_users_email UNIQUE (email),
    CONSTRAINT chk_users_role CHECK (role IN ('CUSTOMER', 'AGENT', 'ADMIN'))
);

CREATE TRIGGER trg_users_updated_at
    BEFORE UPDATE ON users
    FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();

CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_users_role ON users (role) WHERE is_active = true;

❌ Bad: Common Mistakes

-- No constraints, wrong types, no audit
CREATE TABLE users (
    id INT,                    -- Should be BIGSERIAL, no PK
    email TEXT,                -- No UNIQUE, no NOT NULL
    name TEXT,                 -- No length limit
    created TIMESTAMP,         -- Missing timezone (TZ)
    balance FLOAT              -- Never FLOAT for money
);

Anti-Patterns

God Table — One table with 50+ columns (split into related tables)
EAV (Entity-Attribute-Value) — Storing everything as key-value pairs (use JSONB for flexible schema)
No Foreign Keys — “We’ll enforce it in the application” (data integrity belongs in the DB)
Implicit Enums — Using integers for status without documentation (use VARCHAR + CHECK)
Missing Indexes on FKs — Every foreign key column should have an index
Premature Denormalization — Optimizing for performance before measuring

Testing Strategies

Migration Tests — Run Flyway migrations against TestContainers PostgreSQL
Constraint Tests — Verify NOT NULL, UNIQUE, CHECK constraints with invalid data
Referential Integrity Tests — Verify FK cascades and restrictions
Schema Drift Detection — Compare schema in DB vs. migration scripts

Schema Design

Schema Design

Overview

Key Concepts

Domain-to-Schema Mapping

Normalization Guide

Relationship Patterns

Best Practices

Code Examples

✅ Good: Production Schema

❌ Bad: Common Mistakes

Anti-Patterns

Testing Strategies

References

Related Skills