DA
Schema Design
Database core v1.0.0
Schema Design
Overview
Database schema design transforms domain models and API contracts into relational database structures. Good schema design balances data integrity (normalization) with query performance (strategic denormalization), using appropriate data types, constraints, and relationships. In the full-lifecycle pipeline, schema design in Phase 5 takes API resource models from Phase 4 and produces the data foundation for Phase 6 backend implementation.
Key Concepts
Domain-to-Schema Mapping
┌─────────────────────────────────────────────────────────────┐
│ Domain Model → Schema Mapping │
├─────────────────────────────────────────────────────────────┤
│ │
│ Domain Entity → Database Table │
│ Value Object → Embedded columns or lookup table │
│ Aggregate Root → Primary table with owned children │
│ Association → Foreign key or join table │
│ Enum → VARCHAR + CHECK or reference table │
│ Collection → Child table with FK (1:N) │
│ Bi-directional → FK on owning side only │
│ │
└─────────────────────────────────────────────────────────────┘
Normalization Guide
| Normal Form | Rule | Example Violation |
|---|---|---|
| 1NF | Atomic values, no repeating groups | tags: "java,spring" → split to join table |
| 2NF | No partial dependencies on composite PK | Order item stores product name → separate |
| 3NF | No transitive dependencies | Order stores customer city → use FK to customer |
| BCNF | Every determinant is a candidate key | Rare — apply only if 3NF has anomalies |
Relationship Patterns
| Relationship | Schema Pattern | JPA Annotation |
|---|---|---|
| 1:1 | FK with UNIQUE on child | @OneToOne |
| 1:N | FK on the “many” side | @OneToMany / @ManyToOne |
| M:N | Join table with two FKs | @ManyToMany (or explicit join entity) |
| Self-referential | FK to same table | @ManyToOne on same entity |
| Polymorphic | Single table with discriminator OR table-per-type | @Inheritance |
Best Practices
- Start with 3NF — Normalize first, denormalize only when query patterns demand it
- Use BIGSERIAL for internal PKs — 8 bytes, sequential, index-friendly; UUID for distributed/exposed IDs
- Always include audit columns —
created_at,updated_at,created_by,version - Use TIMESTAMPTZ, never TIMESTAMP — Always store timezone-aware timestamps
- Use NUMERIC(19,4) for money — Never FLOAT or DOUBLE for financial data
- Add CHECK constraints — Enforce business rules at database level
- Prefer VARCHAR(n) over TEXT for bounded fields — Explicit limits catch bugs
- Design for soft delete only when audit trail is required — Otherwise, hard delete with GDPR compliance
Code Examples
✅ Good: Production Schema
-- Audit function (reusable)
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ language 'plpgsql';
-- Users table
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL,
display_name VARCHAR(100) NOT NULL,
role VARCHAR(50) NOT NULL DEFAULT 'CUSTOMER',
is_active BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
version INTEGER NOT NULL DEFAULT 0,
CONSTRAINT uq_users_email UNIQUE (email),
CONSTRAINT chk_users_role CHECK (role IN ('CUSTOMER', 'AGENT', 'ADMIN'))
);
CREATE TRIGGER trg_users_updated_at
BEFORE UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION update_updated_at_column();
CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_users_role ON users (role) WHERE is_active = true;
❌ Bad: Common Mistakes
-- No constraints, wrong types, no audit
CREATE TABLE users (
id INT, -- Should be BIGSERIAL, no PK
email TEXT, -- No UNIQUE, no NOT NULL
name TEXT, -- No length limit
created TIMESTAMP, -- Missing timezone (TZ)
balance FLOAT -- Never FLOAT for money
);
Anti-Patterns
- God Table — One table with 50+ columns (split into related tables)
- EAV (Entity-Attribute-Value) — Storing everything as key-value pairs (use JSONB for flexible schema)
- No Foreign Keys — “We’ll enforce it in the application” (data integrity belongs in the DB)
- Implicit Enums — Using integers for status without documentation (use VARCHAR + CHECK)
- Missing Indexes on FKs — Every foreign key column should have an index
- Premature Denormalization — Optimizing for performance before measuring
Testing Strategies
- Migration Tests — Run Flyway migrations against TestContainers PostgreSQL
- Constraint Tests — Verify NOT NULL, UNIQUE, CHECK constraints with invalid data
- Referential Integrity Tests — Verify FK cascades and restrictions
- Schema Drift Detection — Compare schema in DB vs. migration scripts