Schema mistakes echo for years. A denormalized table chosen for convenience becomes a bottleneck. A missing index slows queries by 100x. Your schema is your foundation. Build it carefully.
Normalization vs Denormalization
Start normalized. Only denormalize after measuring and confirming performance problems:
-- Normalized (start here)
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id),
title VARCHAR(256) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Only denormalize if measurement shows a problem
-- For example: SELECT users.*, COUNT(posts.id) FROM users...
-- becomes slow. Then add post_count to users table.
-- But maintain it with a trigger to prevent inconsistency.Indexing Strategy
Index what you query, not everything. Every index has a cost: slower writes, more storage. Index deliberately:
-- Primary key index (automatic)
CREATE TABLE users (
id SERIAL PRIMARY KEY
);
-- Unique constraint index
CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Foreign key index (frequent joins)
CREATE INDEX idx_posts_user_id ON posts(user_id);
-- Composite index for common queries
-- "SELECT * FROM orders WHERE user_id = ? AND created_at > ?"
CREATE INDEX idx_orders_user_date
ON orders(user_id, created_at DESC);
-- Measure before and after
EXPLAIN ANALYZE SELECT * FROM posts WHERE user_id = 1;Finding Slow Queries
Enable query logging. Find problems before production:
-- PostgreSQL: Enable slow query log
ALTER SYSTEM SET log_min_duration_statement = 1000; -- 1 second
-- Then restart
SELECT pg_reload_conf();
-- Query the log
SELECT * FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;Common Query Patterns
Optimize for your access patterns. Different patterns need different structures:
-- Pattern 1: Frequent point lookups
-- Solution: Primary key + unique constraints
SELECT * FROM users WHERE id = 1;
-- Pattern 2: Range queries
-- Solution: B-tree index on range column
SELECT * FROM posts WHERE created_at > NOW() - INTERVAL '7 days';
-- Pattern 3: Text search
-- Solution: GIN index for full-text search
CREATE INDEX idx_posts_search ON posts USING GIN(
to_tsvector('english', title || ' ' || content)
);
-- Pattern 4: Aggregations
-- Solution: Materialized views or denormalization
CREATE MATERIALIZED VIEW user_stats AS
SELECT user_id, COUNT(*) as post_count, MAX(created_at) as last_post
FROM posts
GROUP BY user_id;Real-World Incident
A system I worked on had a slow reporting feature. Every query took 2+ seconds. Investigation revealed the schema had timestamps stored incorrectly, queries couldn't use indexes, and all 10 million rows were scanned. The fix:
- Fix timestamp type: Use TIMESTAMP, not TEXT
- Add index: Index the timestamp column used in WHERE clause
- Query rewrite: Use date ranges instead of string matching
Result: Query dropped from 2 seconds to 50ms. Index a single column. 2000% improvement. This is why schema design matters.