πŸ“„

Normalization Guide

Beginner 5 min read 1000 words

Database Normalization Guide

Introduction

Database normalization is the process of organizing data to reduce redundancy and improve data integrity. Understanding normal forms is essential for designing efficient, maintainable database schemas.


Table of Contents


Why Normalize?

Problems with Unnormalized Data

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Orders (Unnormalized)                                                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ OrderId  β”‚ Customer   β”‚ CustomerAddress β”‚ Products  β”‚ ProductPrices    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1        β”‚ John Doe   β”‚ 123 Main St     β”‚ A, B, C   β”‚ 10, 20, 30       β”‚
β”‚ 2        β”‚ John Doe   β”‚ 123 Main St     β”‚ B, D      β”‚ 20, 40           β”‚
β”‚ 3        β”‚ Jane Smith β”‚ 456 Oak Ave     β”‚ A         β”‚ 10               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Problems:

  1. Update Anomaly: Changing John’s address requires updating multiple rows
  2. Insert Anomaly: Can’t add a new customer without an order
  3. Delete Anomaly: Deleting Jane’s only order loses her information
  4. Data Redundancy: Customer info repeated for each order
  5. Complex Queries: Parsing comma-separated values

Benefits of Normalization

Benefit Description
Reduced Redundancy Data stored once, referenced by keys
Data Integrity Changes in one place, no inconsistencies
Easier Maintenance Simpler updates, inserts, deletes
Flexible Queries Proper relationships enable JOINs
Storage Efficiency Less duplicate data

First Normal Form (1NF)

Rules

  1. Each column contains only atomic (indivisible) values
  2. Each column contains values of a single type
  3. Each row is unique (has a primary key)
  4. No repeating groups or arrays

Violation Example

-- ❌ NOT in 1NF: Multiple values in single column
CREATE TABLE Orders (
    OrderId INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    Products VARCHAR(500)  -- "Laptop, Mouse, Keyboard"
);

INSERT INTO Orders VALUES (1, 'John', 'Laptop, Mouse, Keyboard');

1NF Solution

-- βœ… 1NF: Atomic values, separate rows
CREATE TABLE Orders (
    OrderId INT,
    CustomerName VARCHAR(100),
    ProductName VARCHAR(100),
    PRIMARY KEY (OrderId, ProductName)
);

INSERT INTO Orders VALUES (1, 'John', 'Laptop');
INSERT INTO Orders VALUES (1, 'John', 'Mouse');
INSERT INTO Orders VALUES (1, 'John', 'Keyboard');

Visual Transformation

BEFORE (Not 1NF):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ OrderId β”‚ CustomerName β”‚ Products                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1       β”‚ John         β”‚ Laptop, Mouse, Keyboard β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

AFTER (1NF):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ OrderId β”‚ CustomerName β”‚ ProductName β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1       β”‚ John         β”‚ Laptop      β”‚
β”‚ 1       β”‚ John         β”‚ Mouse       β”‚
β”‚ 1       β”‚ John         β”‚ Keyboard    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Second Normal Form (2NF)

Rules

  1. Must be in 1NF
  2. All non-key columns must depend on the entire primary key
  3. No partial dependencies (applies to composite keys)

Violation Example

-- ❌ NOT in 2NF: ProductPrice depends only on ProductName, not full key
CREATE TABLE OrderItems (
    OrderId INT,
    ProductName VARCHAR(100),
    Quantity INT,
    ProductPrice DECIMAL(10,2),  -- Depends only on ProductName!
    PRIMARY KEY (OrderId, ProductName)
);

Partial Dependency: ProductPrice depends on ProductName alone, not on (OrderId, ProductName)

2NF Solution

-- βœ… 2NF: Separate tables for different dependencies

-- Products table (ProductPrice depends on ProductName)
CREATE TABLE Products (
    ProductId INT PRIMARY KEY,
    ProductName VARCHAR(100),
    ProductPrice DECIMAL(10,2)
);

-- OrderItems table (Quantity depends on full key)
CREATE TABLE OrderItems (
    OrderId INT,
    ProductId INT,
    Quantity INT,
    PRIMARY KEY (OrderId, ProductId),
    FOREIGN KEY (ProductId) REFERENCES Products(ProductId)
);

Visual Transformation

BEFORE (1NF, Not 2NF):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ OrderId β”‚ ProductName β”‚ Quantity β”‚ ProductPrice β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1       β”‚ Laptop      β”‚ 2        β”‚ 1000         β”‚ ← Price depends only on Product
β”‚ 1       β”‚ Mouse       β”‚ 5        β”‚ 25           β”‚
β”‚ 2       β”‚ Laptop      β”‚ 1        β”‚ 1000         β”‚ ← Same price, redundant!
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

AFTER (2NF):
Products:                          OrderItems:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ ProductId β”‚ ProductName β”‚ Price β”‚   β”‚ OrderId β”‚ ProductId β”‚ Quantity β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€   β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1         β”‚ Laptop      β”‚ 1000  β”‚   β”‚ 1       β”‚ 1         β”‚ 2        β”‚
β”‚ 2         β”‚ Mouse       β”‚ 25    β”‚   β”‚ 1       β”‚ 2         β”‚ 5        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚ 2       β”‚ 1         β”‚ 1        β”‚
                                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Third Normal Form (3NF)

Rules

  1. Must be in 2NF
  2. No transitive dependencies
  3. Non-key columns must depend only on the primary key, not on other non-key columns

Violation Example

-- ❌ NOT in 3NF: CustomerCity depends on CustomerZip, not OrderId
CREATE TABLE Orders (
    OrderId INT PRIMARY KEY,
    CustomerId INT,
    CustomerName VARCHAR(100),
    CustomerZip VARCHAR(10),
    CustomerCity VARCHAR(100)  -- City depends on Zip, not OrderId!
);

Transitive Dependency: OrderId β†’ CustomerZip β†’ CustomerCity

3NF Solution

-- βœ… 3NF: Eliminate transitive dependencies

CREATE TABLE ZipCodes (
    ZipCode VARCHAR(10) PRIMARY KEY,
    City VARCHAR(100),
    State VARCHAR(50)
);

CREATE TABLE Customers (
    CustomerId INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    ZipCode VARCHAR(10),
    FOREIGN KEY (ZipCode) REFERENCES ZipCodes(ZipCode)
);

CREATE TABLE Orders (
    OrderId INT PRIMARY KEY,
    CustomerId INT,
    OrderDate DATE,
    FOREIGN KEY (CustomerId) REFERENCES Customers(CustomerId)
);

Visual Transformation

BEFORE (2NF, Not 3NF):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ OrderId β”‚ CustomerId β”‚ CustomerZip β”‚ City    β”‚ State        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 1       β”‚ 100        β”‚ 10001       β”‚ New Yorkβ”‚ NY           β”‚
β”‚ 2       β”‚ 101        β”‚ 10001       β”‚ New Yorkβ”‚ NY           β”‚ ← Redundant!
β”‚ 3       β”‚ 102        β”‚ 90210       β”‚ Beverly β”‚ CA           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

AFTER (3NF):
ZipCodes:                    Customers:                   Orders:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ ZipCode β”‚ City     β”‚Stateβ”‚ β”‚CustId  β”‚ Name  β”‚ ZipCode β”‚ β”‚ OrderId β”‚ CustId β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 10001   β”‚ New York β”‚ NY  β”‚ β”‚ 100    β”‚ John  β”‚ 10001   β”‚ β”‚ 1       β”‚ 100    β”‚
β”‚ 90210   β”‚ Beverly  β”‚ CA  β”‚ β”‚ 101    β”‚ Jane  β”‚ 10001   β”‚ β”‚ 2       β”‚ 101    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜ β”‚ 102    β”‚ Bob   β”‚ 90210   β”‚ β”‚ 3       β”‚ 102    β”‚
                             β””β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Boyce-Codd Normal Form (BCNF)

Rules

  1. Must be in 3NF
  2. For every functional dependency X β†’ Y, X must be a superkey
  3. Every determinant must be a candidate key

When BCNF Differs from 3NF

BCNF is stricter than 3NF. The difference appears when:

  • A table has multiple overlapping candidate keys
  • A non-prime attribute determines part of a candidate key

Violation Example

-- Student course enrollment with instructor assignment
-- Constraints:
--   Each student takes each course from one instructor
--   Each instructor teaches only one course

CREATE TABLE Enrollments (
    StudentId INT,
    Course VARCHAR(50),
    Instructor VARCHAR(50),
    PRIMARY KEY (StudentId, Course)
);

-- Functional dependencies:
-- (StudentId, Course) β†’ Instructor  βœ“ Key determines non-key
-- Instructor β†’ Course               βœ— Non-key determines part of key!

BCNF Solution

-- βœ… BCNF: Split to ensure determinants are superkeys

CREATE TABLE Instructors (
    InstructorId INT PRIMARY KEY,
    InstructorName VARCHAR(50),
    Course VARCHAR(50)  -- Instructor determines Course
);

CREATE TABLE Enrollments (
    StudentId INT,
    InstructorId INT,
    PRIMARY KEY (StudentId, InstructorId),
    FOREIGN KEY (InstructorId) REFERENCES Instructors(InstructorId)
);

Fourth Normal Form (4NF)

Rules

  1. Must be in BCNF
  2. No multi-valued dependencies (unless dependent on a superkey)
  3. Independent multi-valued facts should be stored separately

Violation Example

-- ❌ NOT in 4NF: Independent multi-valued facts in one table
-- Employee can have multiple skills AND multiple languages (independent)

CREATE TABLE EmployeeSkillsLanguages (
    EmployeeId INT,
    Skill VARCHAR(50),
    Language VARCHAR(50),
    PRIMARY KEY (EmployeeId, Skill, Language)
);

-- If John knows 3 skills and 2 languages, we need 3 Γ— 2 = 6 rows!
INSERT INTO EmployeeSkillsLanguages VALUES
(1, 'C#', 'English'),
(1, 'C#', 'Spanish'),
(1, 'SQL', 'English'),
(1, 'SQL', 'Spanish'),
(1, 'Azure', 'English'),
(1, 'Azure', 'Spanish');

4NF Solution

-- βœ… 4NF: Separate independent multi-valued dependencies

CREATE TABLE EmployeeSkills (
    EmployeeId INT,
    Skill VARCHAR(50),
    PRIMARY KEY (EmployeeId, Skill)
);

CREATE TABLE EmployeeLanguages (
    EmployeeId INT,
    Language VARCHAR(50),
    PRIMARY KEY (EmployeeId, Language)
);

-- Now: 3 skills + 2 languages = 5 rows instead of 6
INSERT INTO EmployeeSkills VALUES (1, 'C#'), (1, 'SQL'), (1, 'Azure');
INSERT INTO EmployeeLanguages VALUES (1, 'English'), (1, 'Spanish');

Fifth Normal Form (5NF)

Rules

  1. Must be in 4NF
  2. Cannot be decomposed further without losing information
  3. No join dependencies that aren’t implied by candidate keys

When to Consider 5NF

5NF addresses complex scenarios with three or more related entities where the relationship between any two depends on the third.

-- Agents sell Products in certain Regions
-- An agent might sell a product in one region but not another
-- A region might have a product through one agent but not another

-- βœ… 5NF: Decompose into three binary relationships if they're independent
CREATE TABLE AgentProducts (AgentId, ProductId);
CREATE TABLE AgentRegions (AgentId, RegionId);
CREATE TABLE ProductRegions (ProductId, RegionId);

-- ❌ But if the ternary relationship is NOT decomposable (agent-product-region
--    combinations are specific), keep as single table:
CREATE TABLE AgentProductRegions (
    AgentId INT,
    ProductId INT,
    RegionId INT,
    PRIMARY KEY (AgentId, ProductId, RegionId)
);

Denormalization

When to Denormalize

Denormalization intentionally introduces redundancy for performance reasons.

Scenario Denormalization Strategy
Frequent JOINs Store calculated/derived columns
Read-heavy workloads Duplicate data for faster reads
Reporting/Analytics Create summary tables
Performance-critical queries Materialized views

Common Denormalization Techniques

-- 1. Store calculated values
ALTER TABLE Orders ADD TotalAmount DECIMAL(10,2);
-- Update via trigger or application logic

-- 2. Store redundant data
ALTER TABLE OrderItems ADD ProductName VARCHAR(100);
-- Avoids JOIN to Products table for display

-- 3. Summary tables
CREATE TABLE DailySalesSummary (
    Date DATE PRIMARY KEY,
    TotalOrders INT,
    TotalRevenue DECIMAL(10,2)
);
-- Updated by scheduled job

-- 4. Materialized views (SQL Server indexed views)
CREATE VIEW vw_OrderSummary WITH SCHEMABINDING AS
SELECT
    o.CustomerId,
    COUNT_BIG(*) AS OrderCount,
    SUM(o.TotalAmount) AS TotalSpent
FROM dbo.Orders o
GROUP BY o.CustomerId;

CREATE UNIQUE CLUSTERED INDEX IX_OrderSummary
ON vw_OrderSummary(CustomerId);

Trade-offs

Normalized Denormalized
βœ… Data integrity ❌ Risk of inconsistency
βœ… Less storage ❌ More storage
βœ… Easier updates ❌ Update multiple places
❌ More JOINs βœ… Fewer JOINs
❌ Slower reads βœ… Faster reads

Practical Guidelines

Normal Form Recommendations

Application Type Recommended Level Reasoning
OLTP (Transactional) 3NF / BCNF Data integrity crucial
OLAP (Analytical) 2NF / Denormalized Read performance crucial
Mixed Workload 3NF + Selective Denorm Balance both needs

Normalization Decision Process

1. Start in 1NF
   └── Eliminate repeating groups and multi-valued columns

2. Check for 2NF
   └── Do you have composite keys with partial dependencies?
       └── Yes β†’ Split into separate tables

3. Check for 3NF
   └── Do non-key columns depend on other non-key columns?
       └── Yes β†’ Create lookup/reference tables

4. Consider BCNF
   └── Do you have overlapping candidate keys?
       └── Yes β†’ Further decomposition may be needed

5. Evaluate Denormalization
   └── Are JOINs causing performance issues?
       └── Yes β†’ Strategically add redundancy

Quick Reference

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    NORMAL FORMS SUMMARY                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   1NF   β”‚ Atomic values, no repeating groups                β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   2NF   β”‚ 1NF + No partial dependencies on composite key    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   3NF   β”‚ 2NF + No transitive dependencies                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  BCNF   β”‚ 3NF + Every determinant is a candidate key        β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   4NF   β”‚ BCNF + No multi-valued dependencies               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   5NF   β”‚ 4NF + No join dependencies                        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Interview Questions

1. What is database normalization and why is it important?

Answer: Normalization is the process of organizing database tables to minimize data redundancy and dependency. It’s important because it:

  • Prevents update anomalies (changing data in one place)
  • Ensures data consistency
  • Reduces storage waste
  • Makes maintenance easier
  • Enforces data integrity through proper relationships

2. Explain the first three normal forms (1NF, 2NF, 3NF).

Answer:

  • 1NF: Each column contains atomic (indivisible) values. No repeating groups or arrays. Each row is unique.
  • 2NF: Must be in 1NF, plus no partial dependencies. All non-key columns must depend on the entire primary key (relevant for composite keys).
  • 3NF: Must be in 2NF, plus no transitive dependencies. Non-key columns cannot depend on other non-key columnsβ€”they must depend only on the primary key.

3. What is the difference between 3NF and BCNF?

Answer: BCNF is stricter than 3NF. In BCNF, every determinant must be a candidate key. 3NF allows a non-key attribute to determine part of a composite key, while BCNF does not.

The difference only matters when you have:

  • Multiple candidate keys
  • Overlapping composite keys
  • A non-prime attribute determining part of a key

4. When would you choose to denormalize?

Answer: Denormalize when:

  • Read performance is critical and JOINs are expensive
  • Reporting queries need pre-aggregated data
  • The data is relatively static (fewer update anomaly concerns)
  • You have a read-heavy workload (OLAP, data warehouses)

Common techniques: calculated columns, summary tables, redundant foreign key attributes, materialized views.


5. What are update, insert, and delete anomalies?

Answer:

  • Update Anomaly: Same data stored in multiple places requires updating all copies. Missing one creates inconsistency.
  • Insert Anomaly: Cannot insert data without other unrelated data. Example: Can’t add a customer without an order.
  • Delete Anomaly: Deleting data unintentionally removes other needed data. Example: Deleting the last order loses customer info.

Normalization eliminates these by ensuring each fact is stored once.


6. What is a transitive dependency?

Answer: A transitive dependency occurs when a non-key column depends on another non-key column, which in turn depends on the primary key.

Example: In an Orders table with CustomerId (PK), CustomerZip, and CustomerCity:

  • CustomerId β†’ CustomerZip (OK: key β†’ non-key)
  • CustomerZip β†’ CustomerCity (Transitive: non-key β†’ non-key)
  • Therefore: CustomerId β†’ CustomerZip β†’ CustomerCity

To achieve 3NF, extract ZipCode and City into a separate table.


Sources