📊

Indexes Guide

Databases & SQL Intermediate 4 min read 700 words

SQL Server Indexes Guide

Introduction

Indexes are database structures that improve query performance by enabling fast data retrieval. Understanding index types, when to create them, and how to maintain them is crucial for database performance.


Table of Contents


Index Fundamentals

What is an Index?

An index is a sorted data structure (typically B-tree) that allows the database engine to find rows quickly without scanning the entire table.

                    ┌───────────────────┐
                    │   Root Page       │
                    │   [M]             │
                    └─────────┬─────────┘
                              │
            ┌─────────────────┼─────────────────┐
            │                 │                 │
     ┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
     │  A - L      │   │  M - R      │   │  S - Z      │
     │ [D] [H]     │   │ [O] [P]     │   │ [T] [W]     │
     └──────┬──────┘   └──────┬──────┘   └──────┬──────┘
            │                 │                 │
     ┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
     │ Leaf Pages  │   │ Leaf Pages  │   │ Leaf Pages  │
     │ (Data Rows) │   │ (Data Rows) │   │ (Data Rows) │
     └─────────────┘   └─────────────┘   └─────────────┘

Index Types Overview

Index Type Description Use Case
Clustered Defines physical row order Primary key, range queries
Non-Clustered Separate structure pointing to data Secondary lookups
Covering Includes all query columns Avoid key lookups
Filtered Index on subset of rows Specific value queries
Columnstore Column-oriented storage Analytics, aggregations
Full-Text Text search index Content searching
Spatial Geographic data Location queries

Heap vs Clustered Table

-- Heap: Table without clustered index
CREATE TABLE HeapTable (
    Id INT,
    Name VARCHAR(100),
    Data VARCHAR(MAX)
);
-- Rows stored in any available space (no order)
-- Scans required for most queries

-- Clustered: Table with clustered index
CREATE TABLE ClusteredTable (
    Id INT PRIMARY KEY,  -- Creates clustered index by default
    Name VARCHAR(100),
    Data VARCHAR(MAX)
);
-- Rows stored in primary key order
-- Enables efficient range queries

Clustered Indexes

Characteristics

  • Defines the physical order of data in the table
  • Only one per table (rows can only be in one order)
  • Leaf level contains actual data rows
  • Automatically created for PRIMARY KEY (unless specified otherwise)
  • All non-clustered indexes reference the clustered index key

Creating Clustered Indexes

-- Created automatically with PRIMARY KEY
CREATE TABLE Orders (
    OrderId INT PRIMARY KEY,  -- Clustered by default
    CustomerId INT,
    OrderDate DATETIME
);

-- Explicit clustered index
CREATE TABLE Products (
    ProductId INT,
    Name VARCHAR(100),
    CategoryId INT
);
CREATE CLUSTERED INDEX CIX_Products_ProductId ON Products(ProductId);

-- Composite clustered index
CREATE CLUSTERED INDEX CIX_OrderDetails
ON OrderDetails(OrderId, LineNumber);

Clustered Index Key Selection

-- ✅ Good clustered index keys:
-- 1. Narrow (small data type)
-- 2. Unique (avoids uniquifier overhead)
-- 3. Static (unchanging values)
-- 4. Ever-increasing (avoids page splits)

-- ✅ BEST: Identity column
CREATE TABLE Orders (
    OrderId INT IDENTITY(1,1) PRIMARY KEY,  -- Narrow, unique, static, increasing
    CustomerId INT,
    OrderDate DATETIME
);

-- ✅ GOOD: Sequential GUID (newsequentialid)
CREATE TABLE Documents (
    DocumentId UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
    Content NVARCHAR(MAX)
);

-- ❌ BAD: Standard GUID (random, causes fragmentation)
CREATE TABLE BadDesign (
    Id UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,  -- Random = page splits
    Data NVARCHAR(MAX)
);

-- ❌ BAD: Wide composite key
CREATE TABLE BadDesign2 (
    CustomerId INT,
    OrderDate DATETIME,
    ProductId INT,
    PRIMARY KEY (CustomerId, OrderDate, ProductId)  -- 16 bytes, replicated everywhere
);

Clustered Index Impact

-- All non-clustered indexes include the clustered key
-- Example: If clustered key is 16 bytes vs 4 bytes:

-- Table with 1 million rows, 5 non-clustered indexes
-- Clustered key size × rows × number of NC indexes
-- 16 bytes: 16 × 1,000,000 × 5 = 80 MB overhead
-- 4 bytes:  4 × 1,000,000 × 5 = 20 MB overhead

Non-Clustered Indexes

Characteristics

  • Separate B-tree structure from the data
  • Leaf level contains index key + pointer to data (clustered key or RID)
  • Can have up to 999 non-clustered indexes per table
  • May require key lookup for columns not in index

Creating Non-Clustered Indexes

-- Basic non-clustered index
CREATE INDEX IX_Orders_CustomerId ON Orders(CustomerId);

-- Unique non-clustered index
CREATE UNIQUE INDEX UX_Employees_Email ON Employees(Email);

-- Composite index (column order matters!)
CREATE INDEX IX_Orders_CustomerDate
ON Orders(CustomerId, OrderDate);
-- Efficient for: WHERE CustomerId = 1
-- Efficient for: WHERE CustomerId = 1 AND OrderDate > '2024-01-01'
-- NOT efficient for: WHERE OrderDate > '2024-01-01' (alone)

-- Descending column
CREATE INDEX IX_Orders_DateDesc
ON Orders(OrderDate DESC);

Key Lookup Problem

-- Query needs columns not in index
SELECT OrderId, CustomerId, OrderDate, ShippingAddress
FROM Orders
WHERE CustomerId = 123;

-- With this index:
CREATE INDEX IX_Orders_CustomerId ON Orders(CustomerId);

-- Execution:
-- 1. Index Seek on IX_Orders_CustomerId (find matching rows)
-- 2. Key Lookup on clustered index (get ShippingAddress for each row)

-- Key lookups are expensive with many rows!

Covering Indexes

Concept

A covering index includes all columns needed by a query, eliminating the need for key lookups.

-- Query to optimize:
SELECT CustomerId, OrderDate, Total
FROM Orders
WHERE CustomerId = 123;

-- ❌ Non-covering index (requires key lookup)
CREATE INDEX IX_Orders_CustomerId ON Orders(CustomerId);

-- ✅ Covering index using INCLUDE
CREATE INDEX IX_Orders_CustomerId_Covering
ON Orders(CustomerId)
INCLUDE (OrderDate, Total);

INCLUDE vs Key Columns

-- Key columns: Used for seeking and sorting
-- Included columns: Only stored at leaf level

-- Key columns
CREATE INDEX IX_Orders_CustomerDate
ON Orders(CustomerId, OrderDate);  -- Can seek on both, sort by both

-- With INCLUDE
CREATE INDEX IX_Orders_Customer_IncludeDate
ON Orders(CustomerId)
INCLUDE (OrderDate);  -- Can only seek on CustomerId

-- When to use INCLUDE:
-- - Column needed in SELECT but not WHERE
-- - Column not useful for sorting
-- - Wide columns that would bloat the index

Covering Index Examples

-- Example 1: Specific query coverage
-- Query: SELECT Name, Email FROM Users WHERE Status = 'Active'
CREATE INDEX IX_Users_Status
ON Users(Status)
INCLUDE (Name, Email);

-- Example 2: Aggregate query coverage
-- Query: SELECT CustomerId, SUM(Total) FROM Orders GROUP BY CustomerId
CREATE INDEX IX_Orders_CustomerId_IncTotal
ON Orders(CustomerId)
INCLUDE (Total);

-- Example 3: JOIN coverage
-- Query: SELECT o.*, c.Name FROM Orders o JOIN Customers c ON o.CustomerId = c.Id
--        WHERE o.Status = 'Pending'
CREATE INDEX IX_Orders_Status
ON Orders(Status)
INCLUDE (CustomerId, OrderDate, Total);

CREATE INDEX IX_Customers_Id
ON Customers(Id)
INCLUDE (Name);

Filtered Indexes

Concept

A filtered index includes only rows that match a WHERE clause, reducing index size and maintenance.

-- Only index active orders
CREATE INDEX IX_Orders_Active
ON Orders(CustomerId, OrderDate)
WHERE Status = 'Active';

-- Only index non-null values
CREATE INDEX IX_Products_Discontinued
ON Products(DiscontinuedDate)
WHERE DiscontinuedDate IS NOT NULL;

-- Specific category index
CREATE INDEX IX_Products_Electronics
ON Products(Name, Price)
WHERE CategoryId = 5;  -- Electronics

Benefits and Limitations

-- Benefits:
-- 1. Smaller index = less storage
-- 2. Faster index maintenance
-- 3. Better query performance for filtered subset

-- Limitations:
-- 1. Query must match filter predicate
-- 2. Filter must be deterministic
-- 3. Cannot use subqueries, functions on columns

-- ❌ Won't use filtered index:
SELECT * FROM Orders WHERE Status = 'Shipped';  -- Different status

-- ✅ Will use filtered index:
SELECT * FROM Orders WHERE Status = 'Active' AND CustomerId = 123;

Filtered Index Use Cases

-- 1. Soft deletes
CREATE INDEX IX_Customers_NotDeleted
ON Customers(Email)
WHERE IsDeleted = 0;

-- 2. Sparse columns
CREATE INDEX IX_Users_PhoneVerified
ON Users(PhoneNumber)
WHERE PhoneNumber IS NOT NULL;

-- 3. Multi-tenant with hot tenants
CREATE INDEX IX_Orders_Tenant1
ON Orders(OrderDate)
WHERE TenantId = 1;  -- High-volume tenant

-- 4. Unique constraint on subset
CREATE UNIQUE INDEX IX_Employees_ActiveBadge
ON Employees(BadgeNumber)
WHERE IsActive = 1;  -- Unique only among active employees

Columnstore Indexes

Characteristics

  • Column-oriented storage (vs row-oriented)
  • Highly compressed
  • Excellent for analytics and aggregations
  • Batch mode processing (thousands of rows at once)
-- Clustered columnstore (entire table stored columnwise)
CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesHistory
ON SalesHistory;

-- Non-clustered columnstore (secondary structure)
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders_Analytics
ON Orders(CustomerId, OrderDate, Total, ProductId);

When to Use Columnstore

-- ✅ Good for:
-- - Data warehouses
-- - Large fact tables
-- - Aggregation queries (SUM, COUNT, AVG)
-- - Full table scans
-- - Read-heavy workloads

-- Analytics query (excellent with columnstore)
SELECT
    ProductCategory,
    YEAR(OrderDate) AS OrderYear,
    SUM(Total) AS TotalSales,
    COUNT(*) AS OrderCount
FROM SalesHistory
GROUP BY ProductCategory, YEAR(OrderDate);

-- ❌ Not ideal for:
-- - Point lookups (single row)
-- - OLTP workloads
-- - Frequent updates
-- - Small tables

Columnstore Performance

-- Comparison: 100 million rows, aggregation query
-- Rowstore: 45 seconds, 500K logical reads
-- Columnstore: 2 seconds, 10K logical reads

-- Why faster:
-- 1. Column elimination (only reads needed columns)
-- 2. Compression (10x typical)
-- 3. Batch mode (processes 900+ rows at a time)
-- 4. Segment elimination (min/max metadata)

Index Design Strategies

Query-Based Design

-- 1. Identify critical queries
-- 2. Analyze predicates (WHERE, JOIN)
-- 3. Consider columns in SELECT
-- 4. Check for ORDER BY

-- Example query:
SELECT OrderId, OrderDate, Total
FROM Orders
WHERE CustomerId = @CustomerId
  AND OrderDate >= @StartDate
  AND Status = 'Active'
ORDER BY OrderDate DESC;

-- Index design:
CREATE INDEX IX_Orders_Query1
ON Orders(CustomerId, Status, OrderDate DESC)
INCLUDE (Total)
WHERE Status = 'Active';  -- Filtered if Status is always 'Active'

Index Column Order

-- Order matters! Left-to-right matching

-- Index: (A, B, C)
-- ✅ Uses index: WHERE A = 1
-- ✅ Uses index: WHERE A = 1 AND B = 2
-- ✅ Uses index: WHERE A = 1 AND B = 2 AND C = 3
-- ❌ Cannot seek: WHERE B = 2 (A not specified)
-- ❌ Cannot seek: WHERE C = 3 (A, B not specified)

-- Best practices:
-- 1. Equality columns first
-- 2. Range/inequality columns after
-- 3. Most selective columns first (generally)

-- Example:
-- WHERE CustomerId = 1 AND OrderDate > '2024-01-01' AND Status = 'Active'
-- Best index: (CustomerId, Status, OrderDate)
-- CustomerId: equality, Status: equality, OrderDate: range

Index Width vs Performance

-- Narrower indexes are better:
-- - Less storage
-- - More rows per page
-- - Faster scans
-- - Cheaper maintenance

-- ❌ Too wide
CREATE INDEX IX_TooWide
ON Orders(CustomerId, ProductId, OrderDate, Status, Total, ShippingId, Notes);

-- ✅ Focused
CREATE INDEX IX_Focused
ON Orders(CustomerId, OrderDate)
INCLUDE (Total);

Avoiding Over-Indexing

-- Signs of over-indexing:
-- 1. Many unused indexes
-- 2. Write performance degradation
-- 3. Large storage usage
-- 4. Duplicate/overlapping indexes

-- Example: Redundant indexes
CREATE INDEX IX1 ON Orders(CustomerId);
CREATE INDEX IX2 ON Orders(CustomerId, OrderDate);  -- IX1 is redundant!
CREATE INDEX IX3 ON Orders(CustomerId) INCLUDE (Total);  -- Also redundant!

-- Keep only:
CREATE INDEX IX_Orders_CustomerDate
ON Orders(CustomerId, OrderDate)
INCLUDE (Total);

Index Maintenance

Fragmentation

-- Check fragmentation
SELECT
    OBJECT_NAME(i.object_id) AS TableName,
    i.name AS IndexName,
    s.avg_fragmentation_in_percent,
    s.page_count,
    s.record_count
FROM sys.dm_db_index_physical_stats(
    DB_ID(), NULL, NULL, NULL, 'LIMITED') s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE s.avg_fragmentation_in_percent > 10
  AND s.page_count > 1000
ORDER BY s.avg_fragmentation_in_percent DESC;

-- Fragmentation guidelines:
-- < 10%: Do nothing
-- 10-30%: Reorganize (online, less resource intensive)
-- > 30%: Rebuild (more resource intensive, better results)

Reorganize vs Rebuild

-- Reorganize: Online, less resource intensive
-- - Defragments leaf level only
-- - Always online
-- - Does not update statistics
ALTER INDEX IX_Orders_CustomerId ON Orders REORGANIZE;

-- Rebuild: More thorough
-- - Recreates entire index
-- - Updates statistics
-- - Can be online (Enterprise) or offline
ALTER INDEX IX_Orders_CustomerId ON Orders REBUILD;

-- Online rebuild (Enterprise edition)
ALTER INDEX IX_Orders_CustomerId ON Orders REBUILD WITH (ONLINE = ON);

-- Rebuild all indexes on table
ALTER INDEX ALL ON Orders REBUILD;

Automated Maintenance Script

-- Index maintenance procedure
CREATE PROCEDURE usp_MaintainIndexes
AS
BEGIN
    DECLARE @TableName NVARCHAR(256)
    DECLARE @IndexName NVARCHAR(256)
    DECLARE @Fragmentation FLOAT
    DECLARE @SQL NVARCHAR(MAX)

    DECLARE index_cursor CURSOR FOR
    SELECT
        OBJECT_NAME(i.object_id),
        i.name,
        s.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') s
    JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
    WHERE s.page_count > 1000
      AND i.name IS NOT NULL

    OPEN index_cursor
    FETCH NEXT FROM index_cursor INTO @TableName, @IndexName, @Fragmentation

    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @Fragmentation > 30
            SET @SQL = 'ALTER INDEX ' + @IndexName + ' ON ' + @TableName + ' REBUILD'
        ELSE IF @Fragmentation > 10
            SET @SQL = 'ALTER INDEX ' + @IndexName + ' ON ' + @TableName + ' REORGANIZE'
        ELSE
            SET @SQL = NULL

        IF @SQL IS NOT NULL
        BEGIN
            PRINT @SQL
            EXEC sp_executesql @SQL
        END

        FETCH NEXT FROM index_cursor INTO @TableName, @IndexName, @Fragmentation
    END

    CLOSE index_cursor
    DEALLOCATE index_cursor
END

Statistics Maintenance

-- Update statistics for table
UPDATE STATISTICS Orders;

-- Update specific index statistics
UPDATE STATISTICS Orders IX_Orders_CustomerId;

-- Full scan (more accurate)
UPDATE STATISTICS Orders WITH FULLSCAN;

-- Auto-update settings
SELECT
    name,
    is_auto_update_stats_on,
    is_auto_update_stats_async_on
FROM sys.databases
WHERE name = DB_NAME();

Monitoring Index Usage

Unused Indexes

-- Find unused indexes (candidates for removal)
SELECT
    OBJECT_NAME(i.object_id) AS TableName,
    i.name AS IndexName,
    i.type_desc,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates,
    s.last_user_seek,
    s.last_user_scan
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE OBJECTPROPERTY(i.object_id, 'IsMsShipped') = 0
  AND s.user_seeks = 0
  AND s.user_scans = 0
  AND s.user_lookups = 0
  AND s.user_updates > 0  -- Being maintained but never read
ORDER BY s.user_updates DESC;

Most Used Indexes

-- Find heavily used indexes
SELECT
    OBJECT_NAME(i.object_id) AS TableName,
    i.name AS IndexName,
    s.user_seeks + s.user_scans + s.user_lookups AS TotalReads,
    s.user_updates AS TotalWrites,
    CAST(s.user_seeks + s.user_scans + s.user_lookups AS FLOAT) /
        NULLIF(s.user_updates, 0) AS ReadWriteRatio
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE OBJECTPROPERTY(i.object_id, 'IsMsShipped') = 0
ORDER BY TotalReads DESC;

Missing Indexes

-- SQL Server's missing index recommendations
SELECT TOP 20
    ROUND(migs.avg_total_user_cost * migs.avg_user_impact *
          (migs.user_seeks + migs.user_scans), 0) AS ImprovementScore,
    mig.index_group_handle,
    mid.statement AS TableName,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.user_scans
FROM sys.dm_db_missing_index_groups mig
JOIN sys.dm_db_missing_index_group_stats migs
    ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details mid
    ON mig.index_handle = mid.index_handle
ORDER BY ImprovementScore DESC;

-- Generate CREATE INDEX statements
SELECT
    'CREATE INDEX IX_' + OBJECT_NAME(mid.object_id) + '_' +
    REPLACE(REPLACE(REPLACE(
        ISNULL(mid.equality_columns, '') + ISNULL(mid.inequality_columns, ''),
        '[', ''), ']', ''), ', ', '_') +
    ' ON ' + mid.statement +
    ' (' + ISNULL(mid.equality_columns, '') +
    CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL
         THEN ', ' ELSE '' END +
    ISNULL(mid.inequality_columns, '') + ')' +
    CASE WHEN mid.included_columns IS NOT NULL
         THEN ' INCLUDE (' + mid.included_columns + ')' ELSE '' END
    AS CreateStatement
FROM sys.dm_db_missing_index_details mid;

Index Size Analysis

-- Index size and row counts
SELECT
    t.name AS TableName,
    i.name AS IndexName,
    i.type_desc,
    p.rows,
    SUM(a.total_pages) * 8 AS TotalSpaceKB,
    SUM(a.used_pages) * 8 AS UsedSpaceKB,
    SUM(a.data_pages) * 8 AS DataSpaceKB
FROM sys.tables t
JOIN sys.indexes i ON t.object_id = i.object_id
JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.allocation_units a ON p.partition_id = a.container_id
GROUP BY t.name, i.name, i.type_desc, p.rows
ORDER BY SUM(a.total_pages) DESC;

Interview Questions

1. What is the difference between clustered and non-clustered indexes?

Answer:

  • Clustered: Defines physical row order. Only one per table. Leaf nodes contain actual data rows.
  • Non-clustered: Separate structure. Multiple per table. Leaf nodes contain index key + pointer (clustered key or RID).

Choosing clustered key wisely is crucial because all non-clustered indexes reference it.


2. What is a covering index?

Answer: A covering index includes all columns needed by a query in the index structure itself (via key columns or INCLUDE columns). This eliminates the need for key lookups to retrieve additional columns from the base table.

-- Query: SELECT Name, Email FROM Users WHERE Status = 'Active'
-- Covering index:
CREATE INDEX IX_Users_Status ON Users(Status) INCLUDE (Name, Email);

3. When would you use a filtered index?

Answer: Use filtered indexes when:

  • Queries frequently filter on specific values
  • Large portion of data is excluded (e.g., soft deletes)
  • Unique constraint applies to subset of rows
  • Sparse columns with many NULLs

Benefits: Smaller size, faster maintenance, better performance for matching queries.


4. How do you decide the column order in a composite index?

Answer:

  1. Equality columns first: Columns used with = in WHERE
  2. Range columns after: Columns used with >, <, BETWEEN
  3. Consider selectivity: More selective columns often first
  4. Match query patterns: Index supports left-to-right matching

Example: WHERE CustomerId = 1 AND Status = 'Active' AND OrderDate > '2024-01-01' Best index: (CustomerId, Status, OrderDate) - two equalities first, then range.


5. What is index fragmentation and how do you address it?

Answer: Fragmentation occurs when logical order differs from physical order (page splits during inserts/updates).

Types:

  • Internal: Page not full, wasted space
  • External: Pages out of order

Resolution:

  • < 10%: Leave alone
  • 10-30%: REORGANIZE (online, leaf-level only)
  • > 30%: REBUILD (recreates index, updates statistics)

6. What are the downsides of having too many indexes?

Answer:

  1. Write overhead: Every INSERT/UPDATE/DELETE maintains all indexes
  2. Storage cost: Indexes consume disk space
  3. Memory pressure: Indexes compete for buffer pool
  4. Optimizer confusion: More plans to evaluate
  5. Maintenance overhead: More indexes to rebuild/reorganize

Balance read performance gains against write overhead. Remove unused indexes.


Sources

📚 Related Articles