SQL Server Indexes Guide
Introduction
Indexes are database structures that improve query performance by enabling fast data retrieval. Understanding index types, when to create them, and how to maintain them is crucial for database performance.
Table of Contents
- Index Fundamentals
- Clustered Indexes
- Non-Clustered Indexes
- Covering Indexes
- Filtered Indexes
- Columnstore Indexes
- Index Design Strategies
- Index Maintenance
- Monitoring Index Usage
- Interview Questions
Index Fundamentals
What is an Index?
An index is a sorted data structure (typically B-tree) that allows the database engine to find rows quickly without scanning the entire table.
┌───────────────────┐
│ Root Page │
│ [M] │
└─────────┬─────────┘
│
┌─────────────────┼─────────────────┐
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ A - L │ │ M - R │ │ S - Z │
│ [D] [H] │ │ [O] [P] │ │ [T] [W] │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
┌──────▼──────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ Leaf Pages │ │ Leaf Pages │ │ Leaf Pages │
│ (Data Rows) │ │ (Data Rows) │ │ (Data Rows) │
└─────────────┘ └─────────────┘ └─────────────┘
Index Types Overview
| Index Type | Description | Use Case |
|---|---|---|
| Clustered | Defines physical row order | Primary key, range queries |
| Non-Clustered | Separate structure pointing to data | Secondary lookups |
| Covering | Includes all query columns | Avoid key lookups |
| Filtered | Index on subset of rows | Specific value queries |
| Columnstore | Column-oriented storage | Analytics, aggregations |
| Full-Text | Text search index | Content searching |
| Spatial | Geographic data | Location queries |
Heap vs Clustered Table
-- Heap: Table without clustered index
CREATE TABLE HeapTable (
Id INT,
Name VARCHAR(100),
Data VARCHAR(MAX)
);
-- Rows stored in any available space (no order)
-- Scans required for most queries
-- Clustered: Table with clustered index
CREATE TABLE ClusteredTable (
Id INT PRIMARY KEY, -- Creates clustered index by default
Name VARCHAR(100),
Data VARCHAR(MAX)
);
-- Rows stored in primary key order
-- Enables efficient range queries
Clustered Indexes
Characteristics
- Defines the physical order of data in the table
- Only one per table (rows can only be in one order)
- Leaf level contains actual data rows
- Automatically created for PRIMARY KEY (unless specified otherwise)
- All non-clustered indexes reference the clustered index key
Creating Clustered Indexes
-- Created automatically with PRIMARY KEY
CREATE TABLE Orders (
OrderId INT PRIMARY KEY, -- Clustered by default
CustomerId INT,
OrderDate DATETIME
);
-- Explicit clustered index
CREATE TABLE Products (
ProductId INT,
Name VARCHAR(100),
CategoryId INT
);
CREATE CLUSTERED INDEX CIX_Products_ProductId ON Products(ProductId);
-- Composite clustered index
CREATE CLUSTERED INDEX CIX_OrderDetails
ON OrderDetails(OrderId, LineNumber);
Clustered Index Key Selection
-- ✅ Good clustered index keys:
-- 1. Narrow (small data type)
-- 2. Unique (avoids uniquifier overhead)
-- 3. Static (unchanging values)
-- 4. Ever-increasing (avoids page splits)
-- ✅ BEST: Identity column
CREATE TABLE Orders (
OrderId INT IDENTITY(1,1) PRIMARY KEY, -- Narrow, unique, static, increasing
CustomerId INT,
OrderDate DATETIME
);
-- ✅ GOOD: Sequential GUID (newsequentialid)
CREATE TABLE Documents (
DocumentId UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
Content NVARCHAR(MAX)
);
-- ❌ BAD: Standard GUID (random, causes fragmentation)
CREATE TABLE BadDesign (
Id UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY, -- Random = page splits
Data NVARCHAR(MAX)
);
-- ❌ BAD: Wide composite key
CREATE TABLE BadDesign2 (
CustomerId INT,
OrderDate DATETIME,
ProductId INT,
PRIMARY KEY (CustomerId, OrderDate, ProductId) -- 16 bytes, replicated everywhere
);
Clustered Index Impact
-- All non-clustered indexes include the clustered key
-- Example: If clustered key is 16 bytes vs 4 bytes:
-- Table with 1 million rows, 5 non-clustered indexes
-- Clustered key size × rows × number of NC indexes
-- 16 bytes: 16 × 1,000,000 × 5 = 80 MB overhead
-- 4 bytes: 4 × 1,000,000 × 5 = 20 MB overhead
Non-Clustered Indexes
Characteristics
- Separate B-tree structure from the data
- Leaf level contains index key + pointer to data (clustered key or RID)
- Can have up to 999 non-clustered indexes per table
- May require key lookup for columns not in index
Creating Non-Clustered Indexes
-- Basic non-clustered index
CREATE INDEX IX_Orders_CustomerId ON Orders(CustomerId);
-- Unique non-clustered index
CREATE UNIQUE INDEX UX_Employees_Email ON Employees(Email);
-- Composite index (column order matters!)
CREATE INDEX IX_Orders_CustomerDate
ON Orders(CustomerId, OrderDate);
-- Efficient for: WHERE CustomerId = 1
-- Efficient for: WHERE CustomerId = 1 AND OrderDate > '2024-01-01'
-- NOT efficient for: WHERE OrderDate > '2024-01-01' (alone)
-- Descending column
CREATE INDEX IX_Orders_DateDesc
ON Orders(OrderDate DESC);
Key Lookup Problem
-- Query needs columns not in index
SELECT OrderId, CustomerId, OrderDate, ShippingAddress
FROM Orders
WHERE CustomerId = 123;
-- With this index:
CREATE INDEX IX_Orders_CustomerId ON Orders(CustomerId);
-- Execution:
-- 1. Index Seek on IX_Orders_CustomerId (find matching rows)
-- 2. Key Lookup on clustered index (get ShippingAddress for each row)
-- Key lookups are expensive with many rows!
Covering Indexes
Concept
A covering index includes all columns needed by a query, eliminating the need for key lookups.
-- Query to optimize:
SELECT CustomerId, OrderDate, Total
FROM Orders
WHERE CustomerId = 123;
-- ❌ Non-covering index (requires key lookup)
CREATE INDEX IX_Orders_CustomerId ON Orders(CustomerId);
-- ✅ Covering index using INCLUDE
CREATE INDEX IX_Orders_CustomerId_Covering
ON Orders(CustomerId)
INCLUDE (OrderDate, Total);
INCLUDE vs Key Columns
-- Key columns: Used for seeking and sorting
-- Included columns: Only stored at leaf level
-- Key columns
CREATE INDEX IX_Orders_CustomerDate
ON Orders(CustomerId, OrderDate); -- Can seek on both, sort by both
-- With INCLUDE
CREATE INDEX IX_Orders_Customer_IncludeDate
ON Orders(CustomerId)
INCLUDE (OrderDate); -- Can only seek on CustomerId
-- When to use INCLUDE:
-- - Column needed in SELECT but not WHERE
-- - Column not useful for sorting
-- - Wide columns that would bloat the index
Covering Index Examples
-- Example 1: Specific query coverage
-- Query: SELECT Name, Email FROM Users WHERE Status = 'Active'
CREATE INDEX IX_Users_Status
ON Users(Status)
INCLUDE (Name, Email);
-- Example 2: Aggregate query coverage
-- Query: SELECT CustomerId, SUM(Total) FROM Orders GROUP BY CustomerId
CREATE INDEX IX_Orders_CustomerId_IncTotal
ON Orders(CustomerId)
INCLUDE (Total);
-- Example 3: JOIN coverage
-- Query: SELECT o.*, c.Name FROM Orders o JOIN Customers c ON o.CustomerId = c.Id
-- WHERE o.Status = 'Pending'
CREATE INDEX IX_Orders_Status
ON Orders(Status)
INCLUDE (CustomerId, OrderDate, Total);
CREATE INDEX IX_Customers_Id
ON Customers(Id)
INCLUDE (Name);
Filtered Indexes
Concept
A filtered index includes only rows that match a WHERE clause, reducing index size and maintenance.
-- Only index active orders
CREATE INDEX IX_Orders_Active
ON Orders(CustomerId, OrderDate)
WHERE Status = 'Active';
-- Only index non-null values
CREATE INDEX IX_Products_Discontinued
ON Products(DiscontinuedDate)
WHERE DiscontinuedDate IS NOT NULL;
-- Specific category index
CREATE INDEX IX_Products_Electronics
ON Products(Name, Price)
WHERE CategoryId = 5; -- Electronics
Benefits and Limitations
-- Benefits:
-- 1. Smaller index = less storage
-- 2. Faster index maintenance
-- 3. Better query performance for filtered subset
-- Limitations:
-- 1. Query must match filter predicate
-- 2. Filter must be deterministic
-- 3. Cannot use subqueries, functions on columns
-- ❌ Won't use filtered index:
SELECT * FROM Orders WHERE Status = 'Shipped'; -- Different status
-- ✅ Will use filtered index:
SELECT * FROM Orders WHERE Status = 'Active' AND CustomerId = 123;
Filtered Index Use Cases
-- 1. Soft deletes
CREATE INDEX IX_Customers_NotDeleted
ON Customers(Email)
WHERE IsDeleted = 0;
-- 2. Sparse columns
CREATE INDEX IX_Users_PhoneVerified
ON Users(PhoneNumber)
WHERE PhoneNumber IS NOT NULL;
-- 3. Multi-tenant with hot tenants
CREATE INDEX IX_Orders_Tenant1
ON Orders(OrderDate)
WHERE TenantId = 1; -- High-volume tenant
-- 4. Unique constraint on subset
CREATE UNIQUE INDEX IX_Employees_ActiveBadge
ON Employees(BadgeNumber)
WHERE IsActive = 1; -- Unique only among active employees
Columnstore Indexes
Characteristics
- Column-oriented storage (vs row-oriented)
- Highly compressed
- Excellent for analytics and aggregations
- Batch mode processing (thousands of rows at once)
-- Clustered columnstore (entire table stored columnwise)
CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesHistory
ON SalesHistory;
-- Non-clustered columnstore (secondary structure)
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders_Analytics
ON Orders(CustomerId, OrderDate, Total, ProductId);
When to Use Columnstore
-- ✅ Good for:
-- - Data warehouses
-- - Large fact tables
-- - Aggregation queries (SUM, COUNT, AVG)
-- - Full table scans
-- - Read-heavy workloads
-- Analytics query (excellent with columnstore)
SELECT
ProductCategory,
YEAR(OrderDate) AS OrderYear,
SUM(Total) AS TotalSales,
COUNT(*) AS OrderCount
FROM SalesHistory
GROUP BY ProductCategory, YEAR(OrderDate);
-- ❌ Not ideal for:
-- - Point lookups (single row)
-- - OLTP workloads
-- - Frequent updates
-- - Small tables
Columnstore Performance
-- Comparison: 100 million rows, aggregation query
-- Rowstore: 45 seconds, 500K logical reads
-- Columnstore: 2 seconds, 10K logical reads
-- Why faster:
-- 1. Column elimination (only reads needed columns)
-- 2. Compression (10x typical)
-- 3. Batch mode (processes 900+ rows at a time)
-- 4. Segment elimination (min/max metadata)
Index Design Strategies
Query-Based Design
-- 1. Identify critical queries
-- 2. Analyze predicates (WHERE, JOIN)
-- 3. Consider columns in SELECT
-- 4. Check for ORDER BY
-- Example query:
SELECT OrderId, OrderDate, Total
FROM Orders
WHERE CustomerId = @CustomerId
AND OrderDate >= @StartDate
AND Status = 'Active'
ORDER BY OrderDate DESC;
-- Index design:
CREATE INDEX IX_Orders_Query1
ON Orders(CustomerId, Status, OrderDate DESC)
INCLUDE (Total)
WHERE Status = 'Active'; -- Filtered if Status is always 'Active'
Index Column Order
-- Order matters! Left-to-right matching
-- Index: (A, B, C)
-- ✅ Uses index: WHERE A = 1
-- ✅ Uses index: WHERE A = 1 AND B = 2
-- ✅ Uses index: WHERE A = 1 AND B = 2 AND C = 3
-- ❌ Cannot seek: WHERE B = 2 (A not specified)
-- ❌ Cannot seek: WHERE C = 3 (A, B not specified)
-- Best practices:
-- 1. Equality columns first
-- 2. Range/inequality columns after
-- 3. Most selective columns first (generally)
-- Example:
-- WHERE CustomerId = 1 AND OrderDate > '2024-01-01' AND Status = 'Active'
-- Best index: (CustomerId, Status, OrderDate)
-- CustomerId: equality, Status: equality, OrderDate: range
Index Width vs Performance
-- Narrower indexes are better:
-- - Less storage
-- - More rows per page
-- - Faster scans
-- - Cheaper maintenance
-- ❌ Too wide
CREATE INDEX IX_TooWide
ON Orders(CustomerId, ProductId, OrderDate, Status, Total, ShippingId, Notes);
-- ✅ Focused
CREATE INDEX IX_Focused
ON Orders(CustomerId, OrderDate)
INCLUDE (Total);
Avoiding Over-Indexing
-- Signs of over-indexing:
-- 1. Many unused indexes
-- 2. Write performance degradation
-- 3. Large storage usage
-- 4. Duplicate/overlapping indexes
-- Example: Redundant indexes
CREATE INDEX IX1 ON Orders(CustomerId);
CREATE INDEX IX2 ON Orders(CustomerId, OrderDate); -- IX1 is redundant!
CREATE INDEX IX3 ON Orders(CustomerId) INCLUDE (Total); -- Also redundant!
-- Keep only:
CREATE INDEX IX_Orders_CustomerDate
ON Orders(CustomerId, OrderDate)
INCLUDE (Total);
Index Maintenance
Fragmentation
-- Check fragmentation
SELECT
OBJECT_NAME(i.object_id) AS TableName,
i.name AS IndexName,
s.avg_fragmentation_in_percent,
s.page_count,
s.record_count
FROM sys.dm_db_index_physical_stats(
DB_ID(), NULL, NULL, NULL, 'LIMITED') s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE s.avg_fragmentation_in_percent > 10
AND s.page_count > 1000
ORDER BY s.avg_fragmentation_in_percent DESC;
-- Fragmentation guidelines:
-- < 10%: Do nothing
-- 10-30%: Reorganize (online, less resource intensive)
-- > 30%: Rebuild (more resource intensive, better results)
Reorganize vs Rebuild
-- Reorganize: Online, less resource intensive
-- - Defragments leaf level only
-- - Always online
-- - Does not update statistics
ALTER INDEX IX_Orders_CustomerId ON Orders REORGANIZE;
-- Rebuild: More thorough
-- - Recreates entire index
-- - Updates statistics
-- - Can be online (Enterprise) or offline
ALTER INDEX IX_Orders_CustomerId ON Orders REBUILD;
-- Online rebuild (Enterprise edition)
ALTER INDEX IX_Orders_CustomerId ON Orders REBUILD WITH (ONLINE = ON);
-- Rebuild all indexes on table
ALTER INDEX ALL ON Orders REBUILD;
Automated Maintenance Script
-- Index maintenance procedure
CREATE PROCEDURE usp_MaintainIndexes
AS
BEGIN
DECLARE @TableName NVARCHAR(256)
DECLARE @IndexName NVARCHAR(256)
DECLARE @Fragmentation FLOAT
DECLARE @SQL NVARCHAR(MAX)
DECLARE index_cursor CURSOR FOR
SELECT
OBJECT_NAME(i.object_id),
i.name,
s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE s.page_count > 1000
AND i.name IS NOT NULL
OPEN index_cursor
FETCH NEXT FROM index_cursor INTO @TableName, @IndexName, @Fragmentation
WHILE @@FETCH_STATUS = 0
BEGIN
IF @Fragmentation > 30
SET @SQL = 'ALTER INDEX ' + @IndexName + ' ON ' + @TableName + ' REBUILD'
ELSE IF @Fragmentation > 10
SET @SQL = 'ALTER INDEX ' + @IndexName + ' ON ' + @TableName + ' REORGANIZE'
ELSE
SET @SQL = NULL
IF @SQL IS NOT NULL
BEGIN
PRINT @SQL
EXEC sp_executesql @SQL
END
FETCH NEXT FROM index_cursor INTO @TableName, @IndexName, @Fragmentation
END
CLOSE index_cursor
DEALLOCATE index_cursor
END
Statistics Maintenance
-- Update statistics for table
UPDATE STATISTICS Orders;
-- Update specific index statistics
UPDATE STATISTICS Orders IX_Orders_CustomerId;
-- Full scan (more accurate)
UPDATE STATISTICS Orders WITH FULLSCAN;
-- Auto-update settings
SELECT
name,
is_auto_update_stats_on,
is_auto_update_stats_async_on
FROM sys.databases
WHERE name = DB_NAME();
Monitoring Index Usage
Unused Indexes
-- Find unused indexes (candidates for removal)
SELECT
OBJECT_NAME(i.object_id) AS TableName,
i.name AS IndexName,
i.type_desc,
s.user_seeks,
s.user_scans,
s.user_lookups,
s.user_updates,
s.last_user_seek,
s.last_user_scan
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE OBJECTPROPERTY(i.object_id, 'IsMsShipped') = 0
AND s.user_seeks = 0
AND s.user_scans = 0
AND s.user_lookups = 0
AND s.user_updates > 0 -- Being maintained but never read
ORDER BY s.user_updates DESC;
Most Used Indexes
-- Find heavily used indexes
SELECT
OBJECT_NAME(i.object_id) AS TableName,
i.name AS IndexName,
s.user_seeks + s.user_scans + s.user_lookups AS TotalReads,
s.user_updates AS TotalWrites,
CAST(s.user_seeks + s.user_scans + s.user_lookups AS FLOAT) /
NULLIF(s.user_updates, 0) AS ReadWriteRatio
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE OBJECTPROPERTY(i.object_id, 'IsMsShipped') = 0
ORDER BY TotalReads DESC;
Missing Indexes
-- SQL Server's missing index recommendations
SELECT TOP 20
ROUND(migs.avg_total_user_cost * migs.avg_user_impact *
(migs.user_seeks + migs.user_scans), 0) AS ImprovementScore,
mig.index_group_handle,
mid.statement AS TableName,
mid.equality_columns,
mid.inequality_columns,
mid.included_columns,
migs.user_seeks,
migs.user_scans
FROM sys.dm_db_missing_index_groups mig
JOIN sys.dm_db_missing_index_group_stats migs
ON mig.index_group_handle = migs.group_handle
JOIN sys.dm_db_missing_index_details mid
ON mig.index_handle = mid.index_handle
ORDER BY ImprovementScore DESC;
-- Generate CREATE INDEX statements
SELECT
'CREATE INDEX IX_' + OBJECT_NAME(mid.object_id) + '_' +
REPLACE(REPLACE(REPLACE(
ISNULL(mid.equality_columns, '') + ISNULL(mid.inequality_columns, ''),
'[', ''), ']', ''), ', ', '_') +
' ON ' + mid.statement +
' (' + ISNULL(mid.equality_columns, '') +
CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL
THEN ', ' ELSE '' END +
ISNULL(mid.inequality_columns, '') + ')' +
CASE WHEN mid.included_columns IS NOT NULL
THEN ' INCLUDE (' + mid.included_columns + ')' ELSE '' END
AS CreateStatement
FROM sys.dm_db_missing_index_details mid;
Index Size Analysis
-- Index size and row counts
SELECT
t.name AS TableName,
i.name AS IndexName,
i.type_desc,
p.rows,
SUM(a.total_pages) * 8 AS TotalSpaceKB,
SUM(a.used_pages) * 8 AS UsedSpaceKB,
SUM(a.data_pages) * 8 AS DataSpaceKB
FROM sys.tables t
JOIN sys.indexes i ON t.object_id = i.object_id
JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.allocation_units a ON p.partition_id = a.container_id
GROUP BY t.name, i.name, i.type_desc, p.rows
ORDER BY SUM(a.total_pages) DESC;
Interview Questions
1. What is the difference between clustered and non-clustered indexes?
Answer:
- Clustered: Defines physical row order. Only one per table. Leaf nodes contain actual data rows.
- Non-clustered: Separate structure. Multiple per table. Leaf nodes contain index key + pointer (clustered key or RID).
Choosing clustered key wisely is crucial because all non-clustered indexes reference it.
2. What is a covering index?
Answer: A covering index includes all columns needed by a query in the index structure itself (via key columns or INCLUDE columns). This eliminates the need for key lookups to retrieve additional columns from the base table.
-- Query: SELECT Name, Email FROM Users WHERE Status = 'Active'
-- Covering index:
CREATE INDEX IX_Users_Status ON Users(Status) INCLUDE (Name, Email);
3. When would you use a filtered index?
Answer: Use filtered indexes when:
- Queries frequently filter on specific values
- Large portion of data is excluded (e.g., soft deletes)
- Unique constraint applies to subset of rows
- Sparse columns with many NULLs
Benefits: Smaller size, faster maintenance, better performance for matching queries.
4. How do you decide the column order in a composite index?
Answer:
- Equality columns first: Columns used with
=in WHERE - Range columns after: Columns used with
>,<,BETWEEN - Consider selectivity: More selective columns often first
- Match query patterns: Index supports left-to-right matching
Example: WHERE CustomerId = 1 AND Status = 'Active' AND OrderDate > '2024-01-01'
Best index: (CustomerId, Status, OrderDate) - two equalities first, then range.
5. What is index fragmentation and how do you address it?
Answer: Fragmentation occurs when logical order differs from physical order (page splits during inserts/updates).
Types:
- Internal: Page not full, wasted space
- External: Pages out of order
Resolution:
< 10%: Leave alone10-30%: REORGANIZE (online, leaf-level only)> 30%: REBUILD (recreates index, updates statistics)
6. What are the downsides of having too many indexes?
Answer:
- Write overhead: Every INSERT/UPDATE/DELETE maintains all indexes
- Storage cost: Indexes consume disk space
- Memory pressure: Indexes compete for buffer pool
- Optimizer confusion: More plans to evaluate
- Maintenance overhead: More indexes to rebuild/reorganize
Balance read performance gains against write overhead. Remove unused indexes.