SQL Joins Comprehensive Guide
A comprehensive guide to SQL join types, including physical join algorithms, logical join types, and set operators.
Physical Join Algorithms
SQL Server uses different physical join algorithms to combine data from tables. Understanding these helps with query optimization.
Nested Loop Join
Description: Iterates through each row of the outer table and finds matching rows in the inner table.
Best For:
- Small tables
- When indexes are available on join columns
- When one table is significantly smaller than the other
Performance: O(n Γ m) worst case, O(n) with index
-- Query that typically uses Nested Loop
SELECT o.OrderID, c.CustomerName
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE o.OrderDate > '2024-01-01'
-- Small result set from Orders, index on CustomerID
Hash Join
Description: Creates a hash table in memory from the smaller (build) table, then probes it with rows from the larger (probe) table.
Best For:
- Large tables without indexes
- Large result sets
- When sufficient memory is available
Performance: O(n + m) with sufficient memory
-- Query that typically uses Hash Join
SELECT a.*, b.*
FROM LargeTable1 a
INNER JOIN LargeTable2 b ON a.SomeColumn = b.SomeColumn
-- No indexes, large tables
Merge Join
Description: Requires both inputs to be sorted on the join key. Walks through both sorted inputs simultaneously.
Best For:
- Pre-sorted data (indexes on join columns)
- Large result sets
- Equality joins
Performance: O(n + m) with sorted inputs
-- Query that typically uses Merge Join
SELECT a.*, b.*
FROM TableA a
INNER JOIN TableB b ON a.ID = b.ID
-- Both tables have clustered indexes on ID
Logical Join Types
INNER JOIN
Returns only rows with matches in both tables.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
INNER JOIN Departments d ON e.DepartmentID = d.DepartmentID;
-- Result: Only employees who have a department
LEFT JOIN (LEFT OUTER JOIN)
Returns all rows from the left table, with NULLs for non-matching right table rows.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
LEFT JOIN Departments d ON e.DepartmentID = d.DepartmentID;
-- Result: All employees, NULL for DepartmentName if no department
RIGHT JOIN (RIGHT OUTER JOIN)
Returns all rows from the right table, with NULLs for non-matching left table rows.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
RIGHT JOIN Departments d ON e.DepartmentID = d.DepartmentID;
-- Result: All departments, NULL for EmployeeName if no employees
FULL JOIN (FULL OUTER JOIN)
Returns all rows from both tables, with NULLs where no match exists.
SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
FULL JOIN Departments d ON e.DepartmentID = d.DepartmentID;
-- Result: All employees + all departments, NULLs for non-matches
CROSS JOIN
Returns the Cartesian product (all combinations) of both tables. No join condition.
SELECT p.ProductName, c.ColorName
FROM Products p
CROSS JOIN Colors c;
-- If Products has 10 rows and Colors has 5,
-- result will have 50 rows (10 Γ 5)
Use Case: Generate all possible combinations (e.g., product configurations).
SELF JOIN
Joins a table with itself. Useful for hierarchical data.
-- Find employees and their managers
SELECT e.EmployeeName AS Employee, m.EmployeeName AS Manager
FROM Employees e
LEFT JOIN Employees m ON e.ManagerID = m.EmployeeID;
NATURAL JOIN
Joins on columns with the same name in both tables. Generally avoided in production.
-- Implicitly joins on columns with same name
SELECT *
FROM Employees
NATURAL JOIN Departments;
-- Joins on any columns named the same (e.g., DepartmentID)
Warning: Can be unpredictable; prefer explicit JOIN conditions.
APPLY Operators
CROSS APPLY
Applies a table-valued function or subquery to each row of the outer table. Only returns rows where the function returns results.
-- Get top 3 orders for each customer
SELECT c.CustomerName, o.OrderID, o.OrderDate
FROM Customers c
CROSS APPLY (
SELECT TOP 3 OrderID, OrderDate
FROM Orders
WHERE CustomerID = c.CustomerID
ORDER BY OrderDate DESC
) o;
-- With table-valued function
SELECT e.EmployeeID, p.ProductName
FROM Employees e
CROSS APPLY dbo.GetTopProductsByEmployee(e.EmployeeID) p;
OUTER APPLY
Like CROSS APPLY, but includes all rows from the outer table even if the function returns no results (NULLs for missing values).
-- Get most recent order for each customer (including customers with no orders)
SELECT c.CustomerID, o.OrderID
FROM Customers c
OUTER APPLY (
SELECT TOP 1 OrderID
FROM Orders o
WHERE o.CustomerID = c.CustomerID
ORDER BY OrderDate DESC
) o;
Set Operators
UNION
Combines result sets vertically, removing duplicates.
SELECT FirstName, LastName FROM Employees
UNION
SELECT FirstName, LastName FROM Contractors;
-- Distinct names from both tables
UNION ALL
Combines result sets vertically, keeping all duplicates. Faster than UNION.
SELECT FirstName, LastName FROM Employees
UNION ALL
SELECT FirstName, LastName FROM Contractors;
-- All names, including duplicates
INTERSECT
Returns only rows that appear in both result sets.
SELECT ProductID FROM OrdersUS
INTERSECT
SELECT ProductID FROM OrdersEU;
-- Products ordered in both US and EU
EXCEPT
Returns rows from the first result set that donβt appear in the second.
SELECT ProductID FROM AllProducts
EXCEPT
SELECT ProductID FROM DiscontinuedProducts;
-- Active products only
Join Performance Tips
1. Index Join Columns
-- Create indexes on frequently joined columns
CREATE INDEX IX_Orders_CustomerID ON Orders(CustomerID);
CREATE INDEX IX_OrderDetails_OrderID ON OrderDetails(OrderID);
2. Filter Before Joining
-- β Filter after join
SELECT *
FROM Orders o
JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE o.OrderDate > '2024-01-01';
-- β
Filter first with derived table or CTE
WITH RecentOrders AS (
SELECT * FROM Orders WHERE OrderDate > '2024-01-01'
)
SELECT *
FROM RecentOrders o
JOIN Customers c ON o.CustomerID = c.CustomerID;
3. Use Appropriate Join Types
| Scenario | Recommended Join |
|---|---|
| Small table + large indexed table | Nested Loop |
| Large tables, no indexes | Hash Join |
| Both tables have sorted indexes | Merge Join |
| Row-by-row function application | CROSS/OUTER APPLY |
4. Avoid SELECT *
-- β Retrieves unnecessary columns
SELECT * FROM Orders o JOIN Customers c ON o.CustomerID = c.CustomerID;
-- β
Only needed columns
SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM Orders o
JOIN Customers c ON o.CustomerID = c.CustomerID;
Join Comparison Summary
| Join Type | Returns | NULL Handling |
|---|---|---|
| INNER JOIN | Matching rows only | No NULLs |
| LEFT JOIN | All left + matching right | NULLs for no right match |
| RIGHT JOIN | All right + matching left | NULLs for no left match |
| FULL JOIN | All rows from both | NULLs for no match |
| CROSS JOIN | All combinations | N/A |
| CROSS APPLY | Only matching function results | No NULLs |
| OUTER APPLY | All outer + function results | NULLs for no function result |
Sources
SQL/Teorie/Join Types.docx