πŸ“„

SQL Joins Comprehensive Guide

Beginner 3 min read 600 words

SQL Joins Comprehensive Guide

A comprehensive guide to SQL join types, including physical join algorithms, logical join types, and set operators.

Physical Join Algorithms

SQL Server uses different physical join algorithms to combine data from tables. Understanding these helps with query optimization.

Nested Loop Join

Description: Iterates through each row of the outer table and finds matching rows in the inner table.

Best For:

  • Small tables
  • When indexes are available on join columns
  • When one table is significantly smaller than the other

Performance: O(n Γ— m) worst case, O(n) with index

-- Query that typically uses Nested Loop
SELECT o.OrderID, c.CustomerName
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE o.OrderDate > '2024-01-01'
-- Small result set from Orders, index on CustomerID

Hash Join

Description: Creates a hash table in memory from the smaller (build) table, then probes it with rows from the larger (probe) table.

Best For:

  • Large tables without indexes
  • Large result sets
  • When sufficient memory is available

Performance: O(n + m) with sufficient memory

-- Query that typically uses Hash Join
SELECT a.*, b.*
FROM LargeTable1 a
INNER JOIN LargeTable2 b ON a.SomeColumn = b.SomeColumn
-- No indexes, large tables

Merge Join

Description: Requires both inputs to be sorted on the join key. Walks through both sorted inputs simultaneously.

Best For:

  • Pre-sorted data (indexes on join columns)
  • Large result sets
  • Equality joins

Performance: O(n + m) with sorted inputs

-- Query that typically uses Merge Join
SELECT a.*, b.*
FROM TableA a
INNER JOIN TableB b ON a.ID = b.ID
-- Both tables have clustered indexes on ID

Logical Join Types

INNER JOIN

Returns only rows with matches in both tables.

SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
INNER JOIN Departments d ON e.DepartmentID = d.DepartmentID;

-- Result: Only employees who have a department

LEFT JOIN (LEFT OUTER JOIN)

Returns all rows from the left table, with NULLs for non-matching right table rows.

SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
LEFT JOIN Departments d ON e.DepartmentID = d.DepartmentID;

-- Result: All employees, NULL for DepartmentName if no department

RIGHT JOIN (RIGHT OUTER JOIN)

Returns all rows from the right table, with NULLs for non-matching left table rows.

SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
RIGHT JOIN Departments d ON e.DepartmentID = d.DepartmentID;

-- Result: All departments, NULL for EmployeeName if no employees

FULL JOIN (FULL OUTER JOIN)

Returns all rows from both tables, with NULLs where no match exists.

SELECT e.EmployeeName, d.DepartmentName
FROM Employees e
FULL JOIN Departments d ON e.DepartmentID = d.DepartmentID;

-- Result: All employees + all departments, NULLs for non-matches

CROSS JOIN

Returns the Cartesian product (all combinations) of both tables. No join condition.

SELECT p.ProductName, c.ColorName
FROM Products p
CROSS JOIN Colors c;

-- If Products has 10 rows and Colors has 5,
-- result will have 50 rows (10 Γ— 5)

Use Case: Generate all possible combinations (e.g., product configurations).

SELF JOIN

Joins a table with itself. Useful for hierarchical data.

-- Find employees and their managers
SELECT e.EmployeeName AS Employee, m.EmployeeName AS Manager
FROM Employees e
LEFT JOIN Employees m ON e.ManagerID = m.EmployeeID;

NATURAL JOIN

Joins on columns with the same name in both tables. Generally avoided in production.

-- Implicitly joins on columns with same name
SELECT *
FROM Employees
NATURAL JOIN Departments;
-- Joins on any columns named the same (e.g., DepartmentID)

Warning: Can be unpredictable; prefer explicit JOIN conditions.

APPLY Operators

CROSS APPLY

Applies a table-valued function or subquery to each row of the outer table. Only returns rows where the function returns results.

-- Get top 3 orders for each customer
SELECT c.CustomerName, o.OrderID, o.OrderDate
FROM Customers c
CROSS APPLY (
    SELECT TOP 3 OrderID, OrderDate
    FROM Orders
    WHERE CustomerID = c.CustomerID
    ORDER BY OrderDate DESC
) o;
-- With table-valued function
SELECT e.EmployeeID, p.ProductName
FROM Employees e
CROSS APPLY dbo.GetTopProductsByEmployee(e.EmployeeID) p;

OUTER APPLY

Like CROSS APPLY, but includes all rows from the outer table even if the function returns no results (NULLs for missing values).

-- Get most recent order for each customer (including customers with no orders)
SELECT c.CustomerID, o.OrderID
FROM Customers c
OUTER APPLY (
    SELECT TOP 1 OrderID
    FROM Orders o
    WHERE o.CustomerID = c.CustomerID
    ORDER BY OrderDate DESC
) o;

Set Operators

UNION

Combines result sets vertically, removing duplicates.

SELECT FirstName, LastName FROM Employees
UNION
SELECT FirstName, LastName FROM Contractors;
-- Distinct names from both tables

UNION ALL

Combines result sets vertically, keeping all duplicates. Faster than UNION.

SELECT FirstName, LastName FROM Employees
UNION ALL
SELECT FirstName, LastName FROM Contractors;
-- All names, including duplicates

INTERSECT

Returns only rows that appear in both result sets.

SELECT ProductID FROM OrdersUS
INTERSECT
SELECT ProductID FROM OrdersEU;
-- Products ordered in both US and EU

EXCEPT

Returns rows from the first result set that don’t appear in the second.

SELECT ProductID FROM AllProducts
EXCEPT
SELECT ProductID FROM DiscontinuedProducts;
-- Active products only

Join Performance Tips

1. Index Join Columns

-- Create indexes on frequently joined columns
CREATE INDEX IX_Orders_CustomerID ON Orders(CustomerID);
CREATE INDEX IX_OrderDetails_OrderID ON OrderDetails(OrderID);

2. Filter Before Joining

-- ❌ Filter after join
SELECT *
FROM Orders o
JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE o.OrderDate > '2024-01-01';

-- βœ… Filter first with derived table or CTE
WITH RecentOrders AS (
    SELECT * FROM Orders WHERE OrderDate > '2024-01-01'
)
SELECT *
FROM RecentOrders o
JOIN Customers c ON o.CustomerID = c.CustomerID;

3. Use Appropriate Join Types

Scenario Recommended Join
Small table + large indexed table Nested Loop
Large tables, no indexes Hash Join
Both tables have sorted indexes Merge Join
Row-by-row function application CROSS/OUTER APPLY

4. Avoid SELECT *

-- ❌ Retrieves unnecessary columns
SELECT * FROM Orders o JOIN Customers c ON o.CustomerID = c.CustomerID;

-- βœ… Only needed columns
SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM Orders o
JOIN Customers c ON o.CustomerID = c.CustomerID;

Join Comparison Summary

Join Type Returns NULL Handling
INNER JOIN Matching rows only No NULLs
LEFT JOIN All left + matching right NULLs for no right match
RIGHT JOIN All right + matching left NULLs for no left match
FULL JOIN All rows from both NULLs for no match
CROSS JOIN All combinations N/A
CROSS APPLY Only matching function results No NULLs
OUTER APPLY All outer + function results NULLs for no function result

Sources

  • SQL/Teorie/Join Types.docx