System Design Fundamentals
A comprehensive guide to core system design concepts essential for building scalable, reliable systems and for technical interviews.
Scalability
What Is Scalability?
Scalability describes a system's elasticity - its ability to adapt to changes in load and demand. Good scalability protects against downtime and keeps service quality steady as traffic grows.
Horizontal vs Vertical Scaling
| Aspect | Horizontal Scaling | Vertical Scaling |
|---|---|---|
| Definition | Adding more machines/nodes | Adding resources to existing machine |
| Also Known As | Scaling Out | Scaling Up |
| Example | Add 3 more servers | Upgrade CPU, RAM, SSD |
| Cost | Linear (more commodity hardware) | Exponential (high-end hardware) |
| Complexity | Higher (distributed systems) | Lower (single system) |
| Limit | Virtually unlimited | Hardware limits |
| Downtime | Zero (add nodes online) | Possible (during upgrades) |
Horizontal Scaling:
[User] → [Load Balancer] → [Server 1]
                         → [Server 2]
                         → [Server 3]
Vertical Scaling:
[User] → [Beefier Server (more CPU, RAM, etc.)]
Caching
What is Caching?
Caching acts as a local store for data - retrieving from this temporary storage is faster than retrieving from the database. Think of it as short-term memory: limited space but fast, containing recently/frequently accessed items.
How Cache Works
First Request (Cache Miss):
[Client] → [App Server] → [Cache] (miss) → [Database]
                             ↑                  │
                             └── Store Result ──┘
Second Request (Cache Hit):
[Client] → [App Server] → [Cache] (hit) → Return immediately
Cache Levels
┌───────────────────────────────────────┐
│ L1 CPU Cache (Fastest)                │
├───────────────────────────────────────┤
│ L2 CPU Cache                          │
├───────────────────────────────────────┤
│ L3 CPU Cache                          │
├───────────────────────────────────────┤
│ RAM (Primary Memory)                  │
├───────────────────────────────────────┤
│ Application Cache (Redis)             │
├───────────────────────────────────────┤
│ Browser Cache                         │
├───────────────────────────────────────┤
│ CDN Cache                             │
├───────────────────────────────────────┤
│ Disk (Secondary Memory)               │
└───────────────────────────────────────┘
Types of Cache
1. Application Server Cache
In-memory cache alongside the application server.
```csharp
// Simple in-memory cache with IMemoryCache
public class ProductService
{
    private readonly IMemoryCache _cache;
    private readonly IProductRepository _repo;

    public ProductService(IMemoryCache cache, IProductRepository repo)
    {
        _cache = cache;
        _repo = repo;
    }

    public async Task<Product> GetProductAsync(int id)
    {
        return await _cache.GetOrCreateAsync($"product:{id}", async entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10);
            return await _repo.GetByIdAsync(id);
        });
    }
}
```
Drawback: Doesn't work well with multiple servers - each server holds its own copy of the cache, so a load balancer that routes a request to a different server causes a cache miss.
2. Distributed Cache
Cache is distributed across multiple nodes using consistent hashing.
```csharp
// Distributed cache with Redis (IDistributedCache)
public class ProductService
{
    private readonly IDistributedCache _cache;
    private readonly IProductRepository _repo;

    public ProductService(IDistributedCache cache, IProductRepository repo)
    {
        _cache = cache;
        _repo = repo;
    }

    public async Task<Product> GetProductAsync(int id)
    {
        var cached = await _cache.GetStringAsync($"product:{id}");
        if (cached != null)
            return JsonSerializer.Deserialize<Product>(cached);

        var product = await _repo.GetByIdAsync(id);
        await _cache.SetStringAsync($"product:{id}",
            JsonSerializer.Serialize(product),
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
            });
        return product;
    }
}
```
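The consistent hashing mentioned above decides which cache node owns each key, so that adding or removing a node only remaps the keys in one segment of the ring. A minimal sketch (the `HashRing` class, the MD5-based hash, and the virtual-node count are assumptions for illustration, not a production implementation):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Minimal consistent-hash ring: a key maps to the first node clockwise
// from its hash. Virtual nodes smooth out the key distribution.
public class HashRing
{
    private readonly SortedDictionary<uint, string> _ring = new();
    private const int VirtualNodes = 100;

    public void AddNode(string node)
    {
        for (int i = 0; i < VirtualNodes; i++)
            _ring[Hash($"{node}#{i}")] = node;
    }

    public void RemoveNode(string node)
    {
        for (int i = 0; i < VirtualNodes; i++)
            _ring.Remove(Hash($"{node}#{i}"));
    }

    // Assumes at least one node has been added.
    public string GetNode(string key)
    {
        uint h = Hash(key);
        // First virtual node at or after the key's hash, wrapping around.
        foreach (var kv in _ring)
            if (kv.Key >= h) return kv.Value;
        return _ring.First().Value;
    }

    private static uint Hash(string s)
    {
        byte[] digest = MD5.HashData(Encoding.UTF8.GetBytes(s));
        return BitConverter.ToUInt32(digest, 0);
    }
}
```

With plain modulo hashing (`hash % nodeCount`), adding one node remaps almost every key; with the ring, a key keeps its node unless a new node lands between the key and its old owner.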
3. Global Cache
Single shared cache space for all nodes.
4. CDN (Content Delivery Network)
Geographically distributed servers caching static content (HTML, CSS, JS, images, videos).
User in Europe → CDN Edge Server (Europe) → Origin Server (if cache miss)
User in Asia   → CDN Edge Server (Asia)   → Origin Server (if cache miss)
Cache Eviction Policies
| Policy | Description | Use Case |
|---|---|---|
| LRU | Least Recently Used - evicts the entry that has gone unaccessed the longest | General purpose |
| LFU | Least Frequently Used - evicts the entry with the fewest accesses | Stable popularity patterns |
| FIFO | First In First Out - evicts the oldest entry added | Simple scenarios |
| TTL | Time To Live - entries expire after a set duration | Time-sensitive data |
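LRU, the most common policy in the table above, can be implemented in O(1) per operation with a dictionary plus a doubly linked list. A minimal, non-thread-safe sketch (the `LruCache` class is an assumption for this example):

```csharp
using System;
using System.Collections.Generic;

// Minimal LRU cache: dictionary for O(1) lookup, linked list for recency
// order. The least recently used entry sits at the tail and is evicted first.
public class LruCache<TKey, TValue> where TKey : notnull
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<(TKey Key, TValue Value)>> _map = new();
    private readonly LinkedList<(TKey Key, TValue Value)> _order = new();

    public LruCache(int capacity) => _capacity = capacity;

    public bool TryGet(TKey key, out TValue value)
    {
        if (_map.TryGetValue(key, out var node))
        {
            // Accessing an entry makes it most recently used.
            _order.Remove(node);
            _order.AddFirst(node);
            value = node.Value.Value;
            return true;
        }
        value = default!;
        return false;
    }

    public void Put(TKey key, TValue value)
    {
        if (_map.TryGetValue(key, out var existing))
        {
            _order.Remove(existing);
            _map.Remove(key);
        }
        else if (_map.Count >= _capacity)
        {
            // Evict the least recently used entry (tail of the list).
            var lru = _order.Last!;
            _map.Remove(lru.Value.Key);
            _order.RemoveLast();
        }
        _map[key] = _order.AddFirst((key, value));
    }
}
```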
Cache Invalidation Strategies
```csharp
// Write-Through: update cache and DB together
public async Task UpdateProductAsync(Product product)
{
    await _repo.UpdateAsync(product);
    await _cache.SetStringAsync($"product:{product.Id}",
        JsonSerializer.Serialize(product));
}

// Write-Behind: update the cache now, queue the DB update
public async Task UpdateProductAsync(Product product)
{
    await _cache.SetStringAsync($"product:{product.Id}",
        JsonSerializer.Serialize(product));
    _backgroundQueue.Enqueue(() => _repo.UpdateAsync(product));
}

// Cache-Aside: the application manages the cache explicitly
public async Task<Product> GetProductAsync(int id)
{
    var cached = await _cache.GetStringAsync($"product:{id}");
    if (cached != null)
        return JsonSerializer.Deserialize<Product>(cached);

    var product = await _repo.GetByIdAsync(id);
    if (product != null)
        await _cache.SetStringAsync($"product:{id}",
            JsonSerializer.Serialize(product));
    return product;
}
```
Load Balancing
What is a Load Balancer?
A load balancer distributes incoming traffic among servers to provide:
- High availability - if one server fails, others handle traffic
- Efficient utilization - no single server is overloaded
- High performance - optimized response times
Without Load Balancer (Problems)
[Users] → [Single Server]  ✗ Single Point of Failure!
                           ✗ Gets Overloaded!
With Load Balancer
[Users] → [Load Balancer] → [Server 1] ✓
                          → [Server 2] ✓
                          → [Server 3] ✓
Health checks ensure only healthy servers receive traffic
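Those health checks are typically periodic probes against each backend; servers that fail a probe are taken out of rotation until they recover. A minimal sketch (the `/health` endpoint path, the 2-second timeout, and the `HealthChecker` class are assumptions):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Periodically probes each server's health endpoint; the balancer
// should only route traffic to servers currently marked healthy.
public class HealthChecker
{
    private readonly HttpClient _http = new() { Timeout = TimeSpan.FromSeconds(2) };
    private readonly ConcurrentDictionary<string, bool> _healthy = new();

    public IReadOnlyCollection<string> HealthyServers =>
        _healthy.Where(kv => kv.Value).Select(kv => kv.Key).ToList();

    // Probe every server once; run this on a timer (e.g. every 5 seconds).
    public async Task ProbeAllAsync(IEnumerable<string> servers)
    {
        foreach (var server in servers)
        {
            try
            {
                var response = await _http.GetAsync($"{server}/health");
                _healthy[server] = response.IsSuccessStatusCode;
            }
            catch (HttpRequestException)
            {
                _healthy[server] = false; // unreachable → out of rotation
            }
            catch (TaskCanceledException)
            {
                _healthy[server] = false; // timed out → out of rotation
            }
        }
    }
}
```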
Load Balancer Placement
[Client] → [LB] → [Web Servers]
                  [LB] → [App Servers]
                  [LB] → [Cache Servers]
                  [LB] → [Database Servers]
Types of Load Balancers
By Layer
| Type | OSI Layer | Routing Based On |
|---|---|---|
| L4 | Transport | IP, Port, Protocol |
| L7 | Application | URL, Headers, Cookies, Content |
| GSLB | Global (DNS-based, not an OSI layer) | Location, Server Health, Proximity |
By Implementation
- Hardware: Physical appliances (F5, Citrix) - expensive but powerful
- Software: Applications (NGINX, HAProxy) - flexible and cost-effective
- Virtual: VMs in cloud environments
Load Balancing Algorithms
```csharp
// 1. Round Robin - Sequential distribution
public class RoundRobinBalancer
{
    private int _current = -1;
    private readonly List<string> _servers;

    public string GetNextServer()
    {
        _current = (_current + 1) % _servers.Count;
        return _servers[_current];
    }
}

// 2. Weighted Round Robin - Based on server capacity.
// A server with weight 3 gets 3x more requests than one with weight 1.
public class WeightedRoundRobinBalancer
{
    private readonly List<(string Server, int Weight)> _servers;
    private int _index = -1, _remaining;

    public string GetNextServer()
    {
        if (_remaining == 0)
        {
            _index = (_index + 1) % _servers.Count;
            _remaining = _servers[_index].Weight;
        }
        _remaining--;
        return _servers[_index].Server;
    }
}

// 3. Least Connections - To the server with the fewest active connections
public class LeastConnectionsBalancer
{
    private readonly Dictionary<string, int> _connections;

    public string GetNextServer()
    {
        return _connections.OrderBy(c => c.Value).First().Key;
    }
}

// 4. IP Hash - The same client always goes to the same server.
// Note: string.GetHashCode() is not stable across processes in .NET;
// a real balancer would use a deterministic hash instead.
public class IpHashBalancer
{
    private readonly List<string> _servers;

    public string GetServer(string clientIp)
    {
        int hash = clientIp.GetHashCode() & int.MaxValue; // force non-negative
        return _servers[hash % _servers.Count];
    }
}

// 5. Least Response Time - To the fastest responding server,
// typically combining measured response time with active connections.
```
Database Replication
What is Database Replication?
Copying data from a primary database to replica databases to improve:
- Availability - system continues if primary fails
- Performance - read queries distributed across replicas
- Reliability - data redundancy
Replication Topologies
1. Master-Slave (Primary-Replica)
[Primary] → [Replica 1]
          → [Replica 2]
          → [Replica 3]
Writes: Primary only
Reads: Any node
2. Master-Master (Multi-Primary)
[Primary 1] ↔ [Primary 2]
Writes: Any primary
Reads: Any node
Conflict resolution needed
3. Chain Replication
[Primary] → [Replica 1] → [Replica 2]
Sequential propagation
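The write-to-primary / read-from-replicas split common to these topologies can be sketched as a small connection router. A minimal illustration (the `ReplicaRouter` class and connection-string names are assumptions; it also assumes asynchronous replication, so replica reads may be slightly stale):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Routes writes to the primary and spreads reads across replicas
// with a simple round-robin rotation.
public class ReplicaRouter
{
    private readonly string _primary;
    private readonly List<string> _replicas;
    private int _next = -1;

    public ReplicaRouter(string primary, List<string> replicas)
    {
        _primary = primary;
        _replicas = replicas;
    }

    // All writes must go to the primary.
    public string GetWriteConnection() => _primary;

    // Reads rotate across replicas; fall back to the primary if none exist.
    public string GetReadConnection()
    {
        if (_replicas.Count == 0) return _primary;
        int i = Interlocked.Increment(ref _next) & int.MaxValue;
        return _replicas[i % _replicas.Count];
    }
}
```

In practice, reads that must see their own just-written data (read-your-writes) are often sent to the primary as well.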
Benefits of Replication
| Benefit | Description |
|---|---|
| High Availability | System continues if one database fails |
| Load Distribution | Read queries spread across replicas |
| Geographic Distribution | Data closer to users |
| Analytics Separation | Run heavy queries on replica |
| Disaster Recovery | Built-in backup |
System Design Interview Tips
Key Concepts to Demonstrate
- Scalability: How to handle 10x, 100x traffic
- Availability: What happens when components fail
- Performance: Caching, CDN, load balancing strategies
- Data Management: Replication, sharding, consistency
- Trade-offs: CAP theorem, consistency vs availability
Interview Approach
1. Clarify Requirements (5 min)
- Functional: What should the system do?
- Non-functional: Scale, latency, availability targets
2. High-Level Design (10-15 min)
- Components: API, services, database, cache
- Data flow: How requests move through system
3. Deep Dive (15-20 min)
- Database schema
- API design
- Caching strategy
- Load balancing
4. Address Bottlenecks (5 min)
- Single points of failure
- Scaling limitations
- Trade-offs made
Common Questions
- Design a URL shortener
- Design a rate limiter
- Design Twitter/Instagram feed
- Design a notification system
- Design a distributed cache
Sources
- Reference: System Design Primer