WhatsApp Architecture Case Study
How WhatsApp handles 100+ billion messages daily with remarkable efficiency.
Architecture Overview
WhatsApp is known for an efficient architecture that handles massive scale with a relatively small engineering team (about 50 engineers and 550 servers in 2014).
┌─────────────────────────────────────────────────────────────┐
│                       Mobile Clients                        │
│                (iOS, Android, Web, Desktop)                 │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                       Load Balancers                        │
│                  (Geographic Distribution)                  │
└─────────────────────────┬───────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────────┐
        │                 │                     │
        ▼                 ▼                     ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────────────┐
│  Connection   │ │    Message    │ │     Media Storage     │
│    Servers    │ │    Routing    │ │       (S3/CDN)        │
│ (XMPP/Noise)  │ │    Servers    │ │                       │
└───────────────┘ └───────────────┘ └───────────────────────┘
Core Technology Stack
Erlang/OTP
WhatsApp's backend is primarily built on Erlang, chosen for:
- Concurrency: Lightweight processes (millions per server)
- Fault Tolerance: "Let it crash" philosophy
- Hot Code Swapping: Update without downtime
- Distributed Computing: Built-in distribution
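%% Example: a lightweight Erlang process that routes messages.
-module(message_handler).
-export([start/0]).

%% Spawn the handler process and return its pid.
start() ->
    spawn(fun loop/0).

%% Receive loop: handle one message, then recurse to wait for the next.
loop() ->
    receive
        {send, Message, To} ->
            route_message(Message, To),
            loop();
        stop ->
            ok
    end.

%% Placeholder router; a real one would look up the recipient's connection process.
route_message(Message, To) ->
    io:format("Routing ~p to ~p~n", [Message, To]).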
FreeBSD Operating System
- Highly tuned for networking
- Better performance than Linux for their workload
- Custom kernel optimizations
Key Components
1. Connection Management
- Protocol: Custom protocol based on XMPP (simplified)
- Encryption: Signal Protocol (end-to-end)
- Connections: Long-lived TCP connections
- Compression: Efficient binary protocol
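To illustrate how a compact binary protocol rides on a long-lived TCP connection, here is a minimal sketch of length-prefixed framing. The frame layout (1-byte type tag, 2-byte length, payload) and the names `encode_frame` and `decode_frames` are assumptions made for illustration; this is not WhatsApp's actual wire format.

```python
import struct

# Hypothetical header: 1-byte message type, 2-byte big-endian payload length.
HEADER = struct.Struct("!BH")


def encode_frame(msg_type: int, payload: bytes) -> bytes:
    """Prefix the payload with the fixed 3-byte header."""
    return HEADER.pack(msg_type, len(payload)) + payload


def decode_frames(buffer: bytes):
    """Split a byte stream into complete frames; return (frames, leftover bytes)."""
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        msg_type, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # partial frame: keep the bytes until more arrive on the socket
        frames.append((msg_type, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]
```

A five-byte text payload costs eight bytes on the wire in this layout, which is far less than the same message wrapped in an XML stanza or a JSON object.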
2. Message Flow
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Sender  │───▶│  Server  │───▶│  Server  │───▶│ Receiver │
│  Client  │    │  (Home)  │    │  (Dest)  │    │  Client  │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
                                      │
                                      ▼
                              ┌──────────────┐
                              │ Mnesia/MySQL │
                              │  (Offline)   │
                              └──────────────┘
Message States:
- Single checkmark: Delivered to server
- Double checkmark: Delivered to recipient
- Blue checkmarks: Read by recipient
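The three checkmark states form a one-way progression, which can be sketched as a small state machine. The `PENDING` state and the names `DeliveryState` and `apply_receipt` are assumptions added for illustration; the source only lists the three visible states.

```python
from enum import IntEnum


class DeliveryState(IntEnum):
    PENDING = 0    # assumed initial state: not yet acknowledged by the server
    SENT = 1       # single checkmark: delivered to server
    DELIVERED = 2  # double checkmark: delivered to recipient
    READ = 3       # blue checkmarks: read by recipient


def apply_receipt(current: DeliveryState, receipt: DeliveryState) -> DeliveryState:
    """Advance the state monotonically; late or duplicate receipts never move it back."""
    return max(current, receipt)
```

Taking the maximum means receipts can arrive out of order (for example a read receipt before the delivery receipt) without the displayed state regressing.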
3. Data Storage
| Component | Storage | Purpose |
|---|---|---|
| Messages (offline) | Mnesia → MySQL | Store until delivered |
| User profiles | MySQL | Account data |
| Media files | Amazon S3 | Images, videos, documents |
| Keys | Local device | End-to-end encryption keys |
4. Media Handling
┌────────────┐    ┌────────────┐    ┌────────────┐
│   Client   │───▶│   Upload   │───▶│     S3     │
│  Uploads   │    │   Server   │    │  Storage   │
└────────────┘    └────────────┘    └────────────┘
                                          │
                                          ▼
┌────────────┐    ┌────────────┐    ┌────────────┐
│   Client   │◀───│    CDN     │◀───│  Generate  │
│ Downloads  │    │            │    │    URL     │
└────────────┘    └────────────┘    └────────────┘
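The upload and download path in the diagram can be sketched with in-memory stand-ins. Everything named here (`MediaStore`, `Cdn`, `upload`, the content-hash object key, and the `cdn.example.invalid` host) is hypothetical; the source only states that media goes through an upload server to S3 and is downloaded via a CDN URL.

```python
import hashlib


class MediaStore:
    """In-memory stand-in for the object store (S3 in the diagram)."""

    def __init__(self):
        self.blobs = {}
        self.reads = 0

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # assumption: content hash as object key
        self.blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        self.reads += 1
        return self.blobs[key]


class Cdn:
    """Edge cache: repeat downloads are served without touching the origin."""

    def __init__(self, origin: MediaStore):
        self.origin = origin
        self.cache = {}

    def fetch(self, url: str) -> bytes:
        key = url.rsplit("/", 1)[-1]
        if key not in self.cache:
            self.cache[key] = self.origin.get(key)
        return self.cache[key]


def upload(store: MediaStore, data: bytes) -> str:
    """Upload-server step: store the blob and generate the URL sent to recipients."""
    return "https://cdn.example.invalid/media/" + store.put(data)
```

In this sketch the chat message carries only the generated URL, so the messaging path stays small regardless of media size.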
Scalability Strategies
1. Server Efficiency
- 2 million connections per server (Erlang's strength)
- Custom memory management
- Optimized garbage collection
2. Database Optimization
- Read replicas for scaling reads
- Sharding by user ID
- Minimal data storage (messages deleted after delivery)
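Sharding by user ID and deleting messages after delivery can be sketched together. The hash-modulo placement and the `OfflineStore` class are assumptions for illustration; the source does not say how WhatsApp maps users to shards.

```python
import hashlib


def shard_for_user(user_id: str, shard_count: int) -> int:
    """Map a user ID to a shard index with a stable hash (not Python's per-process hash())."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class OfflineStore:
    """Pending messages per shard, removed as soon as they are handed over."""

    def __init__(self, shard_count: int):
        self.shards = [{} for _ in range(shard_count)]

    def _shard(self, user_id: str) -> dict:
        return self.shards[shard_for_user(user_id, len(self.shards))]

    def store(self, user_id: str, message: str) -> None:
        self._shard(user_id).setdefault(user_id, []).append(message)

    def deliver(self, user_id: str) -> list:
        # pop() returns the pending messages and deletes them in one step
        return self._shard(user_id).pop(user_id, [])
```

Plain modulo placement moves most users when the shard count changes; consistent hashing is a common alternative when shards are added often.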
3. Caching
┌─────────────┐     ┌─────────────┐
│   Request   │────▶│  Memcached  │  (Hit: Return)
└─────────────┘     └──────┬──────┘
                           │ (Miss)
                           ▼
                    ┌─────────────┐
                    │    MySQL    │
                    └─────────────┘
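The hit/miss flow in the diagram is the cache-aside pattern. Below is a minimal sketch in which plain dictionaries stand in for Memcached and MySQL; the class name `CacheAside` and the invalidate-on-write policy are assumptions, not details from the source.

```python
class CacheAside:
    def __init__(self, database: dict):
        self.database = database  # stand-in for MySQL
        self.cache = {}           # stand-in for Memcached
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]       # hit: return without touching the database
        self.db_reads += 1               # miss: fall through to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value      # populate so the next read is a hit
        return value

    def update(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)        # invalidate rather than risk serving stale data
```

Usage: after `store = CacheAside({"alice": "profile-a"})`, two consecutive `store.get("alice")` calls leave `store.db_reads` at 1, because the second call is served from the cache.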
End-to-End Encryption
Signal Protocol Implementation
┌─────────────────────────────────────────────────────┐
│                    Key Exchange                     │
├─────────────────────────────────────────────────────┤
│ 1. Identity Key (long-term)                         │
│ 2. Signed Pre-Key (medium-term)                     │
│ 3. One-Time Pre-Keys (single use)                   │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│              Double Ratchet Algorithm               │
├─────────────────────────────────────────────────────┤
│ - Forward secrecy                                   │
│ - Break-in recovery                                 │
│ - Per-message keys                                  │
└─────────────────────────────────────────────────────┘
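Per-message keys and forward secrecy come from the symmetric half of the Double Ratchet: each step derives a message key and a new chain key from the current chain key with a one-way function. The sketch below shows only that chain step, using HMAC-SHA256 with the single-byte constants 0x01 and 0x02 in the style suggested by the public Double Ratchet specification. The Diffie-Hellman ratchet that provides break-in recovery is omitted, the function names are illustrative, and this is not WhatsApp's implementation.

```python
import hashlib
import hmac
from typing import List, Tuple


def ratchet_step(chain_key: bytes) -> Tuple[bytes, bytes]:
    """One symmetric ratchet step: return (next_chain_key, message_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


def message_keys(initial_chain_key: bytes, count: int) -> List[bytes]:
    """Derive `count` successive message keys, discarding each old chain key."""
    keys = []
    chain_key = initial_chain_key
    for _ in range(count):
        chain_key, message_key = ratchet_step(chain_key)
        keys.append(message_key)
    return keys
```

Because HMAC is one-way, someone who obtains a later chain key cannot work backwards to earlier message keys, which is the forward-secrecy property listed in the diagram.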
Group Messaging
- Sender Keys for efficiency
- Each member has their own sender key
- Server cannot decrypt messages
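The efficiency gain from Sender Keys can be made concrete by counting encryption operations. The sketch below compares pairwise fan-out, where the sender encrypts each message separately for every other member, with the sender-key approach, where the key is distributed once to each member and every message is then encrypted a single time. The function names are illustrative, and the model ignores key rotation when members leave.

```python
def pairwise_encryptions(members: int, messages: int) -> int:
    """Pairwise fan-out: each message is encrypted once per other group member."""
    return (members - 1) * messages


def sender_key_encryptions(members: int, messages: int) -> int:
    """Sender keys: one key distribution per other member, then one encryption per message."""
    return (members - 1) + messages


print(pairwise_encryptions(100, 50))    # 4950
print(sender_key_encryptions(100, 50))  # 149
```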
Performance Metrics
| Metric | Value |
|---|---|
| Daily Messages | 100+ billion |
| Monthly Active Users | 2+ billion |
| Engineers (2014) | ~50 |
| Servers (2014) | ~550 |
| Messages/second | 1+ million |
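As a consistency check on the table, the daily volume can be converted into an average per-second rate. The calculation treats "100+ billion" as exactly 100 billion, so it is a lower bound, and peak traffic would sit above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_messages = 100e9  # "100+ billion" from the table, taken as a lower bound
average_rate = daily_messages / SECONDS_PER_DAY

print(f"{average_rate:,.0f} messages/second on average")  # 1,157,407 messages/second on average
```

The result agrees with the "1+ million" messages per second listed above.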
Design Principles
1. Simplicity
- Focus on core messaging functionality
- Minimal features, maximum reliability
- Simple user experience
2. Efficiency
- Binary protocol (not JSON/XML)
- Minimal server storage
- Optimized network usage
3. Privacy
- End-to-end encryption by default
- Minimal data collection
- Messages not retained on servers after delivery
4. Reliability
- Messages always delivered
- Offline message queuing
- Automatic reconnection
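Automatic reconnection is commonly implemented with exponential backoff and jitter, so that clients do not all reconnect at the same instant after an outage. The source does not describe WhatsApp's actual policy; the base delay, cap, attempt count, and the name `backoff_delays` below are assumptions.

```python
import random


def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 8, rng=random.random):
    """Yield reconnect delays in seconds: exponential growth, capped, with full jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng() * ceiling  # pick a random delay in [0, ceiling)
```

A client would sleep for each yielded delay between connection attempts and restart the generator after a successful connect. Passing a fixed `rng` makes the schedule deterministic for testing.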
Lessons for Architects
1. Choose the Right Technology
Erlang was perfect for WhatsApp's needs:
- Concurrent connections
- Fault tolerance
- Low latency
2. Optimize Ruthlessly
- Every byte counts
- Profile and measure
- Custom solutions when needed
3. Keep It Simple
- Fewer features, done well
- Minimal dependencies
- Clear architecture
4. Plan for Scale
- Design for millions from day one
- Horizontal scaling capability
- Efficient resource usage
C# Equivalent Patterns
Connection Handling (SignalR)
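using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

public class ChatHub : Hub
{
    // Push a message to every connection that belongs to the target user.
    public async Task SendMessage(string user, string message)
    {
        await Clients.User(user).SendAsync("ReceiveMessage", message);
    }

    // Track presence by adding each new connection to an "Online" group.
    public override async Task OnConnectedAsync()
    {
        await Groups.AddToGroupAsync(Context.ConnectionId, "Online");
        await base.OnConnectedAsync();
    }
}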
Message Queue Pattern
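using System.Threading.Tasks;

// Minimal supporting types, assumed here so the example compiles.
public record Message(string RecipientId, string Body);
public interface IMessageQueue { Task EnqueueForDelivery(Message message); }

public abstract class MessageService
{
    private readonly IMessageQueue _queue;
    protected MessageService(IMessageQueue queue) => _queue = queue;

    // Deliver immediately when the recipient is connected; otherwise queue for later.
    public async Task SendMessageAsync(Message message)
    {
        if (await IsUserOnline(message.RecipientId))
        {
            await DeliverDirectly(message);
        }
        else
        {
            await _queue.EnqueueForDelivery(message);
        }
    }

    // Presence lookup and direct delivery depend on the transport, so they stay abstract.
    protected abstract Task<bool> IsUserOnline(string userId);
    protected abstract Task DeliverDirectly(Message message);
}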
Sources
Arhitectura/WhatsApp architecture.gif