This is my reading notes for Chapter 2 in book “System Design Interview – An insider’s guide (Vol. 2)”.
Core Objective
To design a backend that enables a “Nearby Friends” feature, which allows users to see friends within a configurable radius (e.g., 5 miles), with low latency and real-time location updates. The design is targeted at handling scalability to serve up to 10 million concurrent users and over 13 million updates per second.
Problem Scope and Requirements
Functional Requirements
- Nearby Friends Listing: Display friends within 5 miles along with the timestamp of their last known location.
- Periodic Updates: Friend locations update in real-time every few seconds.
- Scalable Friend Management: Updates should work even for users with thousands of friends (Facebook cap example: 5,000 friends).
Non-Functional Requirements
- Low Latency: Real-time updates for location data with sub-second delays.
- Reliability with Tolerance: Occasional missed updates are acceptable.
- Eventual Consistency: Exact data synchronization is not required immediately.
System Design Components and Examples
- Mobile Client: Sends periodic location updates to the backend using a WebSocket connection, typically every 30 seconds.
- Load Balancer: Sits in front of WebSocket and API servers, distributing incoming location updates to manage load efficiently.
- WebSocket Servers: Maintains real-time connections with users, handling bi-directional communication for sending and receiving location updates.
- Example: When User A sends a location update, the WebSocket server forwards it to friends of User A, subscribing only active friends to prevent excessive load.
- Redis Location Cache: Stores the most recent location of each active user with a Time-to-Live (TTL) expiry.
- TTL Functionality: When User A goes offline, their location expires in Redis after TTL, keeping the cache manageable.
- Example Key-Value:
{ user_id: { latitude: 37.7749, longitude: -122.4194, timestamp: 1600000000 }}
with TTL ensuring User A’s location is purged if inactive.
- Redis Pub/Sub: Manages real-time distribution of location updates to subscribed users. Each active user has a dedicated channel that friends can subscribe to for receiving updates.
- Example Workflow:
- User A’s Location Update: Published to
User_A_channel
in Redis. - Subscribers: All active friends of User A (e.g., User B and User C) receive location updates if within the 5-mile radius.
- User A’s Location Update: Published to
- Example Workflow:
- Location History Database: Archives location data for analytics and future recommendation features, like commonly visited places or route optimizations.
Location Update Flow: Example Walkthrough
- User A moves and sends a location update.
- WebSocket Server updates Redis Location Cache, refreshing the TTL and publishing the new location to
User_A_channel
. - Redis Pub/Sub broadcasts the location update to active friends who are subscribed.
- WebSocket Servers for each friend (e.g., User B) calculate the distance between User B and User A. If within 5 miles, User B receives the update; otherwise, it’s dropped.
Detailed Component Insights
Redis Location Cache
- Purpose: Keeps only the latest location for each user, minimizing storage needs.
- TTL Management: Ensures inactive users’ locations are removed automatically. A typical Redis instance with sufficient memory can handle millions of active user locations efficiently.
Redis Pub/Sub Design
- Channel Mechanism: Each active user’s channel acts as a “topic” where updates are broadcast. This design is both lightweight and scalable, allowing each friend to subscribe only to relevant updates.
- Reliability: Redis Pub/Sub, with millions of channels, balances load well across servers by sharding user channels. The architecture allows incremental scaling by adding Redis instances as user count grows.
WebSocket Servers
- Persistent Connections: Every user maintains a single connection for sending and receiving updates.
- Distance Calculation on Update: Each WebSocket connection recalculates distances when receiving a location update. This ensures only relevant location changes are pushed to users, reducing bandwidth.
- Initialization: When a user first connects, the WebSocket server retrieves all friend locations within the radius and sends them to the client, establishing an initial state.
Optimization Techniques and Advanced Features
Users with High Friend Counts
- Scaling Considerations: For users with many friends, Redis Pub/Sub’s distributed design handles load effectively by spreading friends’ subscriptions across different WebSocket and Pub/Sub servers.
- Avoiding Performance Bottlenecks: Friend subscriptions are scattered, reducing the chance of single-server overload.
Adding/Removing Friends
- Real-Time Subscription Management: When a user adds a friend, a callback registers with the WebSocket server to subscribe to the friend’s Redis Pub/Sub channel.
- Example: User B adds User D as a friend. A callback triggers User B’s WebSocket server to subscribe to
User_D_channel
, receiving User D’s location data in real-time.
- Example: User B adds User D as a friend. A callback triggers User B’s WebSocket server to subscribe to
Random Nearby Users (Advanced Feature)
- Geohash-Based Subscriptions: For showing random nearby users, Redis Pub/Sub channels are divided by geohash, allowing users within a specific grid to receive updates from others in the same region.
- Border Handling: Users close to geohash boundaries subscribe to neighboring grids to ensure nearby users in adjacent grids are included in updatess Pub/Sub: Erlang
- Erlang’s Lightweight Processes: Suggested as an alternative for its low-memory footprint, with processes that are ideal for handling millions of concurrent users due to low resource consumption.
- Distributed Subscriptions: Each friend’s updates are handled through direct Erlang process connections, reducing reliance on Redis while leveraging Erlang’s fault-tolerance.
Key API Design Considerations
- WebSocket APIs:
- Periodic Location Update:
- Request:
{ latitude, longitude, timestamp }
- Request:
- Receive Friend Updates: Friends’ current locations within the radius are pushed as updates arrive.
- Friend Subscribe/Unsubscribe: When a friend is added/removed, the WebSocket server updates subscriptions accordingly.
- Periodic Location Update:
- REST APIs (Supporting Functions):
- Friend Management: Allows adding/removing friends, indirectly affecting nearby friend updates.
- User Profile Management: Enables user details updates which may influence location sharing settings.
Key Points Summary
- Reliability and Latency: The system prioritizes low latency for real-time location updates, accepting minor inconsistencies to maintain responsiveness.
- Scalability Challenges: The design handles millions of concurrent users by efficient use of Redis for fast caching and message-passing, minimizing bottlenecks.
- Data Privacy Considerations: Though not deeply covered here, data privacy (e.g., GDPR) should be ensured in production with strict access controls.
This expanded design incorporates efficient caching, real-time messaging, and load distribution for a highly scalable Nearby Friends feature capable of serving millions of users. The architecture leverages Redis and WebSocket for real-time updates and includes advanced optimizations for handling high-user scenarios.