Proximity Service

These are my reading notes on Chapter 1 of the book “System Design Interview – An Insider’s Guide (Vol. 2)”.

Overview of Proximity Service

A proximity service finds nearby places (restaurants, hotels, theaters) based on a user’s location. Examples include Yelp’s search for nearby restaurants and Google Maps locating gas stations. It is central to many location-based services (LBS), which rely on geospatial indexing and location data to return relevant search results.

Key Technologies Involved:

  • GPS (Global Positioning System): To determine user location (latitude and longitude).
  • Geospatial Databases: Databases that store and query location data, optimized for searches involving geographical data.
  • Geofencing: A method that creates a virtual perimeter around a real-world geographic area to trigger certain actions when entering or exiting the defined boundaries.

Key Concepts of a Proximity Service

  1. Search Based on Location:
    • The core functionality is to find businesses based on latitude, longitude, and an optional search radius.
    • Relevant Knowledge:
      • Distance Calculation: The Haversine formula gives the great-circle distance (the shortest path along the surface) between two latitude/longitude points, treating the Earth as a sphere.
      • Proximity Search: In many systems, users can refine searches using filters like time (e.g., open now), price range, and reviews.
      • Real-Time Search Optimization: Caching frequent queries or applying rate limits for high-traffic times.
    • Example: A user opens Google Maps to find the nearest gas station within 5 km.
  2. Business Operations (CRUD):
    • The system allows business owners to create, update, delete, and retrieve business listings. Updates do not need to be immediate.
    • Relevant Knowledge:
      • Database Write Optimizations: Tolerating a delay before changes become visible lets the system batch or defer writes, which fits an eventual-consistency model.
      • CAP Theorem: During a network partition, a distributed system must choose between consistency (every read sees the latest write) and availability (every request gets a response). Because business updates do not need to appear immediately, this design can favor availability.
    • Example: A restaurant owner updates their opening hours or special deals, but the changes become visible the next day.
  3. Viewing Business Details:
    • Once the search results are returned, users can view more detailed information such as reviews, ratings, images, and operating hours.
    • Relevant Knowledge:
      • Microservices Architecture: Business information retrieval can be separated into smaller services, such as handling images, reviews, or hours independently, optimizing for scalability.
      • Pagination in APIs: Pagination ensures that a system doesn’t overload the user or the backend by fetching all results at once. This is critical in scenarios with thousands of results.
    • Example: A user clicks on a restaurant listing to view its reviews, photos, and average wait times on Yelp.
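
The distance calculation mentioned under "Search Based on Location" can be sketched as follows. This is a minimal illustration, not the book's implementation: the function names `haversine_km` and `within_radius`, the tuple layout for businesses, and the mean Earth radius constant are my own assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; the sphere model is an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin outside its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_radius(user, businesses, radius_km):
    """Return (name, distance_km) pairs inside the radius, nearest first.

    `user` is a (lat, lon) tuple; `businesses` is an iterable of
    (name, lat, lon) tuples (a hypothetical layout for this sketch).
    """
    hits = []
    for name, lat, lon in businesses:
        distance = haversine_km(user[0], user[1], lat, lon)
        if distance <= radius_km:
            hits.append((name, distance))
    return sorted(hits, key=lambda pair: pair[1])
```

As a sanity check, one degree of longitude at the equator comes out at roughly 111 km. Note that `within_radius` scans every business, so it only illustrates the distance test itself, not an efficient search.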

Step 1: Understanding the Problem

Functional Requirements:

  • Search based on location: The system should return nearby businesses based on the user’s latitude and longitude and a radius specified by the user.
    • Example: A user searching for coffee shops within a 1 km radius.
  • Business Updates: Business owners can update, delete, or add businesses, but these updates do not have to be reflected immediately.
    • Example: An owner updates their business address, but it becomes visible only the next day.
  • Detailed Information Display: The system should provide the capability for users to see detailed information (reviews, star ratings, images).
    • Example: Showing a coffee shop’s rating, menu, and top reviews.

Non-Functional Requirements:

  • Low Latency: Quick response times, especially when a user is on a mobile device.
    • Relevant Knowledge:
      • Edge Computing: Reducing latency by serving user requests from data centers that are physically closer to the user.
      • CDNs (Content Delivery Networks): Often used to cache static content like business photos or location data, significantly reducing load time.
    • Example: Ensuring results are displayed instantly when a user performs a search on Google Maps.
  • Data Privacy: The system should comply with privacy laws like GDPR and CCPA as it handles sensitive user location data.
    • Relevant Knowledge:
      • Anonymization: Techniques like truncating the trailing decimal digits of latitude/longitude reduce the precision of stored locations. This coarsens the data but is not, on its own, full anonymization.
      • User Consent: Obtaining clear consent before collecting location data to comply with regulations.
    • Example: The system notifies users about data collection and anonymizes their location after the search.
  • Scalability: The system must scale to handle millions of users and high read-heavy traffic.
    • Relevant Knowledge:
      • Horizontal Scaling: Adding more servers to handle the increasing traffic, ensuring the system performs well even during peak hours.
      • Database Sharding: Dividing large datasets into smaller, more manageable pieces stored across multiple servers.
    • Example: Handling a sudden spike in search queries during a major city event.
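
The coordinate truncation mentioned under "Data Privacy" might look like the sketch below. The function name `coarsen` and the default of two decimal places are assumptions for illustration, and the right precision depends on the applicable privacy requirements.

```python
import math
from typing import Tuple


def coarsen(lat: float, lon: float, decimals: int = 2) -> Tuple[float, float]:
    """Truncate coordinates toward zero, keeping `decimals` decimal places.

    Fewer decimals means a coarser location. This reduces precision only;
    it is not a complete anonymization scheme.
    """
    factor = 10 ** decimals
    return (math.trunc(lat * factor) / factor,
            math.trunc(lon * factor) / factor)
```

Two decimal places correspond to roughly 1.1 km of latitude, so `coarsen(35.6895, 139.6917)` keeps the neighborhood but drops the street-level position.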

Step 2: Proposing a High-Level Design

API Design:

APIs follow the RESTful design principles for communication between the client and server. Typical API endpoints:

  • Search API: GET /v1/search/nearby?latitude=xx&longitude=yy&radius=5000
    • Relevant Knowledge:
      • Rate Limiting: Preventing misuse by limiting the number of API calls per user per minute.
      • Security: Securing API endpoints via OAuth2 or other authentication mechanisms.
  • CRUD APIs for Businesses:
    • GET /v1/businesses/{id} returns detailed information about a specific business.
    • POST /v1/businesses to add a new business.
    • PUT /v1/businesses/{id} to update business details.
    • DELETE /v1/businesses/{id} to remove a business from the listing.
    • Relevant Knowledge:
      • Validation: Ensuring that data sent to the API is valid (e.g., latitude and longitude values should be within proper ranges).
      • API Throttling: To prevent the database from being overwhelmed by frequent business updates.
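
The validation step for the search endpoint can be sketched like this. It is framework-independent and takes the query parameters as a plain dict. The names `NearbyQuery` and `parse_nearby_query`, the default radius of 5000 m, and the `MAX_RADIUS_M` cap are hypothetical choices, not values from the book.

```python
from dataclasses import dataclass

MAX_RADIUS_M = 20_000  # hypothetical upper bound, chosen for this sketch


@dataclass(frozen=True)
class NearbyQuery:
    latitude: float
    longitude: float
    radius: int  # metres


def parse_nearby_query(params: dict) -> NearbyQuery:
    """Validate raw query-string parameters; raise ValueError on bad input."""
    try:
        lat = float(params["latitude"])
        lon = float(params["longitude"])
        radius = int(params.get("radius", 5000))  # assumed default
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"missing or malformed parameter: {exc}") from exc
    # The chained comparisons also reject NaN, which compares false to everything.
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    if not 0 < radius <= MAX_RADIUS_M:
        raise ValueError(f"radius must be within (0, {MAX_RADIUS_M}] metres")
    return NearbyQuery(lat, lon, radius)
```

A handler would typically map the `ValueError` to an HTTP 400 response before any database work happens.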

High-Level System Components:

  1. Load Balancer: Distributes requests evenly across servers to prevent overloading a single server.
    • Relevant Knowledge:
      • Global Load Balancers: Ensuring requests are routed based on the geographical proximity to reduce latency.
      • Sticky Sessions: Keeping users on the same server during the session to minimize state loss.
  2. Location-Based Service (LBS): This service is responsible for querying the database for nearby businesses within the user-defined radius.
    • Relevant Knowledge:
      • Geospatial Queries: Using geospatial databases like PostGIS or Redis Geospatial to efficiently retrieve location data.
      • Real-Time Data: LBS may fetch real-time data from location databases or cache results for quicker retrieval.
  3. Business Service: Manages operations related to adding, updating, or deleting businesses.
    • Relevant Knowledge:
      • Event-Driven Architecture: Updates can be pushed to the system via event queues (e.g., Kafka), ensuring asynchronous updates and scalability.
  4. Database Cluster: Data is stored using a primary-replica architecture. The primary database handles writes, while replicas handle read operations.
    • Relevant Knowledge:
      • Replication Delays: Understanding the slight delay between the primary and replicas, which might cause the latest updates to be briefly unavailable in read queries.

Algorithms for Finding Nearby Businesses

  1. Two-Dimensional Search:
    • Uses a simple circle-based method to find all businesses within the user-defined radius.
    • Relevant Knowledge:
      • Inefficiency for Large Datasets: Scanning all records for each query can be computationally expensive as the number of businesses grows.
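      • Bounding Box: A bounding box (minimum and maximum latitude and longitude) can be checked with simple range comparisons, so it is a quick way to narrow down candidates before applying the exact circle test.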
  2. Geospatial Indexing:
    • Used to efficiently organize and query location data.
    Types:
    • Geohash: Converts latitude and longitude into a string for indexing, reducing 2D data to 1D.
      • Relevant Knowledge:
        • Precision Trade-offs: A higher-precision geohash (longer string) corresponds to a smaller grid cell, allowing more accurate queries, but more cells are needed to cover the same search radius.
      • Example: Searching within a small grid that covers only a few city blocks.
    • Quadtree: Recursively divides the map into four quadrants, subdividing a cell further when it contains too many businesses.
      • Relevant Knowledge:
        • Dynamic Grids: The grid size adjusts based on how dense the area is with businesses—smaller in cities, larger in rural areas.
      • Example: For dense cities like New York, the quadtree can create highly granular grids to optimize queries.
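
The geohash idea above (turning a 2D point into a 1D string) can be sketched with the standard encoding: alternate longitude and latitude bits by repeatedly halving the range, then emit one base-32 character per five bits. The function name `geohash_encode` and the default precision of 6 are my own choices for this sketch.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # the first bit always comes from longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Because each extra character only refines the same cell, a shorter geohash is always a prefix of the longer one for the same point. That prefix property is what makes the string usable as a 1D index key. One known caveat: two points that are physically close can still fall in different cells with different prefixes, so a real search usually checks neighboring cells too.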

Step 3: Deep Dive – Caching & Database Scaling

Caching:

  • Redis Cache: Frequently accessed queries (like popular searches) are stored in Redis for quick retrieval.
    • Relevant Knowledge:
      • Cache Invalidation: Ensuring that cached data is updated or deleted when a business changes (e.g., when a restaurant closes).
      • TTL (Time To Live): Expiring cached data after a set time to avoid showing stale information.
  • Cache Structure:
    1. Business IDs by Geohash: Caching lists of business IDs based on their geohash for quicker access.
    2. Business Details: Storing detailed business information separately to avoid fetching full data repeatedly.
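
The two-level cache structure and TTL expiry can be sketched as below. To keep the example self-contained, `TTLCache` is a small in-memory stand-in rather than a real Redis client. The key formats (`geohash:...`, `business:...`), the one-hour default TTL, and the loader callbacks `load_ids` and `load_details` are all hypothetical.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a key-value cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:  # expired: drop it and report a miss
            del self._data[key]
            return None
        return value

    def delete(self, key):
        """Explicit invalidation, e.g. when a business closes."""
        self._data.pop(key, None)


def businesses_in_cell(cache, geohash, load_ids, load_details, ttl_seconds=3600):
    """Cache-aside lookup: geohash -> business IDs, then ID -> details."""
    ids_key = f"geohash:{geohash}"
    ids = cache.get(ids_key)
    if ids is None:  # miss: fall back to the database loader
        ids = load_ids(geohash)
        cache.set(ids_key, ids, ttl_seconds)
    results = []
    for business_id in ids:
        detail_key = f"business:{business_id}"
        details = cache.get(detail_key)
        if details is None:
            details = load_details(business_id)
            cache.set(detail_key, details, ttl_seconds)
        results.append(details)
    return results
```

Keeping the ID lists and the details under separate keys means a change to one business only requires invalidating its own `business:` entry, not every geohash list that contains it.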

Database Scaling:

  • Sharding: Splitting large datasets into smaller shards based on business ID or geographic region, allowing for better query performance.
    • Relevant Knowledge:
      • Shard Key Selection: Choosing an appropriate shard key (e.g., location or business ID) to balance the data evenly.
      • Cross-Shard Joins: Joins or searches that span shards require scatter-gather logic in the application, which adds complexity. A good shard key keeps most queries within a single shard.
  • Read Replicas: Multiple replicas handle the read-heavy nature of the system, ensuring scalability.
    • Relevant Knowledge:
      • Replication Lag: Read replicas may briefly serve stale data while changes propagate from the primary. This is acceptable here because business updates do not need to be visible immediately.
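
Sharding by business ID can be sketched with a simple hash-modulo router. The function name `shard_for` and the ID format are assumptions. The sketch uses SHA-256 rather than Python's built-in `hash()`, because the latter is randomized per process for strings and would route the same ID differently after a restart.

```python
import hashlib


def shard_for(business_id: str, shard_count: int) -> int:
    """Map a business ID to a shard index in [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(business_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then take it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % shard_count
```

Hashing spreads IDs evenly, but changing `shard_count` remaps most keys, which is one reason designs sometimes use consistent hashing instead. Sharding by geographic region keeps nearby businesses together but can leave dense cities on overloaded shards.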

Step 4: Final Design & Wrap-Up

The final system design combines geospatial indexing (Geohash or Quadtree), caching (Redis), and database replication to handle millions of users performing location-based searches. These technologies provide scalability and low latency, while data privacy depends on the consent and location-handling practices listed under the non-functional requirements.

Extended Example:

  • A user in Tokyo searches for nearby sushi restaurants within a 3 km radius. The system’s LBS service uses the user’s location and retrieves the results from the Redis cache (if available). If not, it queries the database using geohash indexing, retrieves the results, caches them for future use, and returns the data to the user.

This design lets the proximity service scale efficiently and respond with low latency, while accepting that business updates become visible after a delay rather than in real time.

By SXStudio

Dr. Shell, Fan of Physics, Computer Science, a Cat Dad and a Soccer Player
