SXStudio

System Design Reading Notes 14: Design Google Drive

These are my reading notes for Chapter 15 of the book “System Design Interview – An insider’s guide (Vol. 1)”.

Google Drive is a cloud storage and file synchronization service that enables users to upload, store, and share files across devices and access them from anywhere. The design focuses on building a system that can efficiently handle large-scale storage, fast file synchronization, file sharing, and version control. Below is a comprehensive breakdown of how to approach designing such a system.


1. Understanding the Problem and Requirements

Key Functional Features:

  1. Upload and download files:
    • Users can upload various file types and download them later.
    • Examples: Uploading a 100 MB video file, downloading a 2 MB PDF from Google Drive.
  2. File synchronization:
    • Files need to be synchronized across all devices (laptops, mobile, tablets) as soon as there’s a change.
    • Example: A user edits a file on their laptop, and the changes automatically appear on their mobile phone within seconds.
  3. File sharing:
    • Users can share files or folders with others and assign different permissions (e.g., view-only, edit).
    • Example: Sharing a Google Docs file with colleagues, allowing them to comment but not edit.
  4. File versioning:
    • Keeping track of changes made to files and allowing users to access previous versions.
    • Example: Recovering an older version of a spreadsheet from two weeks ago.

Non-functional Requirements:

  1. High reliability and durability:
    • Data must remain available and protected from loss, even during failures. Durability of 99.999999999% (eleven nines) is expected for file storage.
    • Example: If a server in one region fails, data is still available due to replication across other regions.
  2. Fast synchronization and low latency:
    • Syncing should be nearly real-time for small files and changes.
    • Example: A change made to a text document should be reflected on another device in less than a second.
  3. Scalability:
    • The system must scale to handle millions of users and petabytes of data.
    • Example: Google Drive has over 1 billion users and needs to handle high concurrency and traffic spikes.
  4. Cost-efficiency:
    • The design should minimize costs related to bandwidth, storage, and CPU processing.

2. Back-of-the-Envelope Estimations

To understand the scale of the system:


3. Proposed High-Level Design

Google Drive requires a distributed, scalable architecture to handle millions of users and high traffic volumes. The design can be broken down into several core components:

File Storage and Metadata Separation:

File Chunking and Block Servers:

Delta Syncing:


4. Core Components

  1. Block Servers:
    • Split files into blocks and upload those blocks to cloud storage.
    • Example: User uploads a 500 MB video. With a 4 MB block size, the block server splits it into 125 chunks (500 MB ÷ 4 MB = 125 chunks) for easier storage and faster parallel uploads.
  2. Cloud Storage (S3, GCS):
    • Stores file chunks. It offers reliability through replication across multiple data centers.
    • Example: Each file chunk is stored in three regions, ensuring that even if one region fails, data is still accessible.
  3. API Servers:
    • Manage user requests like uploading, downloading, and sharing files.
    • Example: When a user requests to share a file, the API server processes the request and updates the metadata with the sharing permissions.
  4. Metadata Storage and Caching:
    • Metadata, such as file paths, ownership, and permissions, is stored in a SQL database and cached for faster access.
    • Example: When retrieving a file, the system first queries the metadata to understand where the file is stored, what permissions are available, etc.
  5. Notification Service:
    • Ensures clients are notified when files are updated, shared, or deleted. This can be done using long polling, WebSocket, or push notifications.
    • Example: User A uploads a new version of a document. User B, who has shared access, receives a notification that the file has been updated.
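
The chunk arithmetic in the Block Servers example can be checked with a few lines of Python. This is only a minimal sketch: the 4 MB block size comes from the example above, while the function names `block_count` and `split_into_blocks` are hypothetical and not taken from the book.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB, the block size used in the example above


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Return how many blocks are needed to hold file_size bytes."""
    # Integer ceiling division: a partially filled last block still counts.
    return -(-file_size // block_size)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into consecutive blocks; only the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


if __name__ == "__main__":
    print(block_count(500 * 1024 * 1024))      # 125
    print(split_into_blocks(b"abcdefghij", 4))  # [b'abcd', b'efgh', b'ij']
```

Because each block is independent, a client could upload several of them in parallel, which is the benefit the example mentions.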

5. Handling Sync Conflicts


6. Data Flow

Upload Flow:

  1. User uploads a file, which is split into chunks by the block server.
  2. Each chunk is compressed, encrypted, and sent to cloud storage (e.g., S3 or GCS).
  3. Once all chunks are uploaded, cloud storage sends an acknowledgment. The metadata database is then updated and the user is notified of the successful upload.
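
The upload steps above can be sketched roughly as follows. Everything here is an assumption made for illustration: `cloud_storage` and `metadata_db` are in-memory dictionaries standing in for S3/GCS and the metadata database, `upload_file` is a hypothetical name, and naming blocks by their SHA-256 hash is one possible choice rather than something these notes specify. Encryption is left out because the Python standard library ships no cipher.

```python
import hashlib
import zlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as in the Block Servers example

cloud_storage: dict[str, bytes] = {}    # hypothetical stand-in for S3/GCS
metadata_db: dict[str, list[str]] = {}  # hypothetical: file path -> ordered block ids


def upload_file(path: str, data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into blocks, compress each one, store it, then record metadata."""
    block_ids = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        compressed = zlib.compress(block)
        # A real block server would encrypt `compressed` here before sending it.
        block_id = hashlib.sha256(compressed).hexdigest()  # assumed naming scheme
        cloud_storage[block_id] = compressed
        block_ids.append(block_id)
    # Metadata is written only after every block has been stored.
    metadata_db[path] = block_ids
    return block_ids
```

Writing the metadata last means a reader never sees a file entry whose blocks are not yet in storage.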

Download Flow:

  1. The user requests to download a file, and the metadata server is queried to find the file’s location.
  2. The file chunks are retrieved from cloud storage and reconstructed into the full file.
  3. The file is then delivered to the user.
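
The download steps can be sketched the same way. As before, this is only an illustration under stated assumptions: `download_file` is a hypothetical name, the two dictionaries stand in for the metadata server and cloud storage, and blocks are assumed to have been compressed with zlib at upload time.

```python
import zlib


def download_file(path: str,
                  metadata_db: dict[str, list[str]],
                  cloud_storage: dict[str, bytes]) -> bytes:
    """Rebuild a file from its stored blocks."""
    block_ids = metadata_db[path]  # step 1: ask metadata where the blocks live
    blocks = [zlib.decompress(cloud_storage[b]) for b in block_ids]  # step 2: fetch
    return b"".join(blocks)  # step 3: reassemble in order and return


if __name__ == "__main__":
    # Hypothetical data: one file stored as two compressed blocks.
    metadata = {"/notes.txt": ["blk-1", "blk-2"]}
    storage = {"blk-1": zlib.compress(b"hello "), "blk-2": zlib.compress(b"world")}
    print(download_file("/notes.txt", metadata, storage))  # b'hello world'
```

The order of block ids in the metadata matters: joining the blocks in any other order would produce a corrupted file.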

7. Failure Handling and Replication

The system must handle various failure scenarios to ensure high availability:


Conclusion:
