Microservices Used for Twitter System Design

9.1 Data Partitioning

To scale out our databases, we will need to partition our data. Horizontal partitioning (also known as sharding) can be a good first step. We can use partitioning schemes such as:

  • Hash-Based Partitioning
  • List-Based Partitioning
  • Range-Based Partitioning
  • Composite Partitioning

The above approaches can still cause uneven data and load distribution. We can mitigate this by using consistent hashing, which also limits how much data has to move when a shard is added or removed.
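As a rough illustration of consistent hashing, the sketch below places each shard on a hash ring at several virtual positions and routes a key to the first shard found clockwise from the key's hash. The class name, the shard names, and the choice of MD5 with 100 virtual nodes per shard are assumptions made for this example, not part of the original design.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual positions per physical shard (assumed value)
        self._ring = []        # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # MD5 is used only to spread keys evenly, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise ValueError("ring is empty")
        # First ring position whose hash is >= the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical shard names, for demonstration only.
ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
print(ring.get_node("user:42"))
```

When a shard is removed, only the keys that were mapped to that shard are reassigned; keys on the other shards stay where they were, which is the property that plain modulo hashing lacks.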

9.2 Mutual Friends

  • For mutual friends, we can build a social graph of all users. Each node in the graph will represent a user, and a directed edge will represent a follow relationship between a follower and a followee.
  • After that, we can traverse the followers of a user to find and suggest a mutual friend. This is a natural fit for a graph database such as Neo4j or ArangoDB.
  • This is a pretty simple algorithm. To improve our suggestion accuracy, we will need to incorporate a machine-learning-based recommendation model into it.
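The traversal described above can be sketched in memory with plain adjacency sets. This is only an approximation of what a graph database query would do: it walks two hops along "follows" edges and ranks candidates by how many of the user's followees also follow them. The function names and sample users are hypothetical, and walking followees rather than followers is an assumption of this sketch.

```python
from collections import Counter


def mutual_followees(following, a, b):
    """Accounts that both `a` and `b` follow."""
    return following.get(a, set()) & following.get(b, set())


def suggest_follows(following, user, limit=3):
    """Rank accounts that are followed by the people `user` already follows."""
    already = following.get(user, set())
    counts = Counter()
    for followee in already:
        for candidate in following.get(followee, set()):
            if candidate != user and candidate not in already:
                counts[candidate] += 1
    # Highest count first; ties broken by name so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:limit]]


# Hypothetical graph: each key follows the users in its set.
following = {
    "alice": {"bob", "carol"},
    "bob": {"dave", "erin"},
    "carol": {"dave"},
    "dave": set(),
    "erin": {"alice"},
}
print(suggest_follows(following, "alice"))
```

A machine-learning model could later replace the simple count used here as the ranking signal.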

9.3 Metrics and Analytics

  • Recording analytics and metrics is one of our extended requirements.
  • As we will be using Apache Kafka to publish all sorts of events, we can process these events and run analytics on the data using Apache Spark, an open-source unified analytics engine for large-scale data processing.
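To make the idea concrete, here is the kind of per-minute aggregation such an analytics job might compute, written in plain Python so it runs without a Kafka or Spark cluster. It is a stand-in for the logic only, not Spark code, and the event fields `ts` (Unix seconds) and `type` are assumptions for this sketch.

```python
from collections import Counter


def count_events_per_minute(events):
    """Count events per (minute window start, event type)."""
    counts = Counter()
    for event in events:
        window_start = event["ts"] - event["ts"] % 60  # floor to the minute
        counts[(window_start, event["type"])] += 1
    return counts


# Hypothetical events, shaped like messages consumed from an event topic.
events = [
    {"ts": 0, "type": "tweet"},
    {"ts": 59, "type": "tweet"},
    {"ts": 60, "type": "like"},
    {"ts": 61, "type": "tweet"},
]
for (window_start, event_type), n in sorted(count_events_per_minute(events).items()):
    print(window_start, event_type, n)
```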

9.4 Caching

  • In a social media application, we have to be careful with caching because our users expect the latest data. Even so, to shield our resources from usage spikes, we can cache the top 20% of the tweets, that is, the most frequently accessed ones.
  • To further improve efficiency, we can add pagination to our system APIs. This will help users with limited network bandwidth, as they won't have to retrieve older tweets unless they request them.
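Both ideas can be sketched briefly. The cache below uses least-recently-used (LRU) eviction as one possible way to keep only hot tweets in memory, and the pagination helper uses a cursor (the id of the last tweet already seen) instead of page offsets. The eviction policy, the class and function names, and the tweet shape are assumptions for illustration.

```python
from collections import OrderedDict


class LRUTweetCache:
    """Keeps at most `capacity` tweets, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, tweet_id):
        if tweet_id not in self._items:
            return None  # cache miss: the caller falls back to the database
        self._items.move_to_end(tweet_id)  # mark as most recently used
        return self._items[tweet_id]

    def put(self, tweet_id, tweet):
        self._items[tweet_id] = tweet
        self._items.move_to_end(tweet_id)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


def get_page(tweets, limit=2, cursor=None):
    """Return (page, next_cursor) from `tweets` sorted newest-first by id."""
    older = [t for t in tweets if cursor is None or t["id"] < cursor]
    page = older[:limit]
    next_cursor = page[-1]["id"] if len(older) > limit else None
    return page, next_cursor


timeline = [{"id": i} for i in (5, 4, 3, 2, 1)]
page, cursor = get_page(timeline, limit=2)
print(page, cursor)
```

A client keeps calling `get_page` with the returned cursor until it receives `None`, so older tweets are fetched only when the user scrolls that far.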

9.5 Media Access and Storage

  • As we know, most of our storage space will be used for media files such as images, videos, or other files. Our media service will handle both access to and storage of users' media files.
  • But where can we store files at scale? Object storage is what we're looking for. An object store manages each file as an object: the data itself, its metadata, and a unique identifier.
  • It stores those objects in a single flat repository, which can be spread out across multiple networked systems. We can also use distributed file storage such as HDFS or GlusterFS.
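The object model can be illustrated with a toy in-memory store: a flat mapping from a unique key to an object holding bytes and metadata. Real object stores add durability, replication, and access control on top of this. Deriving the key from a SHA-256 hash of the content is an assumption made for this sketch, as are all the names used.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class InMemoryObjectStore:
    """Flat key -> object mapping; there is no directory hierarchy."""

    def __init__(self):
        self._objects = {}

    def put(self, data, **metadata):
        key = hashlib.sha256(data).hexdigest()  # content-derived identifier
        self._objects[key] = StoredObject(data, dict(metadata))
        return key

    def get(self, key):
        return self._objects[key]


store = InMemoryObjectStore()
key = store.put(b"fake image bytes", content_type="image/png", owner="alice")
print(key, store.get(key).metadata)
```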

9.6 Content Delivery Network (CDN)

  • A Content Delivery Network (CDN) increases content availability and redundancy while reducing bandwidth costs.
  • Generally, static files such as images and videos are served from a CDN. We can use services like Amazon CloudFront or Cloudflare CDN for this use case.

Designing Twitter – A System Design Interview Question

Designing Twitter (or the Facebook feed, or Facebook search) is quite a common question that interviewers ask candidates. Many candidates fear this round more than the coding round because they are unsure which topics and tradeoffs to cover within the limited time frame.

Important Topics for Designing Twitter

  • How Would You Design Twitter?
  • Requirements for Twitter System Design
  • Capacity Estimation for Twitter System Design
  • Use Case Design for Twitter System Design
  • Low Level Design for Twitter System Design
  • High Level Design for Twitter System Design
  • Data Model Design for Twitter System Design
  • API Design for Twitter System Design
  • Microservices Used for Twitter System Design
  • Scalability for Twitter System Design

1. How Would You Design Twitter?

Don’t jump into the technical details immediately when you are asked this question in your interviews. Do not run in one direction; it will just create confusion between you and the interviewer. Most candidates make a mistake here and immediately start listing a bunch of tools or frameworks like MongoDB, Bootstrap, MapReduce, etc....

2. Requirements for Twitter System Design

2.1 Functional Requirements:...

3. Capacity Estimation for Twitter System Design

To estimate the system’s capacity, we need to analyze the expected daily click rate....

4. Use Case Design for Twitter System Design

...

5. Low Level Design for Twitter System Design

A low-level design of Twitter dives into the details of individual components and functionalities. Here’s a breakdown of some key aspects:...

6. High Level Design for Twitter System Design

We will discuss the high-level design for Twitter,...

7. Data Model Design for Twitter System Design

This is the general data model which reflects our requirements....

8. API Design for Twitter System Design

A basic API design for our services:...

9. Microservices Used for Twitter System Design

9.1 Data Partitioning...

10. Scalability for Twitter System Design

Let us identify and resolve scalability issues, such as single points of failure, in our design:...

11. Conclusion

Twitter handles thousands of tweets per second, so a single big system or table cannot hold all the data; it has to be handled through a distributed approach. Twitter uses a scatter-and-gather strategy, in which it sets up multiple servers or data centers that allow indexing. When Twitter gets a query (let’s say #geeksforgeeks), it sends the query to all the servers or data centers and queries every Earlybird shard. Every Earlybird shard that matches the query returns its results. The results are then sorted, merged, and reranked. The ranking is based on the number of retweets, replies, and the popularity of the tweets....
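The scatter-and-gather flow can be sketched as follows: the query is sent to every shard in parallel, each shard returns its matches, and the partial results are merged and reranked. This is a toy model rather than Twitter's actual implementation. The shard contents, the substring match, and the scoring weights are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, query):
    """Return the tweets in one shard whose text contains the query."""
    return [tweet for tweet in shard if query in tweet["text"]]


def score(tweet):
    # Hypothetical ranking: retweets weigh twice as much as replies.
    return 2 * tweet["retweets"] + tweet["replies"]


def scatter_gather(shards, query, limit=10):
    # Scatter: query all shards concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda shard: search_shard(shard, query), shards)
        # Gather: merge the partial result lists into one.
        merged = [tweet for part in partials for tweet in part]
    # Rerank the merged results, best first.
    return sorted(merged, key=score, reverse=True)[:limit]


shards = [
    [{"text": "hello #geeksforgeeks", "retweets": 1, "replies": 0},
     {"text": "unrelated", "retweets": 9, "replies": 9}],
    [{"text": "#geeksforgeeks rocks", "retweets": 5, "replies": 2}],
]
for tweet in scatter_gather(shards, "#geeksforgeeks"):
    print(tweet["text"])
```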