Microservices Used for Twitter System Design
9.1 Data Partitioning
To scale out our databases, we will need to partition our data. Horizontal partitioning (also known as sharding) can be a good first step. We can use partitioning schemes such as:
- Hash-Based Partitioning
- List-Based Partitioning
- Range-Based Partitioning
- Composite Partitioning
The above approaches can still cause uneven data and load distribution. We can mitigate this with consistent hashing, which also limits how many keys have to move when a shard is added or removed.
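The idea of consistent hashing can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the class name `ConsistentHashRing`, the shard names such as `db-1`, and the choice of 100 virtual nodes per shard are all assumptions made for the example.

```python
import bisect
import hashlib


def _ring_position(key: str) -> int:
    """Map a string to a stable 64-bit integer position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ConsistentHashRing:
    """Hypothetical consistent hash ring with virtual nodes."""

    def __init__(self, shards, vnodes=100):
        self.vnodes = vnodes       # virtual nodes per shard (assumed value)
        self._positions = []       # sorted ring positions
        self._owner = {}           # ring position -> shard name
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        """Place `vnodes` points for this shard on the ring."""
        for i in range(self.vnodes):
            pos = _ring_position(f"{shard}#{i}")
            if pos in self._owner:
                continue  # skip the (extremely unlikely) position collision
            bisect.insort(self._positions, pos)
            self._owner[pos] = shard

    def remove_shard(self, shard):
        """Drop every ring point that belongs to this shard."""
        self._positions = [p for p in self._positions if self._owner[p] != shard]
        self._owner = {p: s for p, s in self._owner.items() if s != shard}

    def shard_for(self, key):
        """Return the shard owning the first ring point clockwise from the key."""
        if not self._positions:
            raise ValueError("ring has no shards")
        idx = bisect.bisect_right(self._positions, _ring_position(key))
        idx %= len(self._positions)  # wrap around the end of the ring
        return self._owner[self._positions[idx]]
```

With this layout, adding a shard only reassigns the keys that now fall just before one of the new shard's ring points. Every other key stays where it was, which is the property that makes resharding cheaper than with plain modulo hashing.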
9.2 Mutual Friends
- For mutual friends, we can build a social graph of our users. Each node in the graph represents a user, and a directed edge represents a follow relationship from a follower to a followee.
- We can then traverse a user's connections to find and suggest mutual friends. This would typically call for a graph database such as Neo4j or ArangoDB.
- This is a fairly simple algorithm. To improve suggestion accuracy, we will need to incorporate a machine-learning recommendation model into it.
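The traversal described above can be illustrated with an in-memory graph. This is only a sketch under stated assumptions: the `FollowGraph` class and its method names are hypothetical, it walks the accounts a user follows (two hops out) rather than querying a real graph database, and it ranks candidates by a simple count instead of a learned model.

```python
from collections import Counter, defaultdict


class FollowGraph:
    """Hypothetical in-memory follow graph; a stand-in for a graph database."""

    def __init__(self):
        self._following = defaultdict(set)   # user -> accounts they follow
        self._followers = defaultdict(set)   # user -> accounts following them

    def follow(self, follower, followee):
        """Add a directed edge follower -> followee."""
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def mutual_follows(self, a, b):
        """Accounts that both `a` and `b` follow."""
        return self._following.get(a, set()) & self._following.get(b, set())

    def suggest(self, user, limit=3):
        """Rank accounts two hops away by how many of the user's followees follow them."""
        already = self._following.get(user, set())
        counts = Counter()
        for followee in already:
            for candidate in self._following.get(followee, set()):
                if candidate != user and candidate not in already:
                    counts[candidate] += 1
        # Highest count first; break ties alphabetically for stable output.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[:limit]]


graph = FollowGraph()
graph.follow("alice", "bob")
graph.follow("alice", "carol")
graph.follow("bob", "dave")
graph.follow("carol", "dave")
graph.follow("carol", "erin")
print(graph.suggest("alice"))  # dave is followed by two of alice's followees
```

A recommendation model would replace the plain count with a learned score, but the candidate generation step would likely still resemble this two-hop walk.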
9.3 Metrics and Analytics
- Recording analytics and metrics is one of our extended requirements.
- As we will be using Apache Kafka to publish all sorts of events, we can process these events and run analytics on the data using Apache Spark, an open-source unified analytics engine for large-scale data processing.
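To make the analytics step concrete, here is the kind of windowed count such a job might compute, written in plain Python rather than against the Kafka or Spark APIs. The event fields `ts` (seconds since epoch) and `type`, and the event names themselves, are assumptions for this sketch.

```python
from collections import Counter


def window_start(ts_seconds, window_seconds=60):
    """Round a timestamp down to the start of its tumbling window."""
    return ts_seconds - (ts_seconds % window_seconds)


def count_events(events, window_seconds=60):
    """Count events per (window start, event type) pair."""
    counts = Counter()
    for event in events:
        key = (window_start(event["ts"], window_seconds), event["type"])
        counts[key] += 1
    return counts


# Hypothetical events, as they might arrive from an event stream.
events = [
    {"ts": 0, "type": "tweet_created"},
    {"ts": 59, "type": "tweet_created"},
    {"ts": 60, "type": "tweet_liked"},
    {"ts": 61, "type": "tweet_created"},
]
for (start, event_type), n in sorted(count_events(events).items()):
    print(start, event_type, n)
```

In a real pipeline the same grouping would be expressed as a streaming aggregation so that it can run in parallel over partitions of the event log.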
9.4 Caching
- In a social media application, we have to be careful with caching because our users expect the latest data. Still, to shield our resources from usage spikes, we can cache the top 20% of tweets (for example, the most frequently read ones).
- To further improve efficiency, we can add pagination to our system APIs. This will help users with limited network bandwidth, as they won’t have to retrieve older tweets unless they request them.
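Both ideas above can be sketched briefly. The example below assumes a least-recently-used (LRU) eviction policy for the cache and cursor-based pagination over tweet IDs that increase with time; neither choice comes from the design itself, and the names `LRUCache` and `paginate` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        """Insert or update a value, evicting the oldest entry if full."""
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop least recently used


def paginate(tweet_ids, cursor=None, page_size=20):
    """Return one page of IDs plus the cursor for the next page.

    Assumes `tweet_ids` is sorted newest first and that IDs grow over time.
    `cursor` is the last ID of the previous page, or None for the first page.
    """
    if cursor is not None:
        tweet_ids = [t for t in tweet_ids if t < cursor]
    page = tweet_ids[:page_size]
    next_cursor = page[-1] if len(tweet_ids) > page_size else None
    return page, next_cursor
```

A client would call `paginate` with `cursor=None` first and then pass back the returned cursor only when the user scrolls further, so older tweets are never fetched unless requested.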
9.5 Media Access and Storage
- As we know, most of our storage space will be used for storing media files such as images, videos, or other files. Our media service will be handling both access and storage of the user media files.
- But where can we store files at scale? Object storage is what we’re looking for. An object store manages data as discrete units called objects, each holding the data itself, its metadata, and a unique identifier.
- The object store keeps those objects in a single flat repository, which can be spread out across multiple networked systems. Alternatively, we can use distributed file storage such as HDFS or GlusterFS.
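The object model can be illustrated with a small in-memory stand-in. This is a sketch only: `InMemoryObjectStore`, `StoredObject`, and the key layout `media/<user>/<file>` are hypothetical, and a real media service would call a managed object store or a distributed file system instead of a dictionary.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    """One object: the raw bytes plus a little metadata."""
    data: bytes
    content_type: str
    checksum: str


class InMemoryObjectStore:
    """Hypothetical flat key -> object store, for illustration only."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, content_type="application/octet-stream"):
        """Store bytes under a key and return their SHA-256 checksum."""
        checksum = hashlib.sha256(data).hexdigest()
        self._objects[key] = StoredObject(data, content_type, checksum)
        return checksum

    def get(self, key):
        """Fetch an object by key; raises KeyError if it does not exist."""
        return self._objects[key]

    def list_keys(self, prefix=""):
        """List keys by prefix; the namespace is flat, prefixes only look like folders."""
        return sorted(k for k in self._objects if k.startswith(prefix))


store = InMemoryObjectStore()
store.put("media/user-42/avatar.png", b"fake image bytes", "image/png")
print(store.list_keys("media/user-42/"))
```

Note that the slash-separated key is just a naming convention. The store has no directories, which is part of what lets an object store spread its data across many machines.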
9.6 Content Delivery Network (CDN)
- A Content Delivery Network (CDN) increases content availability and redundancy while reducing bandwidth costs.
- Generally, static files such as images and videos are served from a CDN. We can use services like Amazon CloudFront or Cloudflare CDN for this use case.
Designing Twitter – A System Design Interview Question
Designing Twitter (or the Facebook feed, or Facebook search) is a common question that interviewers ask candidates. Many candidates fear this round more than the coding round because they are unsure which topics and tradeoffs to cover within the limited timeframe.
Important Topics for Designing Twitter
- How Would You Design Twitter?
- Requirements for Twitter System Design
- Capacity Estimation for Twitter System Design
- Use Case Design for Twitter System Design
- Low Level Design for Twitter System Design
- High Level Design for Twitter System Design
- Data Model Design for Twitter System Design
- API Design for Twitter System Design
- Microservices Used for Twitter System Design
- Scalability for Twitter System Design