Horizontal or Range Based Sharding

  • In this method, we split the data based on the ranges of a given value inherent in each entity.
  • Let’s say you have a database of your online customers’ names and email information.
  • You can split this information into two shards: one shard holds the customers whose first name starts with A through P, and the other shard holds the rest of the customers.
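
The A-P split described above can be sketched as a small routing function. This is a minimal illustration, not a production router: the function name `range_shard_for` and the shard labels `shard_1` and `shard_2` are assumptions made for the example.

```python
def range_shard_for(first_name: str) -> str:
    """Pick a shard from the first letter of the customer's first name."""
    if not first_name:
        raise ValueError("first_name must be non-empty")
    initial = first_name[0].upper()
    # shard_1 holds first names starting with A through P;
    # every other name goes to shard_2.
    if "A" <= initial <= "P":
        return "shard_1"
    return "shard_2"


customers = ["Alice", "bob", "Priya", "Quentin", "Zoe"]
placement = {name: range_shard_for(name) for name in customers}
print(placement)
```

Because the range boundaries are fixed in code here, changing them later means moving existing rows; a real system would usually keep the boundaries in configuration.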

2.2.1 Advantages of Range Based Sharding:

  • Scalability:
    • Horizontal or range-based sharding scales by distributing data across multiple shards, so more shards can be added as the dataset grows.
  • Improved Performance:
    • Each shard holds a smaller subset of the data, so a query confined to one range has less data to search, and queries that span several ranges can run on the shards in parallel.

2.2.2 Disadvantages of Range Based Sharding:

  • Complex Querying Across Shards:
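    • A query that touches several shards must be sent to each of them and the results combined, which is harder to coordinate than a query against a single database.
  • Uneven Data Distribution:
    • If the chosen ranges do not match how the values are actually distributed, some shards end up holding more data and handling more of the workload than others.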
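
To make the cross-shard querying cost concrete, here is a small sketch that uses in-memory lists as stand-ins for two range shards. The data, the `SHARDS` dictionary, and the function names are all hypothetical. A lookup by first name (the shard key) touches one shard, while a lookup by email domain has to visit every shard and merge the results.

```python
# Two in-memory "shards" standing in for separate database instances.
SHARDS = {
    "shard_1": [  # first names starting with A through P
        {"name": "Alice", "email": "alice@example.com"},
        {"name": "Bob", "email": "bob@example.org"},
    ],
    "shard_2": [  # all other first names
        {"name": "Quentin", "email": "quentin@example.com"},
        {"name": "Zoe", "email": "zoe@example.org"},
    ],
}


def shard_for_name(first_name: str) -> str:
    """Range rule on the shard key (the first name)."""
    return "shard_1" if "A" <= first_name[0].upper() <= "P" else "shard_2"


def find_by_name(first_name: str) -> list:
    """Query on the shard key: only one shard is consulted."""
    rows = SHARDS[shard_for_name(first_name)]
    return [row for row in rows if row["name"] == first_name]


def find_by_email_domain(domain: str) -> list:
    """Query on a non-key attribute: every shard is queried, then results are merged."""
    results = []
    for rows in SHARDS.values():
        results.extend(row for row in rows if row["email"].endswith("@" + domain))
    return sorted(results, key=lambda row: row["name"])


print(find_by_name("Zoe"))
print(find_by_email_domain("example.com"))
```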

Database Sharding | System Design

Database sharding is a technique for horizontal scaling of databases, where the data is split across multiple database instances, or shards, to improve performance and reduce the impact of large amounts of data on a single database.

Important Topics for Database Sharding

  • What is Sharding or Data Partitioning?
  • Sharding Architectures
    • Key Based Sharding
    • Horizontal or Range Based Sharding 
    • Vertical Sharding
    • Directory-Based Sharding
  • Advantages of Sharding in System Design
  • Disadvantages of Sharding in System Design
  • Conclusion

When designing a sharded database, the following key considerations should be taken into account:

  • Data distribution: How the data will be split across the shards, either based on a specific key such as the user ID or by using a hash function.
  • Shard rebalancing: How the data will be balanced across the shards as the amount of data changes over time.
  • Query routing: How queries will be directed to the correct shard, either by using a dedicated routing layer or by including the shard information in the query.
  • Data consistency: How data consistency will be maintained across the shards, for example by using transaction logs or by employing a distributed database system.
  • Failure handling: How the system will handle the failure of one or more shards, including data recovery and data redistribution.
  • Performance: How the sharded database will perform in terms of read and write speed, as well as overall system performance and scalability.
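
The shard rebalancing consideration can be illustrated with a rough sketch. It assumes one simple placement scheme that the article does not prescribe: a numeric ID taken modulo the number of shards. Under that assumption, the code measures how many records would have to move when a fifth shard is added to four.

```python
def shard_index(customer_id: int, shard_count: int) -> int:
    """Assumed placement rule: numeric ID modulo the number of shards."""
    return customer_id % shard_count


def moved_fraction(ids, old_count: int, new_count: int) -> float:
    """Fraction of IDs whose shard changes when the shard count changes."""
    ids = list(ids)
    moved = sum(
        1 for i in ids if shard_index(i, old_count) != shard_index(i, new_count)
    )
    return moved / len(ids)


# Going from 4 to 5 shards with IDs 0..999 relocates 80% of the records.
print(moved_fraction(range(1000), 4, 5))  # 0.8
```

With this naive rule, most records change shards whenever the shard count changes, which is one reason rebalancing needs to be planned for up front.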

In summary, Database Sharding is a complex but important concept in system design that can help to improve the scalability and performance of a database-driven system. A strong understanding of database sharding is often viewed as a key requirement for successful system design.

1. What is Sharding or Data Partitioning?

Let’s understand sharding with the help of an example:...

2. Sharding Architectures

2.1. Key Based Sharding

This technique is also known as hash-based sharding. Here, we take an attribute of an entity, such as the customer ID, customer email, client IP address, or zip code, and use its value as the input to a hash function. The resulting hash value determines which shard stores the data. The values passed to the hash function should all come from the same column (the shard key), so that data is placed consistently and can be found again on the same shard. In this respect, a shard key acts like a primary key or a unique identifier for individual rows....
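
A minimal sketch of the idea, assuming four shards and a customer email as the shard key. The names `SHARD_COUNT` and `hash_shard_for` are made up for the example, and SHA-256 is just one possible choice of stable hash function.

```python
import hashlib

SHARD_COUNT = 4  # assumed number of shards


def hash_shard_for(shard_key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a shard key value (here, a customer email) to a shard index."""
    # A stable hash is needed: Python's built-in hash() for strings is
    # randomized per process, so it would route the same key differently
    # after a restart.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


for email in ["alice@example.com", "bob@example.com", "zoe@example.com"]:
    print(email, "-> shard", hash_shard_for(email))
```

The same key always maps to the same shard as long as the hash function and the shard count stay unchanged.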

2.2. Horizontal or Range Based Sharding

In this method, we split the data based on the ranges of a given value inherent in each entity. Let’s say you have a database of your online customers’ names and email information. You can split this information into two shards. In one shard you can keep the info of customers whose first name starts with A-P and in another shard, keep the information of the rest of the customers....

2.3. Vertical Sharding

In this method, we split entire columns out of a table and put them into new, distinct tables. The data in one partition is independent of the data in the others, and each partition holds both distinct rows and distinct columns. This lets us place different features of an entity in different shards on different machines....
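
As a rough sketch, the column split can be shown with plain dictionaries. The column groups, shard names, and sample rows below are assumptions chosen for illustration. Each resulting table is keyed by the row's `id`, so the pieces of one entity can still be matched up across shards.

```python
customers = [
    {"id": 1, "name": "Alice", "email": "alice@example.com", "bio": "Likes hiking"},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "bio": "Collects stamps"},
]

# Assumed column groups: each group would live on a different shard or machine.
COLUMN_GROUPS = {
    "profile_shard": ["name", "bio"],
    "contact_shard": ["email"],
}


def split_vertically(rows, column_groups, key="id"):
    """Split each row's columns into per-shard tables, indexed by the row key."""
    shards = {shard: {} for shard in column_groups}
    for row in rows:
        for shard, columns in column_groups.items():
            shards[shard][row[key]] = {col: row[col] for col in columns}
    return shards


shards = split_vertically(customers, COLUMN_GROUPS)
print(shards["contact_shard"])
print(shards["profile_shard"])
```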

2.4. Directory-Based Sharding

In this method, we create and maintain a lookup service or lookup table for the original database. The lookup table is keyed by the shard key and maps each entity in the database to the shard that holds it. This way we keep track of which database shards hold which data....
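
A toy version of such a lookup table might look like the following. The class name `ShardDirectory`, the region codes used as shard keys, and the shard names are all hypothetical.

```python
class ShardDirectory:
    """A toy lookup table that maps shard-key values to shard names."""

    def __init__(self):
        self._table = {}

    def assign(self, shard_key, shard_name):
        """Record (or change) which shard holds the data for a shard key."""
        self._table[shard_key] = shard_name

    def locate(self, shard_key):
        """Return the shard that holds the data for a shard key."""
        try:
            return self._table[shard_key]
        except KeyError:
            raise KeyError(f"no shard registered for key {shard_key!r}") from None


directory = ShardDirectory()
directory.assign("EU", "shard_a")
directory.assign("US", "shard_b")
print(directory.locate("EU"))

# Moving a key to another shard is only a table update here;
# the underlying rows would still have to be copied separately.
directory.assign("EU", "shard_c")
print(directory.locate("EU"))
```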

3. Advantages of Sharding in System Design

  • Solve Scalability Issue:
    • With a single database server, any application experiences performance degradation as its number of users grows. Read and write queries become slower and the network bandwidth starts to saturate. Database sharding addresses these issues by partitioning the data across multiple machines.
  • High Availability:
    • A problem with a single-server architecture is that an outage makes the entire application unavailable. In a sharded architecture, an outage takes down only some specific shards. All the other shards continue to operate, so the application as a whole does not become unavailable to users.
  • Speed Up Query Response Time:
    • A query against a large monolithic database with no sharding takes more time to return a result, because it may have to search every row in a large table. In a sharded database, a query goes through fewer rows and you receive the response in less time.
  • More Write Bandwidth:
    • For many applications, writing is a major bottleneck. With no master database serializing writes, a sharded architecture allows you to write in parallel and increase your write throughput.
  • Scaling Out:
    • Sharding a database facilitates horizontal scaling, known as scaling out. In horizontal scaling, you add more machines to the network and distribute the load across them for faster processing and response....

4. Disadvantages of Sharding in System Design

  • Adds Complexity in the System:
    • Implementing a proper sharded database architecture requires care. It is a complicated task, and if it is not implemented properly you may lose data or end up with corrupted tables in your database. You also need to manage data across multiple shard locations, which may affect your team’s workflow.
  • Rebalancing Data:
    • Sometimes shards become unbalanced, with one shard outgrowing the others. Suppose a database has two shards: one stores the customers whose names begin with the letters A through M, and the other stores the customers whose names begin with N through Z. If a large number of users have names starting with L, shard one will hold more data than shard two. This slows the application down and can stall it for a significant portion of your users. The overloaded A-M shard is known as a database hotspot. To overcome this problem, you need to re-shard so that the data is evenly distributed again.
  • Joining Data From Multiple Shards is Expensive:
    • In a single database, joins are easy to perform. In a sharded architecture, you cannot submit a single query to get data from several shards. You need to submit a query to each shard, pull the data from the different shards, and perform the join across multiple networked servers, which adds latency to your system.
  • No Native Support:
    • Sharding is not natively supported by every database engine. For example, PostgreSQL doesn’t include automatic sharding features, so there you have to shard manually and follow a “roll-your-own” approach. It can be difficult to find tips or documentation for sharding and to troubleshoot problems during the implementation....
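
The A-M / N-Z hotspot example can be reproduced with a short sketch. The sample names are invented so that names starting with L are over-represented, and the function names are assumptions for the example.

```python
from collections import Counter


def am_nz_shard(name: str) -> str:
    """Range rule from the example: A through M on one shard, N through Z on the other."""
    return "A-M" if "A" <= name[0].upper() <= "M" else "N-Z"


def shard_sizes(names):
    """Count how many customers land on each shard."""
    return Counter(am_nz_shard(name) for name in names)


# Invented sample in which names starting with L dominate.
names = ["Liam", "Lucas", "Leo", "Luna", "Lily", "Anna", "Noah", "Zara"]
sizes = shard_sizes(names)
print(sizes)

# A simple skew measure: size of the largest shard relative to the average shard.
average = sum(sizes.values()) / len(sizes)
print(max(sizes.values()) / average)
```

Tracking a skew measure like this over time is one way to notice that a shard is turning into a hotspot before re-sharding becomes urgent.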

5. Conclusion

Sharding is a great solution when the single database of your application can no longer handle or store a large and growing amount of data. Sharding helps to scale the database and improve the performance of the application. However, it also adds some complexity to your system. The methods and architectures above show the benefits and drawbacks of each sharding technique....