Horizontal or Range Based Sharding
- In this method, we split the data into shards based on ranges of a key value that each record contains, such as the first letter of a customer's name.
- Let’s say you have a database of your online customers’ names and email information.
- You can split this information into two shards: one shard holds customers whose first names start with A through P, and the other holds the rest (Q through Z).
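The A-P / Q-Z split above can be sketched as a small routing function. This is a minimal sketch, assuming routing is done on the first letter of the first name; the shard names `shard_a_to_p` and `shard_q_to_z`, the `RANGE_STARTS` list, and `shard_for` are hypothetical and not part of any particular database.

```python
import bisect

# Hypothetical shard layout for the A-P / Q-Z example: each entry is the
# first letter of the range a shard owns (shard 0: A-P, shard 1: Q-Z).
RANGE_STARTS = ["A", "Q"]
SHARD_NAMES = ["shard_a_to_p", "shard_q_to_z"]


def shard_for(first_name: str) -> str:
    """Return the shard that owns a customer, based on the first letter of the first name."""
    if not first_name:
        raise ValueError("first_name must be non-empty")
    letter = first_name[0].upper()
    if not ("A" <= letter <= "Z"):
        raise ValueError(f"unsupported first letter: {letter!r}")
    # bisect_right counts the range starts that are <= letter;
    # subtracting 1 gives the index of the range containing letter.
    index = bisect.bisect_right(RANGE_STARTS, letter) - 1
    return SHARD_NAMES[index]


if __name__ == "__main__":
    for name in ["alice", "Paul", "Quinn", "zoe"]:
        print(name, "->", shard_for(name))
```

Keeping the range boundaries in a sorted list means a busy range can be split later (for example, by adding a boundary at "U" and a third shard name) without changing the lookup logic.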
2.2.1 Advantages of Range Based Sharding:
- Scalability:
- As the dataset grows, ranges can be split and moved onto additional shards, so capacity grows by adding servers rather than by upgrading a single one.
- Improved Performance:
- Each shard holds only a subset of the data, so a query that targets one shard scans less data, and work that spans several shards can run on them in parallel.
2.2.2 Disadvantages of Range Based Sharding:
- Complex Querying Across Shards:
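- Queries that do not filter on the shard key (for example, looking up a customer by email when the shards are split by first name) must be sent to every shard and their results merged, which adds latency and application complexity.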
- Uneven Data Distribution:
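- If key values are not spread evenly across the ranges, some shards hold far more data and receive far more traffic than others (hot spots). In the example above, if most customers' first names start with letters in A-P, the first shard does most of the work.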
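Both disadvantages show up in a small in-memory sketch of the same A-P / Q-Z split. It assumes plain Python lists stand in for the two shards and uses made-up sample names; `insert`, `find_by_name`, `find_by_email`, and `shard_sizes` are hypothetical helpers, not a real database API. A lookup by first name touches one shard, a lookup by email has to scan every shard, and the row counts show how an uneven key distribution piles data onto one shard.

```python
# Hypothetical in-memory stand-ins for the two shards from the A-P / Q-Z example.
# Each shard is a list of (first_name, email) rows.
shards = {"shard_a_to_p": [], "shard_q_to_z": []}


def shard_for(first_name):
    """Range-based routing on the first letter of the first name."""
    return "shard_a_to_p" if first_name[0].upper() <= "P" else "shard_q_to_z"


def insert(first_name, email):
    # A write carries the shard key, so it goes to exactly one shard.
    shards[shard_for(first_name)].append((first_name, email))


def find_by_name(first_name):
    # Single-shard query: the shard key tells us which shard to search.
    return [row for row in shards[shard_for(first_name)] if row[0] == first_name]


def find_by_email(email):
    # Scatter-gather query: email is not the shard key, so every shard is
    # searched and the partial results are merged.
    results = []
    for rows in shards.values():
        results.extend(row for row in rows if row[1] == email)
    return results


def shard_sizes():
    # Row count per shard; a large imbalance signals a hot spot.
    return {name: len(rows) for name, rows in shards.items()}


# Illustrative sample data only.
for name in ["Alice", "Bob", "Carol", "Dave", "Eve",
             "Mallory", "Oscar", "Peggy", "Trent", "Victor"]:
    insert(name, f"{name.lower()}@example.com")

print(find_by_name("Trent"))
print(find_by_email("bob@example.com"))
print(shard_sizes())
```

With this sample data the A-P shard ends up with 8 of the 10 rows, while the Q-Z shard holds only 2.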
Database Sharding | System Design
Database sharding is a technique for horizontal scaling of databases, where the data is split across multiple database instances, or shards, to improve performance and reduce the impact of large amounts of data on a single database.
Important Topics for Database Sharding
- What is Sharding or Data Partitioning?
- Sharding Architectures
- Key Based Sharding
- Horizontal or Range Based Sharding
- Vertical Sharding
- Directory-Based Sharding
- Advantages of Sharding in System Design
- Disadvantages of Sharding in System Design
- Conclusion
When designing a sharded database, the following key considerations should be taken into account:
- Data distribution: How the data will be split across the shards, either by assigning ranges of a shard key (such as the user ID) to each shard or by applying a hash function to that key.
- Shard rebalancing: How the data will be balanced across the shards as the amount of data changes over time.
- Query routing: How queries will be directed to the correct shard, either by using a dedicated routing layer or by including the shard information in the query.
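- Data consistency: How data will be kept consistent across the shards, for example by using transaction logs, distributed transactions (such as two-phase commit) for writes that touch several shards, or a distributed database system that provides these guarantees.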
- Failure handling: How the system will handle the failure of one or more shards, including data recovery and data redistribution.
- Performance: How the sharded database will perform in terms of read and write latency and throughput, and how well it scales as shards are added.
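The data distribution, query routing, and rebalancing considerations above can be illustrated with a hash-based scheme. This is a minimal sketch, assuming the shard is chosen as a SHA-256 hash of the user ID modulo the number of shards; `shard_for_key` and `fraction_moved` are hypothetical names. The second function measures how many keys change shards when a shard is added, which is the data that rebalancing would have to move.

```python
import hashlib


def shard_for_key(user_id: str, num_shards: int) -> int:
    """Hash-based routing: stable hash of the shard key modulo the shard count."""
    # SHA-256 is deterministic across processes and machines, unlike Python's
    # built-in hash() for strings, so every router picks the same shard.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for_key(key, old_shards) != shard_for_key(key, new_shards)
    )
    return moved / len(keys)


if __name__ == "__main__":
    user_ids = [f"user-{i}" for i in range(10_000)]
    # Query routing: a request for a user is sent straight to that user's shard.
    print("user-42 is on shard", shard_for_key("user-42", 4))
    # Rebalancing cost of adding a fifth shard under plain modulo hashing.
    print(f"keys moved going from 4 to 5 shards: {fraction_moved(user_ids, 4, 5):.0%}")
```

With plain modulo hashing, a key stays put when going from 4 to 5 shards only if its hash leaves the same remainder for both counts, so roughly 80% of keys move. Consistent hashing is a common way to reduce this movement.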
In summary, database sharding is a complex but important concept in system design that can improve the scalability and performance of a database-driven system. A solid understanding of sharding is often considered a key requirement for successful system design.