Databases in System Design
8.1 Relational databases
- MySQL: It is an open-source Relational Database Management System that stores data in a structured format using rows and columns. MYSQL language is easy to use as compared to other programming language like C, C++, Java, etc. By learning some basic commands we can work, create and interact with the Database.
- PostgreSQL: PostgreSQL is an advanced and open-source relational database system and is used as a database for many web applications, mobile and analytics applications. It supports both SQL (relational) and JSON (non-relational) querying
- SQL Joins are operations used to combine rows from two or more tables based on a related column between them. Common types of joins include:
- Inner Join: Returns rows when there is a match in both tables.
- Left (Outer) Join: Returns all rows from the left table and matching rows from the right table.
- Right (Outer) Join: Returns all rows from the right table and matching rows from the left table.
- Full (Outer) Join: Returns all rows when there is a match in either table.
- Cross Join: Returns the Cartesian product of rows from both tables (all possible combinations).
- Self Join: Joins a table with itself based on a related column.
8.2 Non-relational databases
MongoDB: The most popular NoSQL database, is an open-source document-oriented database. The term ‘NoSQL’ means ‘non-relational’. It means that MongoDB isn’t based on the table-like relational database structure but provides an altogether different mechanism for storage and retrieval of data. This format of storage is called BSON
There are so many databases are available and picking up one database over another is a complicated decision. Well, there is no real formula you can follow but there are a few things you should think about. irstly set aside the idea that you are going to find the one true database that is better than everything else. Now ask a few important questions related to your project:
- How much data do you expect to store when the application is mature?
- How many users do you expect to handle simultaneously at peak load?
- What availability, scalability, latency, throughput, and data consistency does your application need?
- How often will your database schemas change?
- What is the geographic distribution of your user population?
- What is the natural “shape” of your data?
- Does your application need online transaction processing (OLTP), analytic queries (OLAP), or both?
- What ratio of reads to writes do you expect in production?
- What are your preferred programming languages?
- Do you have a budget? If so, will it cover licenses and support contracts?
- How strict are you with invalid data being sent to your database? (Ideally, you are very strict and do server-side data validation before persisting it to your database)
Note: Also check SQL vs NoSQL Database
8.4 Database Schemas
A database schema is a logical representation of data that shows how the data in a database should be stored logically. It shows how the data is organized and the relationship between the tables. Database schema contains table, field, views and relation between different keys like primary key, foreign key.
Data is stored in the form of files which is unstructured in nature which makes accessing the data difficult. Thus to resolve the issue the data are organized in structured way with the help of database schema.
Database Queries: Queries are used to retrieve and manipulate data from databases using SQL or other query languages.
8.5 ACID Properties
In order to maintain consistency in a database, before and after the transaction, certain properties are followed. These are called ACID properties.
- Atomicity: By this, we mean that either the entire transaction takes place at once or doesn’t happen at all. There is no midway i.e. transactions do not occur partially.
- Consistency: This means that integrity constraints must be maintained so that the database is consistent before and after the transaction. It refers to the correctness of a database.
- Isolation: This property ensures that multiple transactions can occur concurrently without leading to the inconsistency of the database state. Transactions occur independently without interference.
- Durability: This property ensures that once the transaction has completed execution, the updates and modifications to the database are stored in and written to disk and they persist even if a system failure occurs.
Sharding
Sharding represents a technique used to enhance the scalability and performance of database management for handling large amounts of data. This approach involves fragmenting the extensive dataset into smaller, self-contained segments known as shards. These shards are then allocated to separate servers or nodes, facilitating parallelism in data processing. As a result, query response times are improved, high traffic loads can be accommodated, and bottlenecks are mitigated.
Partitioning
Partitioning is an optimization technique in databases where a single table is divided into smaller segments called partitions. These partitions hold subsets of the table’s data based on specific criteria like value ranges or categories. This strategy enhances query performance by reducing the amount of scanned data, resulting in faster retrieval times. Furthermore, partitioning simplifies maintenance tasks such as backup and indexing since they can be focused on individual partitions.
8.7 Database Indexing
Indexing improves database performance by minimizing the number of disc visits required to fulfill a query. It is a data structure technique used to locate and quickly access data in databases. Several database fields are used to generate indexes. The main key or candidate key of the table is duplicated in the first column, which is the Search key.
To speed up data retrieval, the values are also kept in sorted order. It should be highlighted that sorting the data is not required. The second column is the Data Reference or Pointer which contains a set of pointers holding the address of the disk block where that particular key value can be found.
System Design Interview Bootcamp – A Complete Guide
We all know that System Design is the core concept behind the design of any distributed system. Therefore every person in the tech industry needs to have at least a basic understanding of what goes behind designing a System. With this intent, we have brought to you the ultimate System Design Interview Bootcamp, a one-stop solution for learning System Design.
The most important stage in any development process, be it Software or any other tech, is Design. Without the designing phase, you cannot jump to the implementation of the testing part. The same is the case with the System as well.
Important Topics For The System Design Interview Bootcamp
- 1. System Design Fundamentals
- 2. Procedure to Design Systems
- 3. What is High-Level Design?
- 4. Storage options in System Design
- 5. Message Queues
- 6. Types of File Systems
- 7. System Design Patterns
- 8. Databases in System Design
- 9. What is Low-Level Design?
- 10. What are distributed systems?
- 11. Distributed System Failures
- 12. Distributed System Fundamentals
- 13. UML Diagrams for System Design
- 14. Scalable web applications
- 15. Caching
- 16. Essential Security Measures in System Design
- 17. Machine Learning and System Design
- 18. Containerization and System Design
- 19. The cloud and System Design
- 20. Interview Guide for System Design