Databases in System Design

8.1 Relational databases

  • MySQL: It is an open-source Relational Database Management System that stores data in a structured format using rows and columns.  MYSQL language is easy to use as compared to other programming language like C, C++, Java, etc. By learning some basic commands we can work, create and interact with the Database.
  • PostgreSQL: PostgreSQL is an advanced and open-source relational database system and is used as a database for many web applications, mobile and analytics applications. It supports both SQL (relational) and JSON (non-relational) querying
  • SQL Joins are operations used to combine rows from two or more tables based on a related column between them. Common types of joins include:
    • Inner Join: Returns rows when there is a match in both tables.
    • Left (Outer) Join: Returns all rows from the left table and matching rows from the right table.
    • Right (Outer) Join: Returns all rows from the right table and matching rows from the left table.
    • Full (Outer) Join: Returns all rows when there is a match in either table.
    • Cross Join: Returns the Cartesian product of rows from both tables (all possible combinations).
    • Self Join: Joins a table with itself based on a related column.

8.2 Non-relational databases

MongoDB: The most popular NoSQL database, is an open-source document-oriented database. The term ‘NoSQL’ means ‘non-relational’. It means that MongoDB isn’t based on the table-like relational database structure but provides an altogether different mechanism for storage and retrieval of data. This format of storage is called BSON

There are so many databases are available and picking up one database over another is a complicated decision. Well, there is no real formula you can follow but there are a few things you should think about. irstly set aside the idea that you are going to find the one true database that is better than everything else. Now ask a few important questions related to your project:

  • How much data do you expect to store when the application is mature?
  • How many users do you expect to handle simultaneously at peak load?
  • What availability, scalability, latency, throughput, and data consistency does your application need?
  • How often will your database schemas change?
  • What is the geographic distribution of your user population?
  • What is the natural “shape” of your data?
  • Does your application need online transaction processing (OLTP), analytic queries (OLAP), or both?
  • What ratio of reads to writes do you expect in production?
  • What are your preferred programming languages?
  • Do you have a budget? If so, will it cover licenses and support contracts?
  • How strict are you with invalid data being sent to your database? (Ideally, you are very strict and do server-side data validation before persisting it to your database)

Note: Also check SQL vs NoSQL Database

A database schema is a logical representation of data that shows how the data in a database should be stored logically. It shows how the data is organized and the relationship between the tables. Database schema contains table, field, views and relation between different keys like primary keyforeign key.

Data is stored in the form of files which is unstructured in nature which makes accessing the data difficult. Thus to resolve the issue the data are organized in structured way with the help of database schema.

Database Queries: Queries are used to retrieve and manipulate data from databases using SQL or other query languages.

In order to maintain consistency in a database, before and after the transaction, certain properties are followed. These are called ACID properties. 

  • Atomicity: By this, we mean that either the entire transaction takes place at once or doesn’t happen at all. There is no midway i.e. transactions do not occur partially. 
  • Consistency: This means that integrity constraints must be maintained so that the database is consistent before and after the transaction. It refers to the correctness of a database.
  • Isolation: This property ensures that multiple transactions can occur concurrently without leading to the inconsistency of the database state. Transactions occur independently without interference.
  • Durability: This property ensures that once the transaction has completed execution, the updates and modifications to the database are stored in and written to disk and they persist even if a system failure occurs.

Sharding

Sharding repre­sents a technique use­d to enhance the scalability and pe­rformance of database manageme­nt for handling large amounts of data. This approach involves fragmenting the extensive datase­t into smaller, self-contained se­gments known as shards. These shards are­ then allocated to separate­ servers or nodes, facilitating paralle­lism in data processing. As a result, query re­sponse times are improve­d, high traffic loads can be accommodated, and bottlene­cks are mitigated.

Partitioning

Partitioning is an optimization technique­ in databases where a single­ table is divided into smaller se­gments called partitions. These­ partitions hold subsets of the table’s data base­d on specific criteria like value­ ranges or categories. This strate­gy enhances query pe­rformance by reducing the amount of scanne­d data, resulting in faster retrie­val times. Furthermore, partitioning simplifie­s maintenance tasks such as backup and indexing since­ they can be focused on individual partitions.

8.7 Database Indexing

Indexing improves database performance by minimizing the number of disc visits required to fulfill a query. It is a data structure technique used to locate and quickly access data in databases. Several database fields are used to generate indexes. The main key or candidate key of the table is duplicated in the first column, which is the Search key.

To speed up data retrieval, the values are also kept in sorted order. It should be highlighted that sorting the data is not required. The second column is the Data Reference or Pointer which contains a set of pointers holding the address of the disk block where that particular key value can be found.

System Design Interview Bootcamp – A Complete Guide

We all know that System Design is the core concept behind the design of any distributed system. Therefore every person in the tech industry needs to have at least a basic understanding of what goes behind designing a System. With this intent, we have brought to you the ultimate System Design Interview Bootcamp, a one-stop solution for learning System Design.

The most important stage in any development process, be it Software or any other tech, is Design. Without the designing phase, you cannot jump to the implementation of the testing part. The same is the case with the System as well.

Important Topics For The System Design Interview Bootcamp

  • 1. System Design Fundamentals
  • 2. Procedure to Design Systems
  • 3. What is High-Level Design?
  • 4. Storage options in System Design
  • 5. Message Queues
  • 6. Types of File Systems
  • 7. System Design Patterns
  • 8. Databases in System Design
  • 9. What is Low-Level Design?
  • 10. What are distributed systems?
  • 11. Distributed System Failures
  • 12. Distributed System Fundamentals
  • 13. UML Diagrams for System Design
  • 14. Scalable web applications
  • 15. Caching
  • 16. Essential Security Measures in System Design
  • 17. Machine Learning and System Design
  • 18. Containerization and System Design
  • 19. The cloud and System Design
  • 20. Interview Guide for System Design

Similar Reads

1. System Design Fundamentals

Let us first begin the System Design Interview Bootcamp with the basics and fundamental terms and concepts used in System Design....

2. Procedure to Design Systems

System design is the process of designing the architecture and components of a software system to meet specific business requirements. The process involves defining the system’s architecture, components, modules, and interfaces, and identifying the technologies and tools that will be used to implement the system. Here are some steps to get started with system design:...

3. What is High-Level Design?

High-level design or HLD refers to the overall system, a design that consists description of the system architecture and design and is a generic system design that includes:...

4. Storage options in System Design

4.1 Block Storage...

5. Message Queues

Message queues facilitate communication between distributed systems by allowing asynchronous communication. This decouples the components, enabling them to operate independently and improving system reliability. It is a form of communication and data transfer mechanism used in computer science and system design. It functions as a temporary storage and routing system for messages exchanged between different components, applications, or systems within a larger software architecture....

6. Types of File Systems

6.1 Google File System(GFS)...

7. Design Patterns in System Design Interview Bootcamp

Design patterns are used to represent some of the best practices adapted by experienced object-oriented software developers. A design pattern systematically names, motivates, and explains a general design that addresses a recurring design problem in object-oriented systems. It describes the problem, the solution, when to apply the solution, and its consequences....

8. Databases in System Design

8.1 Relational databases...

9. What is Low-Level Design?

LLD, as the name suggests, stands for low-level design. It is a component-level design process that follows step by step refinement process. The input to LLD is HLD....

10. What are distributed systems?

Distributed System is a collection of autonomous computer systems that are physically separated but are connected by a centralized computer network that is equipped with distributed system software. The autonomous computers will communicate among each system by sharing resources and files and performing the tasks assigned to them....

11. Distributed System Failures

Method failure:  In this type of failure, the distributed system is generally halted and unable to perform the execution. Sometimes it leads to ending up the execution resulting in an associate incorrect outcome. Method failure causes the system state to deviate from specifications, and also method might fail to progress. System failure: In system failure, the processor associated with the distributed system fails to perform the execution. This is caused by computer code errors and hardware issues. Hardware issues may involve CPU/memory/bus failure. This is assumed that whenever the system stops its execution due to some fault then the interior state is lost. Secondary storage device failure: A storage device failure is claimed to have occurred once the keep information can’t be accessed. This failure is sometimes caused by parity error, head crash, or dirt particles settled on the medium. Communication medium failure: A communication medium failure happens once a web site cannot communicate with another operational site within the network. it’s typically caused by the failure of the shift nodes and/or the links of the human activity system....

12. Distributed System Fundamentals

12.1 MapReduce...

13. UML Diagrams for System Design

Unified Modeling Language (UML) is a general purpose modelling language. The main aim of UML is to define a standard way to visualize the way a system has been designed. It is quite similar to blueprints used in other fields of engineering. UML is not a programming language, it is rather a visual language. We use UML diagrams to portray the behavior and structure of a system...

14. Scalable web applications

14.1 DNS...

15. Caching

Caching is a system design concept that involves storing frequently accessed data in a location that is easily and quickly accessible. The purpose of caching is to improve the performance and efficiency of a system by reducing the amount of time it takes to access frequently accessed data....

16. Essential Security Measures in System Design

In this System Design Interview Bootcamp, ensuring the security of the systems is a top-notch priority. This article will deep into the aspects of why it is necessary to build secure systems and maintain them. With various threats like cyberattacks, Data Breaches, and other Vulnerabilities, it has become very important for system administrators to incorporate robust security measures into their systems....

17. Machine Learning and System Design

System design in machine learning is vital for scalability, performance, and efficiency. It ensures effective data management, model deployment, monitoring, and resource optimization, while also addressing security, privacy, and regulatory compliance. A well-designed system enables seamless integration, adaptability, cost control, and collaborative development, ultimately making machine learning solutions robust, reliable, and capable of real-world deployment....

18. Containerization and System Design

Containerization is a lightweight form of virtualization that allows applications and their dependencies to be packaged and run consistently across different computing environments. Containers encapsulate an application, its runtime, libraries, and other dependencies, ensuring that it runs consistently regardless of the environment in which it is deployed. Docker, a widely used containerization platform, popularized this approach, but other containerization technologies exist, such as containerd and Podman....

19. The cloud and System Design

...

20. Interview Guide for System Design

Follow these links for cracking the system design interviews:...