Database Design

Database design Goals: The prime goal of designing a database is:

  • To have zero redundancy in the system
  • Loss-less join
  • Dependency preservation
  • Overcome all the shortcomings of conventional file system

According to E.F. Codd, “All the records of the table must be unique”. 

Keys of a relation: There are various types of keys in a relation which are:

  • Candidate Key: The minimal set of attributes that can determine a tuple uniquely. There can be more than 1 candidate key of a relation and its proper subset can’t determine tuple uniquely and it can’t be NULL.
  • Super Key: The set of attributes that can determine a tuple uniquely. A candidate key is always a super key but vice versa is not true. 
  • Primary Key and Alternate Key: Among various candidate keys, one key is taken as the primary key and others are alternate keys.
  • Foreign Key: Foreign Key is a set of attributes in a table that is used to refer to the primary key or alternative key of the same or another table. 

Functional Dependency: It is a constraint that specifies the association/ relationship between a set of attributes. In functional dependency, one set can accurately determine the value of another set.  It is represented as A->B, where set A can determine the values of set B correctly. The A is known as the Determinant, and B is known as the Dependent. 

Functional dependencies are further categorized into two types:

  • Trival Functional Dependency: In functional dependency, if B is a subset of A, then such dependency is known as trivial functional dependency.
  • Non-Trivial Functional Dependency: In functional dependency, if B is not a subset of A, then such dependency is known as non-trivial functional dependency.

Armstrong’s Axioms: It is a statement that is always considered true and used as a starting point for further arguments. Armstrong axiom is used to generate a closure set in a relational database. 

Armstrong Axiom

Attribute Closure(X+): All attributes of the set are functionally determined by X. 

  • Prime Attribute: An attribute that is part of one candidate key. 
  • Non-prime Attribute: An attribute that is not a part of any candidate key. 

Example: If the relation R(ABCD) {A->B, B->C, C->D}, then the attribute closure of 

A will be (A+)={ABCD} [A can determine B, B can determine C, C can determine D]
B will be (B+)={BCD}    [B can determine C, C can determine D]
C will be (C+)={CD}      [C can determine D]      
D will be (D+)={D}        [D can determine itself]    

Note: With the help of Attribute closure, we can easily determine the Superkey [The set of attributes whose closure contains all attributes of a relation] of a relation, So in the above example A is the superkey of the given relation.  There can be more than one superkey in a relationship. 

Example: If the relation R(ABCDE) {A->BC, CD->E, B->D, E->A}, then the attribute closure will be

A+= {ABCDE}
B+= {BD}
C+= {C}
D+= {D}
E+= {ABCDE}

Equivalence sets of Functional Dependency: If two sets of a functional dependency are equivalent, i.e. if A+= B+. Every FD in A can be inferred from B, and every FD in B can be inferred from A, then A and B are functionally equivalent. 

Minimal set of Functional Dependency: A set of FD  will be minimal if it satisfies the following conditions:

  • F logically implies all dependencies in F+
  • F+ logically implies all dependencies in F.
  • No functional dependency in F+ contains an extraneous attribute.
  • Each left side of a functional dependency in F+ is unique. That is, there are no two dependencies A->B and C->D in such that A->C.

Normalization: Normalization is used to eliminate the following anomalies:

  • Insertion anomaly
  • Deletion anomaly
  • Updation Anomaly
  • Join anomaly

Normalization was introduced to achieve integrity in the database and make the database more maintainable.

1. First Normal Form: A relation is in first normal form if it does not contain any multi-valued or composite attribute.  If the data is in 1NF then it will have high redundancy. 

2. Second Normal Form: A relation is in the second normal form if it is in the first normal form and if it does not contain any partial dependency.

  • Partial Dependency: A dependency is called partial dependency if any proper subset of candidate key determines non-prime (which are not part of candidate key) attribute.
    Let R be the relational schema and X, Y, A is the set of attributes. Suppose X is any candidate key, Y is a proper subset of candidate key, and A is a Non-prime attribute.
     

Partial Dependency

           Y->A will be partial dependency iff, Y is a proper subset of candidate key, and A is a non-prime attribute. 

  • Full Functional Dependency: If A and B are an attribute set of a relation, B is fully functional dependent on A, if B is functionally dependent on A but not on any proper subset of A.

3. Third Normal Form: A relation is in the third normal form if it is in the second normal form and it does not contain any transitive dependency. For a relation to be in Third Normal Form, either LHS of FD should be super key or RHS should be the prime attribute.

4. Boyce-Codd Normal Form: A relation is in Boyce-Codd Normal Form if the LHS of every FD is super key. The relationship between Normal Forms can be represented as 1NF, 2NF, 3NF or BCNF.

Design Goal1NF2NF3NFBCNF
Zero RedundancyHigh redundancyLess than 1NFLess than 2NFNo redundancy
Loss-less decompositionAlwaysAlwaysAlwaysAlways
Dependency preservationAlwaysAlwaysAlwaysSometimes Not possible

Properties of Decomposition:

  • Loss-less Join Decomposition: There should not be the generation of any new tuple because of the decomposition. 
  • Dependency Preserving Decomposition: There should not be the loss of any tuple because of the decomposition. Let R be a relation with Functional dependency F. After decomposition R is decomposed into R1, R2, R3……Rn with FD set F1, F2, F3……Fn respectively. 

If F1, F2, F3…..Fn ≣ F, then the decomposition is dependency preserving otherwise not. 

Last Minute Notes – DBMS

Database Management System is an organized collection of interrelated data that helps in accessing data quickly, along with efficient insertion, and deletion of data into the DBMS. DBMS organizes data in the form of tables, schemas, records, etc. 

DBMS over File System

The file system has numerous issues, which were resolved with the help of DBMS, the issues with the file system are:

  • Data Redundancy: Same data can be stored at multiple places. 
  • Data Inconsistency: If multiple copies of the same data have different content in each copy. Like, the phone number of students is different in academic and accounts files. 
  • Data access: In a file system, accessing data was difficult and insecure as well. Accessing data concurrently was not possible.
  • No Backup and Recovery: There is no backup and recovery in the file system that can lead to data loss.

Similar Reads

ER-Model

ER Diagram: An ER diagram is a model of a logical view of the database which is represented using the following components:...

Database Design

Database design Goals: The prime goal of designing a database is:...

Data Retrieval (SQL, RA)

Commands to Access Database: For efficient data retrieval, insertion, deletion, updation, etc. The commands in the Database are categorized into three categories, which are as follows:...

File Structure

File organization: It is the logical relation between records and it defines how file records are mapped into disk blocks(memory). A database is a collection of files, each file is a collection of records, and each record contains a sequence of fields. The blocking Factor is the average number of records per block....

Indexing Type:

1. Single level Index...

Transaction and Concurrency Control

A transaction is a unit of instruction or set of instructions that performs a logical unit of work. Transaction processes are always atomic in nature either they will execute completely or do not execute....