Disadvantages of Primary Indexing
- Primary indexing requires additional space to store the index field alongside the actual data records. This extra storage can impact overall system costs, especially for large datasets.
- When records are added or deleted, the data file needs reorganization to maintain the order based on the primary key.
- After record deletion, the space used by the deleted record must be released for reuse by other records. Managing this memory cleanup can be complex.
Question for Practice
Question: Consider a database of fixed-length records, stored as an ordered file. The database has 25,000 records, with each record being 100 bytes, of which the primary key occupies 15 bytes. The data file is block-aligned in that each data record is fully contained within a block. The database is indexed by a primary index file, which is also stored as a block-aligned ordered file. The figure below depicts this indexing scheme.
Suppose the block size of the file system is 1024 bytes, and a pointer to a block occupies 5 bytes. The system uses binary search on the index file to search for a record with a given key. You may assume that a binary search on an index file of b blocks takes ⌈log2b⌉ block accesses in the worst case. Given a key, the number of block accesses required to identify the block in the data file that may contain a record with the key, in the worst case, is _______. [ GATE CS/IT 2023 ]
Solution:
As it is given it is fixes length record,
So one block should fully contain within block so here it means
that they are saying that you should use un-spanned file organization.
=> No. of Records(Rn) = 25,000
=> Size of Record(Rs) = 100 Bytes
=> Size of Key(Ks) = 15 Bytes
=> Block Pointer Size(Ps) = 5 Bytes
So , Index Size(Is) = Size of Key + Block Pointer Size = 15 + 5 = 20 Bytes
Disk Block Size(Bs) = 1024 Bytes
=> No. of Record per Block = ⌊(Disk Block Size/Record Size)⌋
= ⌊1024⌋ = ⌊10.24⌋ = 10
=> No. of Block Required for record = ⌈( no. of record / no of record per block)⌉
= ⌈ 25000/10 ⌉ = 2500
=> The database is indexed by a primary index file it means,
that here we have to use sparse indexing.
=> No. of Index per Block =⌊(Disk Block Size/Index Size)⌋
= ⌊ (1024/20) ⌋
= 51 index per blocks
=> No. of Block Required for indexes = ⌈ ( No. of index /No. of index per block) ⌉
= ⌈ 2500/51 ⌉
= 50
=> It is given that Binary Search is used to search the desired block
and we know that it takes (log n) time to search in worst case
where n is the no. of comparison required
=> Here No. of comparison = No. of Index Block
=> Hence the number of block accesses required to identify the block in
the data file that may contain a record with the key,
in the worst case, is (log 50) = 6 Blocks
Primary Indexing in Databases
Indexing is a technique used to reduce access cost or I/O cost, now the question arrives what is access cost? Access cost is defined as the number of secondary memory blocks which is transferred from secondary memory to main memory in order to access required data. In this article, we are going to discuss every point about primary indexing.