Data storage/retrieval
Database schema before class: database_day17_end_of_class.sql
Slides from class
Database notes
Data file
A named physical storage space that stores a database's data. It can reside in a different directory on one or more storage locations. All data in a database is stored in data files. A typical database is stored in several data files, normally with one file per relation.
Index
An ordered array of index key values and row ID values (pointers). Indexes are generally used to speed up and facilitate data retrieval.
Search key
An attribute or set of attributes used to look up records in a file. Note that this definition of key differs from the those used in primary key, candidate key, and superkey.
Composite search key
A search key containing more than one attribute.
Clustered (primary) index
Records stored in sorted order according to the search key.
Non-clustered (secondary) index
Indices whose search key specifies an order different from the sequential order of the file.
Index entry
A search key value and pointers to one or more records with that value as their search key value. Pointers identify disk block and offset within the block that identify the location of that record on disk.
Dense index
An entry appears for every search key value in the file. If there are multiple entries for the same search key, each index entry contains a pointer to the first data record with that search key. The records are stored sequentially on disk based on the search key, so additional records with the same search key value can be easily found by sequentially reading index entries.
Sparse index
An index entry appears for only some of the search key values. Sparse indices can be used only if the relation is stored in store order of the search key; that is if the index is a clustered index.
Multilevel index
Indices with two or more levels.
B+ tree index
The most widely used index structure. Takes the form of a balanced tree (hence the B in B+ tree) in which every path from root to leaf is the same length. Each nonleaf node in the tree (other than the root) has between m/2 and m children where m is a fixed values for a particular tree. The nonleaf nodes form a sparse multilevel index on the leaf nodes. Each leaf node maintains a pointer to the next leaf node, ordered from smallest to largest search key value. The size of m is normally chosen based on how many index entries fit on a disk block (disk block sizes are normally 4Kb, 8Kb, or 16Kb depending on the system). B+ trees are a more generalized version of 2-3-4 trees from CS10.
Note: many definitions from Coronel and Morris.