Data storage/retrieval

Database schema before class: database_day17_end_of_class.sql
Slides from class

Database notes

Data file

A named physical storage space that stores a database's data. It can reside in a different directory on one or more storage locations. All data in a database is stored in data files. A typical database is stored in several data files, normally with one file per relation.

Index

An ordered array of index key values and row ID values (pointers). Indexes are generally used to speed up and facilitate data retrieval.

Search key

An attribute or set of attributes used to look up records in a file. Note that this definition of key differs from the those used in primary key, candidate key, and superkey.

Composite search key

A search key containing more than one attribute.

Clustered (primary) index

Records stored in sorted order according to the search key.

Non-clustered (secondary) index

Indices whose search key specifies an order different from the sequential order of the file.

Index entry

A search key value and pointers to one or more records with that value as their search key value. Pointers identify disk block and offset within the block that identify the location of that record on disk.

Dense index

An entry appears for every search key value in the file. If there are multiple entries for the same search key, each index entry contains a pointer to the first data record with that search key. The records are stored sequentially on disk based on the search key, so additional records with the same search key value can be easily found by sequentially reading index entries.

Sparse index

An index entry appears for only some of the search key values. Sparse indices can be used only if the relation is stored in store order of the search key; that is if the index is a clustered index.

Multilevel index

Indices with two or more levels.

B+ tree index

The most widely used index structure. Takes the form of a balanced tree (hence the B in B+ tree) in which every path from root to leaf is the same length. Each nonleaf node in the tree (other than the root) has between m/2 and m children where m is a fixed values for a particular tree. The nonleaf nodes form a sparse multilevel index on the leaf nodes. Each leaf node maintains a pointer to the next leaf node, ordered from smallest to largest search key value. The size of m is normally chosen based on how many index entries fit on a disk block (disk block sizes are normally 4Kb, 8Kb, or 16Kb depending on the system). B+ trees are a more generalized version of 2-3-4 trees from CS10.