Indexing and Hashing
admin
Topics
Today’s lecture is from Silberschatz and Garcia-Molina, as our text doesn’t cover these topics in sufficient detail.
- Indexing concepts
- Ordered indexes
- Multilevel indexes
- B+ tree indexes
- Hashing
  - static hashing
  - dynamic hashing
- Comparison of ordered indexing vs. hashing
- Indexes in SQL
Indexing concepts
Some files/DBs fit entirely into main memory, so access is very fast. This is not always an option, for various reasons, so we end up using disks.
The time required to access a sector on the disk is generally: access time = seek time + rotational latency + transfer time.
The fastest commercially available spinning hard drive as of January 2025 is Seagate’s $500 Iron Wolf:
- up to 24TB
- encases up to twelve 2TB platters that operate in a helium atmosphere
- 7200 RPM spin
- 512MB cache
- 6 Gb/s top interface speed
- 512 byte sectors
- 280MB/s read/write, or about 1.8 microseconds to read one sector of 512 B
- MTBF 2.5 million hours (2.85 centuries!)
For example, consider a sequential, non-indexed search of a file with 100,000 512 B records, with the “key” as the first “word” on each line. Assuming reading any single record takes about 1.8 microseconds (ignoring typical seek times, controller overhead, rotational latency, etc.), reading the whole file takes roughly 100,000 × 1.8 µs ≈ 0.18 s, and on average about half of that (≈ 0.09 s, reading 50,000 records) to find the record with a specific key.
- Adding a record is easy: just put it on the end.
If we assume all the records are sorted by a key, search is a little better:
- arbitrary searches using a key can be reduced by binary search to about log₂(100,000) ≈ 17 record reads
- however, the insertion of a record is messy, as it requires moving all of the records AFTER the slot where the new one goes to make room (unless it goes at the very end).
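A quick back-of-the-envelope check of these numbers (a minimal sketch in Python; the 280 MB/s transfer rate is the drive figure from above, and seek/latency overheads are ignored as stated):

```python
import math

RECORDS = 100_000
RECORD_BYTES = 512
TRANSFER_RATE = 280e6                               # bytes/second, from the drive specs
per_record = RECORD_BYTES / TRANSFER_RATE           # ~1.8 microseconds per 512 B record

full_scan  = RECORDS * per_record                   # read every record
avg_search = full_scan / 2                          # on average, scan half the file
bin_probes = math.ceil(math.log2(RECORDS))          # probes if the file is sorted

print(f"full scan:      {full_scan * 1e3:.0f} ms")  # ~180 ms
print(f"average search: {avg_search * 1e3:.0f} ms") # ~90 ms
print(f"binary search:  {bin_probes} record reads") # 17
```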
By contrast, SSDs have essentially no seek time: with no moving parts, random-access latency is on the order of tens of microseconds, versus several milliseconds of seek plus rotational latency for a spinning disk.
With indexes (indices) we can do a lot better. The index is also a file and is made up of index entries:
key | pointer |
---|---|
x | yyy |
where key is how we look up a record and pointer is some indication of which disk block/record holds the data we seek.
Index files are typically a lot shorter than the real data file.
To be of any value, the index has to be faster to use than non-indexed search, no larger than the indexed file, and not too expensive in time and computation to do inserts and deletes.
Two kinds of indices:
- ordered, where the key/pointer pairs are stored in sorted order
- hash, where keys are distributed uniformly across a range of buckets by a hash function
- do you know about hash functions?
Ordered indices
An indexed sequential file is ordered by the key. The key may or may not be the table’s primary key. The file may also have one or more secondary indices based on some other field in the record.
Dense index files
Typical idea: one key/pointer pair for every entry in the table. SLIDE 13-1 This scheme only has to maintain the key/pointer pairs in sorted order… the table itself may be in any order.
Inserts require adding the new key/pointer pair to the index in sorted position. The new record itself can go anywhere in the table.
Secondary indices
SLIDE 13-2 A table may commonly need to be accessed by some other key. To address this you either have multiple dense indices (one primary and one secondary), or you use some other kind of index for the primary key and a dense index for the secondary.
Sparse index files
SLIDE 13-3 Index file doesn’t contain all of the key/pointer values. Instead, each key/pointer corresponds to the lowest key in a block, with the block size typically determined by the storage hardware or the operating system.
The table is assumed to be sorted on the key. The index is used to find the record with the largest key ≤ the key of the desired record, then you search forward from there using the record-to-record links (if the records are not contiguous).
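A minimal sketch of a sparse-index lookup (a Python illustration, with an in-memory list standing in for the sorted data file and one index entry per small “block”; block size and record layout are simplifying assumptions):

```python
import bisect

BLOCK_SIZE = 4                     # records per block (tiny, for illustration)

# data file: records sorted by key; each record is (key, payload)
records = [(10101, "Srinivasan"), (12121, "Wu"), (15151, "Mozart"),
           (22222, "Einstein"), (32343, "El Said"), (33456, "Gold"),
           (45565, "Katz"), (58583, "Califeri"), (76543, "Singh"),
           (76766, "Crick"), (83821, "Brandt"), (98345, "Kim")]

# sparse index: (lowest key in each block, position of that block's first record)
sparse_index = [(records[i][0], i) for i in range(0, len(records), BLOCK_SIZE)]

def lookup(key):
    # find the last index entry whose key is <= the search key ...
    keys = [k for k, _ in sparse_index]
    pos = bisect.bisect_right(keys, key) - 1
    if pos < 0:
        return None
    # ... then scan forward from the start of that block until the key is found
    start = sparse_index[pos][1]
    for k, payload in records[start:]:
        if k == key:
            return payload
        if k > key:
            return None
    return None

print(lookup(58583))   # Califeri
print(lookup(99999))   # None
```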
They take less space, and inserts/deletes are easier.
However, accesses can be slower than dense indices.
Multilevel indices
If an index grows too large to maintain in real memory, it ends up being paged out to disk … slowing the system.
SLIDE 13-4 DRAW it for discussion
One solution is to treat the primary index as a sequential file itself and maintain a new, smaller index for this index file as the in-memory index.
While this multilevel index scheme increases indexing time by a read from storage for every level of index, it is a popular choice over alternatives such as partitioning a table or adding more keys.
Good news and bad news about indices
- Good: big improvements in searching for records
- Bad: updating indices can be costly
Index entry deletion
Dense index entry deletion
Once the target record is found using its index, the record is deleted. If it was the only record with that key, the key/pointer pair is deleted from the index.
Sparse index entry deletion
Once the target record is found using its index, the record is deleted. If the key exists in the index, it is replaced by the next higher search-key value in the file (in search key order) unless that search-key is already represented in the index… in that case, the key entry for the deleted record is itself deleted.
Index entry insertion
For dense indices simply insert the key/pointer pair. For sparse indices we do the same, unless a new block is required. In that case, the first key/pointer appearing in the new block is inserted into the index.
Indexed Sequential Access Method (ISAM)
A good explanation of ISAM is here:
ISAM is a static index structure – effective when the file is not frequently updated. Not suitable for files that grow and shrink. When an ISAM file is created, index nodes are fixed, and their pointers do not change during inserts and deletes that occur later (only content of leaf nodes change afterwards). As a consequence of this, if inserts to some leaf node exceed the node’s capacity, new records are stored in overflow chains. If there are many more inserts than deletions from a table, these overflow chains can gradually become very large, and this affects the time required for retrieval of a record.
Thus, indexed-sequential file access performance degrades as these overflow chains grow, requiring manual intervention to reorganize the entire file.
B+ Trees
These trees provide very efficient retrieval of data mostly because of their extremely high fan-out: hundreds of pointers or more in a single disk block. This helps because the disk delivers a whole block per read, so the high number of pointers in a block reduces the number of I/Os required to reach the data.
For example, if you run diskutil info / | grep -i "Block Size"
on your Mac you will see something like this:
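On a recent Mac with APFS, the matching lines typically look something like this (exact field names and values vary with the macOS version and disk):

```
   Device Block Size:         4096 Bytes
   Allocation Block Size:     4096 Bytes
```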
If a disk has a 4096 byte block and the key-pointer pairs require 12 bytes, then reading one disk block fetches about 340 key-pointer pairs.
Many filesystems use B+ trees: NTFS, JFS, among others, and relational databases such as IBM DB2, Microsoft SQL Server, Oracle 8, MariaDB, MySQL, and SQLite all support B+ trees for indexes.
B+ trees automatically reorganize during insertions and deletions. Thus, reorganization of the entire file is not necessary. All paths from root to leaf are of the same length.
The goal is to always have optimal indices for the data in the tree. For example, every block is kept between half full and completely full.
B+ trees do require a little more overhead as a result. However, this disadvantage is far outweighed by the advantages they bring.
There is a parameter n associated with each B+ tree index; it determines the layout of every block in the tree.
Each block will have space for n search-key values and n + 1 pointers.
A B+ tree block is similar to the index blocks we saw earlier except that the B+ tree block has an extra pointer, along with the n key/pointer pairs an index block would hold.
We choose n to be as large as possible while still allowing n keys and n + 1 pointers to fit in one block.
For example (From Garcia-Molina):
Suppose blocks of 4096 bytes. Also let keys be integers of 4 bytes and let pointers be 8 bytes (64 bits). If there is no header information kept on the blocks, then we want to find the largest integer value of n such that 4n + 8(n + 1) ≤ 4096. That value is n = 340.
This provides up to n + 1 = 341 pointers per block.
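The same calculation as a couple of lines of Python (a sketch; the block, key, and pointer sizes are the values from the example above):

```python
BLOCK_BYTES, KEY_BYTES, POINTER_BYTES = 4096, 4, 8

# largest n such that n keys plus (n + 1) pointers fit in one block:
#   KEY_BYTES*n + POINTER_BYTES*(n + 1) <= BLOCK_BYTES
n = (BLOCK_BYTES - POINTER_BYTES) // (KEY_BYTES + POINTER_BYTES)

print(n, n + 1)   # 340 keys and 341 pointers per block
```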
SLIDE 13-5 is a typical leaf node
The keys in leaf nodes are copies of keys from the data file. These keys are distributed among the leaves in sorted order, from left to right.
All of the used key/pointer pairs appear at the beginning of the block; the exception is the last pointer in a leaf, which points to the next leaf.
SLIDE 13-6 is a typical interior node
SLIDES 13-7 .. 13-13
Hash Indices
Static Hashing
The use of an index seems obvious. However, accessing an index can require reading several blocks, or doing a binary search across a dense collection of records maintained in sorted order.
Hashing is an alternative that doesn’t require an index, and can find records reading fewer blocks. First we need some background.
A bucket is a unit of storage that can hold one or more records.
A hash function is one that takes an input of some length and produces a positive integer value within a given range. Hash functions are typically quick to compute and always produce the same result given the same input. Their distribution should be both
- uniform: the hash function assigns the same number of possible search-key values to each bucket, considering all possible keys, and
- random: on average, each bucket has about the same number of values assigned to it, regardless of the actual distribution of the keys that occur.
For our use, if K is the set of all search-key values and B is the set of all bucket addresses, then a hash function h maps K to B.
Note: The hash functions we’ll talk about in CS61 have fewer requirements than the ones in CS55, and are more like the ones used in CS50. For example, the CS61 hash functions may have collisions, just not too many.
To insert a record with search key K, compute h(K); that value is the address of the bucket in which the record is stored.
Lookup proceeds similarly: compute h(K) and search that bucket for records with key K.
DRAW on the board 8 buckets (0..7) with four slots each, then feed each record
ID | Name | Dept | Salary | BUCKET |
---|---|---|---|---|
10101 | Srinivasan | Comp.Sci. | 65000 | 6 |
12121 | Wu | Finance | 90000 | 4 |
15151 | Mozart | Music | 40000 | 1 |
22222 | Einstein | Physics | 95000 | 3 |
32343 | El Said | History | 80000 | 2 |
33456 | Gold | Physics | 87000 | 3 |
45565 | Katz | Comp.Sci. | 75000 | 6 |
58583 | Califeri | History | 60000 | 2 |
76543 | Singh | Finance | 80000 | 4 |
76766 | Crick | Biology | 72000 | 5 |
83821 | Brandt | Comp.Sci. | 92000 | 6 |
98345 | Kim | Elec. Engr. | 80000 | 3 |
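A minimal sketch of this exercise in Python. The hash function here (sum of the department name’s character codes, mod the number of buckets) is an assumption for illustration, so the bucket numbers it produces won’t all match the BUCKET column above; buckets are plain lists, ignoring the four-slot limit (overflow is discussed next).

```python
NUM_BUCKETS = 8

def h(dept_name):
    # assumed illustrative hash: sum of character codes, mod the number of buckets
    return sum(ord(c) for c in dept_name) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

instructors = [
    (10101, "Srinivasan", "Comp.Sci.", 65000), (12121, "Wu", "Finance", 90000),
    (15151, "Mozart", "Music", 40000),         (22222, "Einstein", "Physics", 95000),
    (32343, "El Said", "History", 80000),      (33456, "Gold", "Physics", 87000),
    (45565, "Katz", "Comp.Sci.", 75000),       (58583, "Califeri", "History", 60000),
    (76543, "Singh", "Finance", 80000),        (76766, "Crick", "Biology", 72000),
    (83821, "Brandt", "Comp.Sci.", 92000),     (98345, "Kim", "Elec. Engr.", 80000),
]

# insert: hash the search key (here, the dept name) and drop the record in that bucket
for rec in instructors:
    buckets[h(rec[2])].append(rec)

# lookup: hash the key, then scan only that one bucket
def lookup(dept_name):
    return [r for r in buckets[h(dept_name)] if r[2] == dept_name]

print(lookup("Physics"))   # Einstein and Gold, found without touching other buckets
for i, b in enumerate(buckets):
    print(i, [name for _, name, _, _ in b])
```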
Bucket Overflows
Can happen for a couple of reasons:
- Insufficient buckets: the goal is that the number of buckets is greater than the expected number of records divided by the number of slots in each bucket.
- Skew can occur too:
  a. some records may have the same search key
  b. the hash function may distribute the keys that actually occur nonuniformly across the buckets
When this happens, a new bucket is allocated and linked to the full one. When a lookup gets to this bucket list, the buckets in the list are searched for the record.
DRAW on the board the addition of two more instructors with new names that hash to bucket 3, causing an overflow chain.
As the database grows over time, we have three options:
- Choose hash function based on current file size.
  - Performance degrades as file grows.
- Choose hash function based on anticipated file size.
  - Space is wasted initially.
- Periodically manually re-organize hash structure as file grows.
  - Requires selecting new hash function, recomputing all addresses and generating new bucket assignments.
  - Costly, and shuts down database.
Dynamic or Extensible Hashing
As the database grows, this scheme splits and combines buckets as necessary.
Rather than a hash function that maps keys into a fixed range of bucket addresses, we choose a hash function h whose range is large (say, 32-bit integers) and at any moment use only a prefix of i of those bits to index a bucket address table.
We do not pre-allocate a bucket for all 2^32 possible hash values; buckets are created as needed, and the bucket address table holds 2^i entries for the current prefix length i (several table entries may share one bucket).
Typically, buckets are the size of a disk block, perhaps 4096 bytes. Thus, if each record requires 128 bytes, then we would get 32 entries in each bucket (block).
SLIDE 13-15
Explain general insertion algorithm and walk through it:
- Take the first i high-order bits of h(K), look up the entry for this bit string in the bucket address table, and follow the pointer.
- If there is room in that bucket, insert the record and you’re done.
- If there is no room in that bucket, we split the bucket and redistribute the current records as well as the new one, doubling the bucket address table first if necessary (see the sketch after the slides below).
SLIDE 13-16 thru 13-21
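A minimal sketch of extendible hashing in Python, under a few simplifying assumptions: buckets are plain lists with a tiny capacity of 2 so splits happen quickly, the directory is indexed by the low-order bits of Python’s built-in hash() rather than the high-order bits of a 32-bit hash as on the slides, and the rare case where every key still lands in one bucket after a split is not handled.

```python
BUCKET_CAPACITY = 2          # tiny on purpose, so splits are easy to trace

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # number of hash bits this bucket distinguishes
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]      # 2**global_depth entries

    def _index(self, key):
        # the low-order global_depth bits of the hash choose the directory slot
        return hash(key) & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < BUCKET_CAPACITY:       # room: just store it
            bucket.keys.append(key)
            return
        # no room: split the bucket, doubling the directory first if needed
        if bucket.local_depth == self.global_depth:
            self.directory += self.directory         # double the directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        # directory slots for the old bucket whose new bit is 1 now point at the sibling
        for i, b in enumerate(self.directory):
            if b is bucket and (i & high_bit):
                self.directory[i] = sibling
        # redistribute the old records plus the new one between the two buckets
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys + [key]:
            self.directory[self._index(k)].keys.append(k)

table = ExtendibleHash()
for instructor_id in (10101, 12121, 15151, 22222, 32343, 33456, 45565):
    table.insert(instructor_id)
print(table.global_depth, [b.keys for b in table.directory])
```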
Bitmap indices
SLIDE 13-19
Good for easy/fast querying on multiple keys, especially low-cardinality keys… sometimes used at the leaf nodes of B+ trees.
Explain example.