CS 10: Spring 2014

Lecture 22, May 19

Code discussed in lecture

Red-black trees

Red-black trees are a variation of binary search trees. In fact, we'll create a class RBTree as a subclass of BST, but I had to make several changes to the version of BST.java we saw previously. In particular, I moved many of the BST methods that took a Node as a parameter into the Node class. Why? Because that way, I get dynamic binding when I make a Node subclass within RBTree.

Red-black trees are balanced binary trees: the height of an n-node red-black tree is always O(lg n). The binary-search-tree operations on a red-black tree take O(lg n) time in the worst case.

Definition of a red-black tree

A red-black tree is a binary search tree with one extra bit per node: a color, which is either red or black. As in our binary search tree, absent nodes are represented by the sentinel.

In a red-black tree, we treat the sentinel as standing in for every leaf, and the sentinel is always black. All instance variables of binary search trees and their nodes are inherited by red-black trees (key, value, left, right, parent, sentinel, and root). We don't care about the key in the sentinel, but we do care about its color and structural instance variables (left, right, and parent).

A red-black tree obeys the five red-black properties:

  1. Every node is either red or black.
  2. The root is black.
  3. Every leaf (the sentinel) is black.
  4. If a node is red, then both its children are black. (Hence there can be no two red nodes in a row on a simple path from the root to a leaf.)
  5. For each node, all paths from the node to descendant leaves contain the same number of black nodes.
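To make the five properties concrete, here is a minimal sketch of a checker. It is an illustration only: the RBCheck class, its Node type, and the NIL sentinel are hypothetical names, not the classes in RBTree.java, and the sketch does not check the binary-search-tree ordering of the keys.

```java
// Sketch only: a hypothetical minimal node type, not the Node class from RBTree.java.
class RBCheck {
    static final boolean RED = true, BLACK = false;

    static class Node {
        int key;
        boolean color;          // property 1: a boolean can only be red or black
        Node left, right;

        Node(int key, boolean color, Node left, Node right) {
            this.key = key;
            this.color = color;
            this.left = left;
            this.right = right;
        }
    }

    // A single black sentinel stands in for every leaf (property 3).
    static final Node NIL = new Node(0, BLACK, null, null);

    // Returns the number of black nodes below x on every path to a leaf,
    // or -1 if the subtree rooted at x violates property 4 or property 5.
    static int blackHeight(Node x) {
        if (x == NIL) return 0;
        // Property 4: a red node must have two black children.
        if (x.color == RED && (x.left.color == RED || x.right.color == RED)) return -1;
        int l = blackHeight(x.left), r = blackHeight(x.right);
        if (l < 0 || r < 0) return -1;
        // Count the child itself if it is black, then compare both sides (property 5).
        int viaLeft = l + (x.left.color == BLACK ? 1 : 0);
        int viaRight = r + (x.right.color == BLACK ? 1 : 0);
        return viaLeft == viaRight ? viaLeft : -1;
    }

    // Property 2 (black root) plus properties 4 and 5 for the whole tree.
    static boolean isRedBlack(Node root) {
        return root.color == BLACK && blackHeight(root) >= 0;
    }
}
```

For example, a black root with two red children passes the check, while a black root with one black child and one sentinel child fails property 5.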

The height of a node is the number of edges in a longest path to a leaf. The black-height of node x, which we write as bh(x), is the number of black nodes (including the sentinel) on any path from x down to a leaf, not counting x. By property 5, black-height is well defined. Here is a red-black tree with keys inside nodes and with node heights h and black-heights bh labeled:

By property 4, any node with height h has black-height at least h/2. (At most half the nodes on a path to a leaf are red, and so at least half are black.)

We can also show that the subtree rooted at any node x contains at least 2^{bh(x)} − 1 internal nodes. The proof is by induction on the height of x. The basis is when h(x) = 0, which means that x is a leaf, and so bh(x) = 0. The subtree rooted at x has 0 internal nodes, and 2^0 − 1 = 0. For the inductive step, let x have height h > 0 and black-height b = bh(x). Any child of x has height at most h − 1 and black-height either b (if the child is red) or b − 1 (if the child is black). By the inductive hypothesis, each child's subtree has at least 2^{bh(x) − 1} − 1 internal nodes. Thus, the subtree rooted at x contains at least 2 ⋅ (2^{bh(x) − 1} − 1) + 1 = 2^{bh(x)} − 1 internal nodes. (The + 1 is for x itself.)

These two facts lead to the following theorem:

A red-black tree with n internal nodes has height at most 2 lg (n + 1).

To prove the theorem, let h and b be the height and black-height of the root, respectively. By the above two facts, n ≥ 2^b − 1 ≥ 2^{h/2} − 1. Adding 1 to both sides and then taking logs gives lg (n + 1) ≥ h/2, which implies that h ≤ 2 lg (n + 1).
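To get a feel for the bound, here is a small sketch that computes the largest integer height the theorem allows for a given n. The class and method names are made up for illustration. It avoids floating-point logarithms by using the fact that h ≤ 2 lg (n + 1) is the same as 2^h ≤ (n + 1)^2.

```java
// Sketch only: HeightBound and maxHeight are hypothetical names for illustration.
class HeightBound {
    // Largest integer h with h <= 2 lg(n + 1), that is, with 2^h <= (n + 1)^2.
    // Assumes n >= 0 and that (n + 1)^2 fits in a long.
    static int maxHeight(long n) {
        long m = (n + 1) * (n + 1);
        // floor(lg m) is the index of the highest set bit of m.
        return 63 - Long.numberOfLeadingZeros(m);
    }
}
```

For instance, maxHeight(1000000) is 39, so a red-black tree holding a million keys never needs more than 39 levels of descent. The bound is an upper limit, not necessarily a height that some tree achieves.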

Operations on red-black trees

The non-modifying operations on binary search trees—minimum, maximum, predecessor, successor, and search—are unchanged for red-black trees.

Insertion and removal are not so easy.

If we insert, what color to make the new node?

If we remove a node, what color was the node that was removed?

Rotations

You might recall the rotation operation from the midterm exam. It's the basic tree-restructuring operation. We need rotations to maintain red-black trees as balanced binary search trees. A rotation changes only structural instance variables and maintains the binary-search-tree property.

We have both left rotation and right rotation operations. They are inverses of each other. A rotation is called on a node within a binary search tree.

Here is what rotations do:

Look at the method leftRotate in RBTree.java. It assumes that this.right is not the sentinel and that the root's parent is the sentinel. The code for rightRotate is symmetric to leftRotate.

Here's an example of a call to leftRotate:

Notice that before rotation, the keys in x's left subtree are less than x's key of 11, and the keys in x's right subtree are greater than x's key. The left rotation makes y's left subtree into x's right subtree. After rotation, the keys in x's left subtree are still less than x's key, which is less than the keys in x's right subtree, which are less than y's key of 18, which is less than the keys in y's right subtree.

Each rotation operation takes O(1) time, since only a constant number of instance variables are modified.
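Here is a minimal sketch of a left rotation in the textbook style. It is not the leftRotate method from RBTree.java: in this sketch the rotation is a method of a hypothetical Tree class that takes x as a parameter, rather than a method called on the node itself. It makes the same assumption, namely that x.right is not the sentinel. The keys 11 and 18 are borrowed from the example above; the other keys are made up.

```java
// Sketch only: RotateSketch, Tree, and add are hypothetical names for illustration.
class RotateSketch {
    static class Node {
        int key;
        Node left, right, parent;
        Node(int key) { this.key = key; }
    }

    static class Tree {
        final Node nil = new Node(0);   // sentinel standing in for absent nodes
        Node root = nil;

        Tree() { nil.left = nil.right = nil.parent = nil; }

        // Plain binary-search-tree insertion, used only to build a test tree.
        Node add(int key) {
            Node z = new Node(key);
            z.left = z.right = nil;
            Node p = nil, cur = root;
            while (cur != nil) {
                p = cur;
                cur = key < cur.key ? cur.left : cur.right;
            }
            z.parent = p;
            if (p == nil) root = z;
            else if (key < p.key) p.left = z;
            else p.right = z;
            return z;
        }

        // Left rotation around x; assumes x.right is not the sentinel.
        void leftRotate(Node x) {
            Node y = x.right;
            x.right = y.left;                    // y's left subtree becomes x's right subtree
            if (y.left != nil) y.left.parent = x;
            y.parent = x.parent;                 // y takes x's place under x's old parent
            if (x.parent == nil) root = y;
            else if (x == x.parent.left) x.parent.left = y;
            else x.parent.right = y;
            y.left = x;                          // x becomes y's left child
            x.parent = y;
        }
    }
}
```

Only a constant number of references change, which is why the operation takes O(1) time. If we add the keys 11, 9, 18, 14, 19 in that order and rotate left around the node with key 11, the node with key 18 becomes the root, and the node with key 14 moves from being 18's left child to being 11's right child.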

Inserting into a red-black tree

To insert into a red-black tree, we start by calling the insert method from the superclass BST. We then make the new node have the sentinel as its children, and we color the new node red. (The getNewNode method is new here and in BST.java. It's how we ensure that the new node created is from the correct Node class.)
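The role of getNewNode can be sketched as a factory method that the subclass overrides. The sketch below is a simplification under stated assumptions: it uses static nested classes, names the red-black node type RBNode, uses null instead of a sentinel, and gives getNewNode a single int parameter. None of these details are taken from BST.java or RBTree.java.

```java
// Sketch only: all class shapes and signatures here are assumptions for illustration.
class FactorySketch {
    static class Node {
        int key;
        Node left, right, parent;
        Node(int key) { this.key = key; }
    }

    static class RBNode extends Node {
        boolean red = true;             // a newly created node starts out red
        RBNode(int key) { super(key); }
    }

    static class BST {
        Node root;

        // Factory method: subclasses override this to supply their own node type.
        protected Node getNewNode(int key) { return new Node(key); }

        // Ordinary insertion; the only place a node is created is getNewNode.
        Node insert(int key) {
            Node z = getNewNode(key);
            Node p = null, cur = root;
            while (cur != null) {
                p = cur;
                cur = key < cur.key ? cur.left : cur.right;
            }
            z.parent = p;
            if (p == null) root = z;
            else if (key < p.key) p.left = z;
            else p.right = z;
            return z;
        }
    }

    static class RBTree extends BST {
        // Dynamic binding makes the inherited insert create RBNode objects.
        @Override
        protected Node getNewNode(int key) { return new RBNode(key); }
    }
}
```

Because insert calls getNewNode through dynamic binding, the inherited insert code builds red-black nodes when it runs on an RBTree, without being rewritten.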

Then, the insert method calls rbInsertFixup because we might have violated a red-black property. Which properties might be violated?

  1. OK.
  2. If z is the root, then there's a violation. Otherwise, OK.
  3. OK.
  4. If z.parent is red, then there's a violation: both z and z.parent are red.
  5. OK.

The rbInsertFixup method maintains the following loop invariant:

At the start of each iteration of the while-loop:

  1. z is red.
  2. There is at most one red-black violation: either property 2 (z is a red root) or property 4 (z and z.parent are both red).

We've already seen that the loop invariant holds initially.

When the loop terminates, it's because z.parent is black. So property 4 is OK. Only property 2 might be violated, and the last line fixes it.

Showing that the loop invariant is maintained is a bit tricky. There are six cases, three of which are symmetric to the other three. The cases are not mutually exclusive. Let's consider just the cases in which z.parent is a left child. Let y be z's uncle (that is, y is z.parent's sibling).
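For reference, here is a sketch of insertion followed by a fixup loop, written in the standard textbook style rather than copied from RBTree.java. The class name, the int keys, and the method signatures are assumptions for illustration. The comments mark the three cases for a left-child parent; notice that case 2 falls through into case 3, which is one way the cases fail to be mutually exclusive.

```java
// Sketch only: a textbook-style version with hypothetical names, not RBTree.java.
class RBInsertSketch {
    static final boolean RED = true, BLACK = false;

    static class Node {
        int key;
        boolean color = BLACK;
        Node left, right, parent;
        Node(int key) { this.key = key; }
    }

    final Node nil = new Node(0);       // black sentinel
    Node root = nil;

    RBInsertSketch() { nil.left = nil.right = nil.parent = nil; }

    void leftRotate(Node x) {
        Node y = x.right;
        x.right = y.left;
        if (y.left != nil) y.left.parent = x;
        y.parent = x.parent;
        if (x.parent == nil) root = y;
        else if (x == x.parent.left) x.parent.left = y;
        else x.parent.right = y;
        y.left = x;
        x.parent = y;
    }

    void rightRotate(Node x) {
        Node y = x.left;
        x.left = y.right;
        if (y.right != nil) y.right.parent = x;
        y.parent = x.parent;
        if (x.parent == nil) root = y;
        else if (x == x.parent.right) x.parent.right = y;
        else x.parent.left = y;
        y.right = x;
        x.parent = y;
    }

    void insert(int key) {
        Node z = new Node(key);
        z.left = z.right = nil;
        z.color = RED;                          // new nodes start out red
        Node p = nil, cur = root;
        while (cur != nil) {
            p = cur;
            cur = key < cur.key ? cur.left : cur.right;
        }
        z.parent = p;
        if (p == nil) root = z;
        else if (key < p.key) p.left = z;
        else p.right = z;
        insertFixup(z);
    }

    void insertFixup(Node z) {
        while (z.parent.color == RED) {
            Node g = z.parent.parent;           // grandparent; exists because a red parent is not the root
            if (z.parent == g.left) {
                Node y = g.right;               // z's uncle
                if (y.color == RED) {           // case 1: recolor and move z up two levels
                    z.parent.color = BLACK;
                    y.color = BLACK;
                    g.color = RED;
                    z = g;
                } else {
                    if (z == z.parent.right) {  // case 2: rotate so that z is a left child
                        z = z.parent;
                        leftRotate(z);
                    }
                    z.parent.color = BLACK;     // case 3: recolor and rotate the grandparent
                    g.color = RED;
                    rightRotate(g);
                }
            } else {                            // symmetric cases: z.parent is a right child
                Node y = g.left;
                if (y.color == RED) {
                    z.parent.color = BLACK;
                    y.color = BLACK;
                    g.color = RED;
                    z = g;
                } else {
                    if (z == z.parent.left) {
                        z = z.parent;
                        rightRotate(z);
                    }
                    z.parent.color = BLACK;
                    g.color = RED;
                    leftRotate(g);
                }
            }
        }
        root.color = BLACK;                     // repairs property 2 if z ended up as a red root
    }

    // Height in edges, with the sentinel at height 0.
    int height(Node x) {
        return x == nil ? 0 : 1 + Math.max(height(x.left), height(x.right));
    }
}
```

Inserting the keys 1, 2, 3 in order shows the symmetric case 3 at work: the third insertion triggers a recoloring and a left rotation, leaving 2 as a black root with red children 1 and 3.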

Analysis

It takes O(lg n) time to get through the insert call up to the call of rbInsertFixup. Within rbInsertFixup:

Thus, insertion into a red-black tree takes O(lg n) time.

Removing a node from a red-black tree

The remove method in RBTree is based on the remove method from the BST class. It calls an overridden version of transplant, which always assigns to v.parent, even if v is the sentinel. By changing the transplant method from a method of BST or RBTree to a method of the appropriate Node inner class, I get dynamic binding: when transplant is called on an RBTree.Node, the appropriate version of transplant runs.
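The difference between the two versions of transplant can be sketched as follows. This is an illustration under assumptions: the Tree, Node, and RBNode classes and the linkParentTo helper are hypothetical, and the real classes in BST.java and RBTree.java are organized differently. In the textbook formulation, the reason for always assigning v.parent is that the removal fixup may start at the sentinel and needs to look at its parent.

```java
// Sketch only: hypothetical classes showing how an overridden transplant is selected.
class TransplantSketch {
    static class Tree {
        final Node nil = new Node(this, 0);     // sentinel
        Node root = nil;
        Tree() { nil.left = nil.right = nil.parent = nil; }
    }

    static class Node {
        final Tree tree;
        int key;
        Node left, right, parent;

        Node(Tree tree, int key) { this.tree = tree; this.key = key; }

        // Make this node's parent (or the tree's root) point at v instead of this.
        void linkParentTo(Node v) {
            if (parent == tree.nil) tree.root = v;
            else if (this == parent.left) parent.left = v;
            else parent.right = v;
        }

        // Binary-search-tree version: leaves the sentinel's parent alone.
        void transplant(Node v) {
            linkParentTo(v);
            if (v != tree.nil) v.parent = parent;
        }
    }

    static class RBNode extends Node {
        RBNode(Tree tree, int key) { super(tree, key); }

        // Red-black version: always assigns v.parent, even if v is the sentinel.
        @Override
        void transplant(Node v) {
            linkParentTo(v);
            v.parent = parent;
        }
    }
}
```

A call such as u.transplant(v) runs the RBNode version whenever u is actually an RBNode, even if the variable u is declared with type Node. That is the dynamic binding the restructuring was meant to provide.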

The remove method in RBTree has the following differences from the remove method in BST:

If y was originally black, what violations of red-black properties could arise?

  1. No violation.
  2. If y is the root and x is red, then the root has become red.
  3. No violation.
  4. Violation if x.parent and x are both red.
  5. Any simple path containing y now has one fewer black node.

We remove the violations by calling rbRemoveFixup. The idea is to move the extra black up the tree until

Within the while-loop of rbRemoveFixup:

There are eight cases, four of which are symmetric to the other four. As with rbInsertFixup, the cases are not mutually exclusive. We'll look at the cases in which x is a left child.

Analysis

It takes O(lg n) time to get through remove up to the call of rbRemoveFixup.

Within rbRemoveFixup:

Testing

Use the code in RBTreeTest.java to test the RBTree class.