Binary Search

In linearSearch4.cpp we saw how to find x using a sentinel placed at the end of a sorted array. Each time through the loop, we eliminated just one possible index. But because the array is sorted, we can eliminate half of the remaining indices each time.

See Demo binarySearch.cpp
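
The halving idea can be sketched as follows. This is not the course's binarySearch.cpp, only a minimal recursive version under assumed conventions: the names binarySearch and binarySearchRange, the int element type, and the -1 "not found" return value are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: searches the sorted range a[lo..hi] (inclusive) for x.
// Returns an index holding x, or -1 if x is absent.
int binarySearchRange(const std::vector<int>& a, int lo, int hi, int x) {
    assert(lo >= 0);
    if (lo > hi) {                    // empty interval: x is not present
        return -1;
    }
    int mid = lo + (hi - lo) / 2;     // midpoint, written to avoid overflow of lo + hi
    if (a[mid] == x) {
        return mid;
    }
    if (a[mid] < x) {
        return binarySearchRange(a, mid + 1, hi, x);   // discard the left half and mid
    }
    return binarySearchRange(a, lo, mid - 1, x);       // discard the right half and mid
}

// Hypothetical entry point: searches the whole (sorted) vector.
int binarySearch(const std::vector<int>& a, int x) {
    return binarySearchRange(a, 0, static_cast<int>(a.size()) - 1, x);
}
```

Each call either finishes or recurses on an interval that excludes the middle element and one whole side, which is where the halving in the analysis comes from. For example, with a = {1, 3, 5, 7, 9, 11}, binarySearch(a, 7) returns 3 and binarySearch(a, 4) returns -1.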

Analysis

Let s_i be the size of the interval searched in the i-th recursive call. Therefore, when we start out, s_1 = n.

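To compute s_(i+1) from s_i: each call compares x with the middle element and then recurses on only one side of it, so the new interval contains at most half of the old one. That is, s_(i+1) <= s_i / 2.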

So, now we have:

	s_1 = n
	s_2 <= s_1 / 2 = n/2
	s_3 <= s_2 / 2 <= n/4
	s_4 <= s_3 / 2 <= n/8
	.
	.
	s_i <= n / 2^(i-1)

Now, how many recursive calls k does it take until s_k < 1, i.e., s_k = 0 (interval sizes are integers)? We want n / 2^(k-1) < 1, that is, n < 2^(k-1). Taking log2 of both sides gives log2 n < log2 2^(k-1) = k - 1. Therefore k > log2 n + 1.

What integer value does k have? We want the smallest integer strictly greater than log2 n + 1. First, suppose that log2 n is an integer. Then the smallest such k is log2 n + 2. Now suppose that log2 n is not an integer, and let q be the least integer greater than log2 n. Then q + 1 > log2 n + 1 while q < log2 n + 1, so we choose k = q + 1. In both cases, k <= log2 n + 2.
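
As a sanity check on the case analysis, here is a small sketch that computes k two ways: by repeatedly halving the interval size, and by following the two cases above. The function names callsByHalving and callsByCases are made up for this illustration, and n is assumed small enough that doubling does not overflow a long long.

```cpp
#include <cassert>

// Counts calls by simulation: s_1 = n, then s_(i+1) = floor(s_i / 2),
// stopping at the first call whose interval size is 0.
int callsByHalving(long long n) {
    assert(n >= 1);
    int k = 1;                 // the first call sees s_1 = n
    long long s = n;
    while (s >= 1) {           // interval not yet empty
        s /= 2;                // integer division halves the interval
        ++k;
    }
    return k;
}

// Follows the two cases in the text.
int callsByCases(long long n) {
    assert(n >= 1);
    int q = 0;
    long long p = 1;           // invariant: p == 2^q
    while (p < n) {            // find the smallest q with 2^q >= n
        p *= 2;
        ++q;
    }
    if (p == n) {
        return q + 2;          // log2 n is an integer (equal to q)
    }
    return q + 1;              // q is the least integer greater than log2 n
}
```

The two functions should agree for every n >= 1. For instance, both give k = 5 for n = 8 and k = 11 for n = 1,000.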

Each recursive call makes <= 3 comparisons, and there are at most log2 n + 2 calls. Therefore, binary search makes <= 3(log2 n + 2) = 3 log2 n + 6 comparisons, even if x is not found.
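
The comparison bound can be checked empirically with an instrumented search. This is a sketch under assumptions, not the course's demo code: countedSearch and worstCaseComparisons are hypothetical names, and the test array {0, 2, 4, ...} is chosen only so that odd targets are guaranteed misses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instrumented recursive binary search on a[lo..hi] (inclusive).
// Every comparison involving lo/hi or an array element bumps `comparisons`.
int countedSearch(const std::vector<int>& a, int lo, int hi, int x,
                  long long& comparisons) {
    ++comparisons;                        // comparison 1: is the interval empty?
    if (lo > hi) return -1;
    int mid = lo + (hi - lo) / 2;
    ++comparisons;                        // comparison 2: is a[mid] the target?
    if (a[mid] == x) return mid;
    ++comparisons;                        // comparison 3: which half do we keep?
    if (a[mid] < x) return countedSearch(a, mid + 1, hi, x, comparisons);
    return countedSearch(a, lo, mid - 1, x, comparisons);
}

// Largest comparison count over all hits and misses for a = {0, 2, ..., 2(n-1)}.
long long worstCaseComparisons(int n) {
    assert(n >= 1);
    std::vector<int> a(n);
    for (int i = 0; i < n; ++i) a[i] = 2 * i;     // sorted, with gaps for misses
    long long worst = 0;
    for (int x = -1; x <= 2 * n - 1; ++x) {       // even x are hits, odd x are misses
        long long comparisons = 0;
        countedSearch(a, 0, n - 1, x, comparisons);
        if (comparisons > worst) worst = comparisons;
    }
    return worst;
}
```

For n = 8 the worst case in this sketch is 13 comparisons, which is below the bound 3 log2 8 + 6 = 15. The bound is slightly loose because the final call, on an empty interval, makes only one comparison.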

Recall that log2 n grows much more slowly than n:

	n                  log2 n
	-------------      ---------
	1                  0
	1,000              approx 10
	1,000,000          approx 20
	1,000,000,000      approx 30

So it's better to have a log2 n term than an n term.

More abstract analysis

We have been counting the number of comparisons. Is that enough? Is that too much? A lot goes on besides comparisons. How do we count it all?

We don't count it exactly. Instead, we characterize how the running time grows with the input size.

Consider linearSearch0. The running time is c_1 n + c_2 for some constants c_1, c_2 > 0. This is because each of the n loop iterations takes c_1 time, and there's a one-time cost of c_2 for the work done before and after the loop.

How about linearSearch1? In the worst case, the time is d_1 n + d_2 for some constants d_1, d_2 > 0.

We use the same sort of analysis for all the linear search functions in the worst case. In the worst case, each has to check all n input values, so the time is some linear function of n.

And what about the time for binary search? We make at most log2 n + 2 recursive calls. Each call takes a constant amount of time of its own, plus the time for the one recursive call that it makes. There's also some time to set up the first call. Therefore, the running time of binary search is at most b_1 log2 n + b_2 for some constants b_1, b_2 > 0.

The Rate of Growth of Running Time

I will have much more to say on this in the future, but for now consider the rules for figuring the running time as:

Timing Example

Consider a machine that will run a linear search. It can execute 100,000,000 instructions per second, and the compiled code takes 2n instructions to search through n values. Now consider another machine that will run binary search. It can execute only 1,000,000 instructions per second (it's 100 times slower), and its compiled code takes 25 log2 n instructions to search through n values. Lastly, assume that our input array has 10,000,000 elements.

Linear Search Time = (2 * 10,000,000) / 100,000,000 = 0.2 secs
Binary Search Time = (25 log2 10,000,000) / 1,000,000 = approx 0.0006 secs (log2 10,000,000 is approx 23.25)

Even with a huge handicap, binary search turns out to be about 333 times faster (0.2 / 0.0006). So what is the moral of the story? First, constants are not as important as the order of growth of the time. Second, algorithms are a technology. You can have fast hardware and good compilers, but they are almost worthless without good algorithms.
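
The arithmetic in this example can be reproduced with a few lines of code. This is only a sketch: the function names linearSearchSeconds and binarySearchSeconds are invented here, and the instruction counts 2n and 25 log2 n are the assumed costs from the example, not measurements.

```cpp
#include <cassert>
#include <cmath>

// Seconds for a linear search costing 2n instructions (the example's assumption).
double linearSearchSeconds(double n, double instructionsPerSecond) {
    assert(n >= 1 && instructionsPerSecond > 0);
    return 2.0 * n / instructionsPerSecond;
}

// Seconds for a binary search costing 25 log2 n instructions (the example's assumption).
double binarySearchSeconds(double n, double instructionsPerSecond) {
    assert(n >= 1 && instructionsPerSecond > 0);
    return 25.0 * std::log2(n) / instructionsPerSecond;
}
```

Calling linearSearchSeconds(1e7, 1e8) gives 0.2 and binarySearchSeconds(1e7, 1e6) gives roughly 0.00058. The unrounded ratio is about 344; the figure of 333 comes from rounding the binary search time to 0.0006 first. Increasing n widens the gap, since doubling n doubles the linear time but adds only 25 instructions to the binary search.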
