Sometimes when you solve a problem, it’s helpful to take some notes as you go: book-keeping. For example, if you had to find a path through a picture of a maze, you might use your pencil to draw some paths as you worked out the problem. If you were solving a Sudoku puzzle, you would mark down some possible numbers in each cell.
A data structure allows information to be organized and stored. Sometimes data structures are used to organize important data (for example, a collection of names and associated phone numbers), and sometimes data structures are used for book-keeping as an algorithm runs.
We’ll look at stacks, queues, and linked lists.
In general, data structures:
Does the Javascript array provide a place to store and retrieve items? Yes; for example, we might have an array of numbers, strings, or addresses of objects. What relationship can the array express among items? The items are in some order implied by the indices of items in the array. For example, we say that the fourth item follows the third item in the array, and it precedes the fifth item. We have seen algorithms that reorder the array (changing the relationship between items), find items in the array, or just apply operations to each item in the array.
Linked lists are a type of data structure that you have not seen yet in this course. Unlike Javascript arrays, linked lists are not built in to Javascript, and we will have to implement them ourselves.
Linked lists provide many of the same features that Javascript arrays do:
For many purposes, Javascript arrays are a good choice for containing several items in sequential order. Because Javascript arrays are built into the language, they have been optimized for speed, and the time costs for accessing, deleting, changing, or adding an item have small constant factors associated with them. Since we will write our own linked lists, Javascript has to interpret the code we write, which will take some time. Nevertheless, for sufficiently large numbers of items, linked lists may provide some advantages for some operations.
Let’s look at how Javascript arrays actually work. The items in a Javascript array are stored in order in memory. Consider an array of 4-byte integers. The 0th item in the array is stored at some address, let’s say 7080. Then the first item would be stored at 7084. The second item would be at 7088, and so on. The ith item would be at address 7080 + 4i.
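The address arithmetic above can be written as a one-line formula. This is only an illustration: Javascript code cannot see real memory addresses, and the function name addressOfItem and its parameters are made up for this sketch.

```javascript
// Illustrative only: Javascript does not expose memory addresses.
// baseAddress is the (hypothetical) address of item 0;
// itemSize is the number of bytes each item occupies.
function addressOfItem(baseAddress, itemSize, i) {
  return baseAddress + itemSize * i;
}

console.log(addressOfItem(7080, 4, 0)); // 7080
console.log(addressOfItem(7080, 4, 1)); // 7084
console.log(addressOfItem(7080, 4, 2)); // 7088
```

Because the address is computed directly from the index, reaching item i takes the same amount of work no matter how large i is.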
Let’s say you have a Javascript array with 20 items. You would like to maintain the order of the items, and you would like to insert a new item as the first item of the list.
The new item will be at index 0. Notice that the item previously at index 0 has to move to index 1, the item previously at index 1 has to move to index 2, and so forth. With some cleverness, you might realize that the best way to do this operation would be to start at the end of the list and move each item one position to the right (does this idea remind you of insertion sort?), but no amount of cleverness will save you from having to loop over each item in the list and move it.
Inserting into the beginning of the array is the worst case, since all other items have to be moved to the right; inserting at the end of the array is much cheaper. So the worst-case running time for insertion into an n-item Javascript array is Θ(n).
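The shifting described above might look like the following sketch. The function name insertAtFront is an assumption for illustration; in practice the built-in unshift method on arrays performs the same job.

```javascript
// Insert newItem at index 0 by shifting every existing item one
// position to the right, starting from the end of the array.
function insertAtFront(items, newItem) {
  for (let i = items.length; i > 0; i--) {
    items[i] = items[i - 1]; // move the item at i - 1 into slot i
  }
  items[0] = newItem;
  return items;
}

const letters = ["b", "c", "d"];
insertAtFront(letters, "a");
console.log(letters); // [ 'a', 'b', 'c', 'd' ]
```

The loop body runs once per existing item, which is where the linear cost comes from.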
If you would like to maintain the order of the array and also delete an item, then the worst-case cost for an n-item array is Θ(n), since all items following the deleted item have to be moved up by one position. You can delete an item from an array with the splice method; for example, to remove the one item at index 3:
myList.splice(3, 1)
Deleting from the end is a special case that is not expensive, since no items in the array have to be moved; deleting from the end has time cost O(1). We call deleting from the end popping, and there is a special pop method on arrays that removes from the end, returning the item that was removed.
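To make the difference in cost concrete, here is a sketch of deleting from the middle by hand, followed by a pop from the end. The helper deleteAt is hypothetical and assumes the index is valid; the built-in splice method would normally be used instead.

```javascript
// Delete the item at index while keeping the remaining items in order.
// Every item after index moves one position to the left.
// Assumes 0 <= index < items.length.
function deleteAt(items, index) {
  for (let i = index; i < items.length - 1; i++) {
    items[i] = items[i + 1];
  }
  items.length = items.length - 1; // drop the now-duplicated last slot
  return items;
}

const colors = ["red", "green", "blue", "gold"];
deleteAt(colors, 1);
console.log(colors); // [ 'red', 'blue', 'gold' ]

const last = colors.pop(); // removes from the end; nothing has to move
console.log(last);   // gold
console.log(colors); // [ 'red', 'blue' ]
```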
If the array is unsorted, then it takes O(n) (linear in the number of elements in the array) time to find an item in an n-item array, by using linear search. Similarly, finding the smallest or largest item in an unsorted array takes linear time.
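Both operations can be sketched with a single loop over the array. The function names linearSearch and findSmallest are chosen for this example only.

```javascript
// Linear search: return the index of target, or -1 if it is absent.
function linearSearch(items, target) {
  for (let i = 0; i < items.length; i++) {
    if (items[i] === target) {
      return i;
    }
  }
  return -1;
}

// Find the smallest item by examining every item once.
// Assumes items is non-empty.
function findSmallest(items) {
  let smallest = items[0];
  for (let i = 1; i < items.length; i++) {
    if (items[i] < smallest) {
      smallest = items[i];
    }
  }
  return smallest;
}

const scores = [42, 7, 19, 88, 3];
console.log(linearSearch(scores, 19));  // 2
console.log(linearSearch(scores, 100)); // -1
console.log(findSmallest(scores));      // 3
```

In the worst case (the item is missing, or the smallest item is last) each loop visits all n items.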
Linked lists, like Javascript arrays, contain items in some order, but the implementation of linked lists will be very different from that of Javascript arrays. There are two reasons we will implement our own linked lists:
Certain operations for linked lists are efficient in terms of time. For example, once we have a reference to the node that the new item should follow, inserting an item into the list costs Θ(1) time, no matter where in the list the item is inserted. That is much better than the Θ(n) worst-case time for inserting into an n-item Javascript array. For large enough lists, in a situation where insertion anywhere in the list is frequent, a linked list might be the better choice, despite the smaller constant factor for Javascript arrays.
The approach we will use to create the linked list is an example that we will build on to form more complex data structures, trees and graphs, that express more interesting relationships between data than the simple sequential list structure.
Conceptually, we will create a linked list by making lots of objects called nodes. Each Node object will contain an item: the data in the list. The Node objects themselves could be anywhere in memory, and will typically not be stored sequentially, or in any other particular order, in memory.
How, then, can we tell which item in the list is first, which is second, and what the order of items is? Each Node object will contain an instance variable that has the address of the next Node object in the linked list.
Let’s look at an example of how this might work. (I should emphasize that this is not yet a good implementation of a linked list. I just want to show you how nodes work.)
Here’s some simple code:
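A minimal sketch of such code, assuming a Node class with two instance variables named data and next (these names, and the sample strings, are illustrative assumptions rather than a fixed design):

```javascript
// A hypothetical Node: holds one item plus a reference to the next Node.
class Node {
  constructor(data) {
    this.data = data;
    this.next = null; // null marks the end of the list
  }
}

// Build three nodes and link them by hand: "Maine" -> "Idaho" -> "Utah".
const first = new Node("Maine");
const second = new Node("Idaho");
const third = new Node("Utah");
first.next = second;
second.next = third;

// Print the list by following next references from the first node.
let node = first;
while (node !== null) {
  console.log(node.data);
  node = node.next;
}
```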
When we printed the list, we created a new, temporary variable node that initially has the address of the first Node object in the linked list.
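The earlier claim that insertion costs Θ(1) time can be seen in a short sketch. It assumes we already hold a reference to the node that the new item should follow; the names Node, insertAfter, and listToArray are assumptions for this example.

```javascript
// Hypothetical Node with a data item and a reference to the next Node.
class Node {
  constructor(data) {
    this.data = data;
    this.next = null;
  }
}

// Insert a new node holding data right after the given node.
// Only two references change, no matter how long the list is.
function insertAfter(node, data) {
  const newNode = new Node(data);
  newNode.next = node.next;
  node.next = newNode;
  return newNode;
}

// Collect the items into an array so the order is easy to inspect.
function listToArray(head) {
  const items = [];
  for (let node = head; node !== null; node = node.next) {
    items.push(node.data);
  }
  return items;
}

const head = new Node("a");
const tail = insertAfter(head, "c"); // a -> c
insertAfter(head, "b");              // a -> b -> c
console.log(listToArray(head));      // [ 'a', 'b', 'c' ]
```

No existing node has to move in memory; the new node is simply spliced into the chain of references.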
Objective: write a method that implements an algorithm on a simple linked list.
Sometimes you have a piece of data, perhaps a string, and you would like to know if that data is contained in any node in the list. Write the function listFind that searches the linked list for the data given by the parameter needle. If found, the function should return the containing node. Otherwise, the function should return null.
Here is a solution. Do not look at it until you have completed the exercise yourself.
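One possible solution is sketched below. It assumes that each node has data and next instance variables, that the end of the list is marked by null, and that listFind receives the first node as a parameter named head; only the needle parameter is specified by the exercise.

```javascript
// Hypothetical Node with a data item and a reference to the next Node.
class Node {
  constructor(data) {
    this.data = data;
    this.next = null;
  }
}

// Search the list that starts at head for a node whose data equals needle.
// Returns the containing node, or null if no node holds needle.
function listFind(head, needle) {
  let node = head;
  while (node !== null) {
    if (node.data === needle) {
      return node;
    }
    node = node.next;
  }
  return null;
}

// Build a small list by hand: "ant" -> "bee" -> "cat".
const ant = new Node("ant");
const bee = new Node("bee");
const cat = new Node("cat");
ant.next = bee;
bee.next = cat;

console.log(listFind(ant, "bee") === bee); // true
console.log(listFind(ant, "dog"));         // null
```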
A stack data structure models a familiar way of organizing objects in the physical world: stacked objects. When you add an object to the stack, it’s added to the top of the stack. When you take an object off of the stack, it’s taken off the top of the stack. In order to access the bottom plate of a stack of heavy plates, you might first have to take off all of the plates above it one by one, starting from the top. We say that the stack is “last in, first out” (LIFO) for this reason.
Here is a visualization of a stack of numbered bricks, written by Dartmouth student Daniel Shanker. Click on a brick to push it onto the stack in the correct order.
So, we need operations to push data items onto the top of the stack, and pop items off the top.
Sometimes, stacks and queues are referred to as abstract data types. In this case, abstract means that (just like in pseudocode) we don’t really care about some details of implementation. If a data structure allows items to be pushed and popped, following a “last in, first out” order, we say that it is a stack. If it quacks like a duck, it is a duck!
There are many ways to implement stacks and queues, and good reasons to choose particular implementations, such as the computational costs of push and pop operations, but we will focus on how stacks and queues can be used in algorithms. Fortunately, Javascript arrays can be used as fine stacks, using the built-in push (to push data onto the back of the array) and pop methods. Notice that a Javascript array allows operations other than push or pop. However, if you are intending to use a stack, you should not use those operations, and in many other languages, the stack construct only allows those operations.
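A short sketch of a Javascript array used strictly as a stack, with only push and pop (the plate names are illustrative):

```javascript
// The back of the array plays the role of the top of the stack.
const stack = [];
stack.push("plate 1");
stack.push("plate 2");
stack.push("plate 3");

// Items come off in the reverse of the order they went on (LIFO).
console.log(stack.pop());  // plate 3
console.log(stack.pop());  // plate 2
console.log(stack.length); // 1
```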
In this exercise, you will implement an interpreter, like the Javascript interpreter – a very tiny interpreter, of course! An interpreter takes a human-readable string, parses it (breaks it down into smaller parts, called tokens), and then computes something based on those tokens.
One thing an interpreter might need is a way to evaluate mathematical expressions. In this section, we’ll see how to use stacks to do this.
Take the very simple expression “3 + 4”, stored as a string. We could perhaps write a program that looked at characters from the string from left to right. First, we see the number 3. Then we see the operator +, and we realize that the operator + indicates to add the number before it, and the number after it. Finally, we get the 4, and do the addition.
The order of operations makes things trickier. Consider the string “3 + 4 * 2”. We get a 3. Then a +. Then a 4. Do we add 3 and 4? No, not yet. Then we get a * operator. Then a 2. Now multiply 4 and 2, and finally add 3 and 8. You can imagine that parentheses might make things even worse. The difficulty with computing expressions like “3 + 4 * 2” is that while we might like to process the string from left to right, the order in which computations are performed could be complicated.
Reverse Polish Notation is a way of writing mathematical expressions that easily describes the order in which to perform computations, without the need for any order of operations or parentheses. Here is how you might write the above expression: “3 4 2 * +”.
Let’s compute the value of “3 4 2 * +”, reading characters from the string left to right. 3 is a number. Store it in your head. 4 is a number; store it. Then store 2. Then we see an operator, *. The * operator needs two numbers to operate on, so take the two most recent numbers that you stored, 2 and 4, multiply them, and replace them in your head with the result, 8. Then we see the operator +. Take the two most recent numbers in your head, 8 and 3, add them, and replace those numbers with the result, 11.
An algorithm to compute Reverse Polish expressions can use a stack to provide the “last in, first out” mechanic that allows us to access the most recent numbers when we see an operator. Below is the approach for evaluating an expression in RPN. In the pseudocode, a token is a number or an operator in the string; tokens are separated by spaces.
Create an empty stack
For each token (either a number or an operator) in our RPN expression:
    If the token is a number
        Push the number onto the stack
    If the token is an operator
        Pop two numbers off of the stack
        Operate and push the result onto the stack
In a correctly written RPN expression, there will be one item left in the stack; pop it off and that is the answer
Here is a demonstration of the algorithm at work. (Visualization by Daniel Shanker ’16.)
Write an interpreter for Reverse Polish Notation. Your interpreter should consist of a function, rpn(), which should take as input a string consisting of integers and operators separated by spaces, and output the computed value of the Reverse Polish Notation expression. You’ll probably want to write other helper functions as well. You may use integer division. The .split() method of the string class may be helpful to you.
Also include some test code that calls the rpn() function for several legal expressions (strings). Here are two starting tests for you. (You should add at least five more, including some simple and some more complex.)
Notice something slightly tricky here. “5 4 −” means 5 − 4, not 4 − 5. So make sure you get the order correct as you pop values from the stack and apply operators to them.
Here is a solution.
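One possible solution is sketched below, following the stack-based pseudocode. It assumes the input is a legal expression, that the operators are +, -, *, and /, and that integer division truncates toward zero; the helper name applyOperator is an assumption for this sketch.

```javascript
// Apply a binary operator. left was pushed before right.
function applyOperator(operator, left, right) {
  switch (operator) {
    case "+": return left + right;
    case "-": return left - right;
    case "*": return left * right;
    case "/": return Math.trunc(left / right); // assumed integer division
    default: throw new Error("Unknown operator: " + operator);
  }
}

// Evaluate a legal RPN expression whose tokens are separated by spaces.
function rpn(expression) {
  const stack = [];
  const tokens = expression.trim().split(/\s+/);
  for (const token of tokens) {
    if (/^-?\d+$/.test(token)) {
      stack.push(parseInt(token, 10)); // a number: push it
    } else {
      const right = stack.pop(); // most recently pushed number
      const left = stack.pop();  // the number pushed before it
      stack.push(applyOperator(token, left, right));
    }
  }
  return stack.pop(); // the one remaining item is the answer
}

console.log(rpn("3 4 2 * +")); // 11
console.log(rpn("5 4 -"));     // 1
```

Popping the right operand first is what keeps subtraction and division in the intended order.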
When you get in line (or enqueue) for an amusement park ride, you have your spot in line. Nobody can cut in front of you, and you are guaranteed to get out from the front of the line (dequeue) before anybody behind you. Like stacks, queues allow for the ordering of objects while following a certain set of rules. Whereas stacks are “last in, first out,” queues are the opposite: “first in, first out.”
We will need operations to enqueue data items at the end of the queue, and dequeue items from the front. Here is a visualization of a queue (written by Dartmouth student Daniel Shanker), with enqueue and dequeue operations. Try to mimic the following sequence (each character’s name is denoted by the letter next to them, and you can enqueue them by clicking on them): Dave and Eliza get in line to buy movie tickets, followed shortly by Alice. Dave is served. Before Eliza is served, Carol and Betty get in line. At that point there are only two tickets left, so only Eliza and Alice are served.
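The movie-ticket line can also be traced in code. This sketch uses a Javascript array as the queue, with the built-in push method to enqueue at the back and the built-in shift method to dequeue from the front; under “first in, first out,” whoever has waited longest is served next.

```javascript
const line = [];

// Dave and Eliza get in line, followed by Alice.
line.push("Dave");
line.push("Eliza");
line.push("Alice");

// The person at the front of the line is served first.
console.log(line.shift()); // Dave

// Carol and Betty join at the back.
line.push("Carol");
line.push("Betty");

// Two tickets remain, so the next two people at the front are served.
console.log(line.shift()); // Eliza
console.log(line.shift()); // Alice
console.log(line);         // [ 'Carol', 'Betty' ]
```

As with deleting from the front of an array, shift may need to move the remaining items, so its cost can grow with the length of the array. A linked list that keeps references to both its first and last nodes is one way to make both operations constant time.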