CS 10: Winter 2016

Lecture 1, January 4

Code discussed in lecture

What's this course about?

"Problem solving via object oriented programming." Examples of problems from your experience? In this class: process images, search social networks, play games, compress files, identify clusters in data sets, solve puzzles like Sudoku, play the Kevin Bacon game on a movie database....

While those are fun (I hope), the goal of the course is to develop expertise in core programming techniques useful throughout computer science, including representation, abstraction, recursion and modularity, and concurrency. One of the most important themes in this course is abstraction:

Proper abstraction allows us to treat complex ideas as "black boxes" so that we can reason about their behavior without knowing their exact contents. For instance, we can abstract the notion of square root, giving a black box into which we feed a number and from which we receive that number's square root. We don't care what's inside the black box, as long as it preserves the input/output relationship; we could use Newton's method or another algorithm, a look-up table, or some combination.
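To make the black-box idea concrete, here is a minimal sketch with two interchangeable square-root boxes. The class name SquareRootDemo and both method names are made up for illustration, and the Newton's method version assumes a non-negative input of moderate size and uses a fixed number of refinement steps.

```java
// Hypothetical illustration; these names are not from the course code.
class SquareRootDemo {
  // One possible "inside" of the black box: Newton's method.
  // Assumes x is non-negative and of moderate size.
  static double newtonSqrt(double x) {
    if (x < 0) {
      throw new IllegalArgumentException("x must be non-negative");
    }
    if (x == 0) {
      return 0;
    }
    double guess = x;
    for (int i = 0; i < 60; i++) {       // a fixed number of refinement steps
      guess = (guess + x / guess) / 2;   // average the guess with x / guess
    }
    return guess;
  }

  // Another possible "inside": delegate to the Java library.
  static double librarySqrt(double x) {
    return Math.sqrt(x);
  }

  public static void main(String[] args) {
    // The caller sees the same input/output relationship either way.
    System.out.println(newtonSqrt(2.0));
    System.out.println(librarySqrt(2.0));
  }
}
```

A caller that only needs a square root could switch from one method to the other without changing anything else, which is the point of the black box.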

You use abstraction when you drive an automobile. Do you really know what happens when you step on the accelerator? Lots of parts do their thing: accelerator cable, fuel injector, cylinders, pistons, cams, drive shaft, transmission, universal joint, and all under computer control. But you, the driver, rely on the abstraction "step on accelerator, car moves."

The same analogy applies to standard interchangeable parts. This was a revolution in military and industrial technology, which started at a gun factory in Windsor, VT, about half an hour south of here. (The building is now the home of the Precision Tools museum, which is worth seeing.) Similarly with computers — USB ports, video output, headphone mini-jacks, etc. Lots of different pieces, but with standard connections and cables, so you can hook them up as you please.

In order to define our abstractions, we need to have some kind of notation for describing them. Ideally, we want a notation that can be read by the computer as well as by humans. In this class we will use the object-oriented (OO) language Java. Simula, a simulation language, was the first OO language. The idea is to express a simulation (or other problem) in terms of interacting objects. Sometimes those objects model things in the world, like the wildebeests that stampeded in The Lion King. It would have been too complex to try to control them all directly, so the animators created an object for each wildebeest that contained rules for how it should interact with other wildebeests and with the environment. They started them up and let them interact. They thus simulated wildebeest behavior and interaction and put it onto film.

Often objects are more abstract. But they usually have:

Objects are a good way to perform abstraction. If the data is private (which is usually the case in Java), the only way to interact with it is through the methods. The methods become an interface between the data and the world. The exact way that the data is stored becomes invisible to the outside world, so it can be changed without breaking anything.

This ability is important, because as we will see there are often a number of different data structures that can be used to represent the data, each with its advantages and drawbacks. In this class we will spend a lot of time looking at:

The idea of "data abstraction" came from a 1972 paper by Parnas entitled "On the Criteria To Be Used in Decomposing Systems Into Modules." He compared two approaches for decomposing into modules a program that creates a KWIC (Key Words in Context) index. The first was the standard way at the time, which was to break the program down into sequential steps (e.g., Input, Shift, Alphabetize, Output). The other broke it down by giving a module to each type of data (e.g., Line Storage). He showed that changing how a type of data was represented required changing every module in the first decomposition, but only required changing a single module in the second.

Administrative stuff

See http://www.cs.dartmouth.edu/~cs10/ for

We have an x-hour Thursday!

I need you to fill out and submit two Word document forms to me:

To make my life a little easier, please follow the instructions at the top of each form when submitting them.

You can get my planned schedule for the course by clicking the Schedule link on any of the course web pages. As I put up the lecture notes for each lecture, a link to the notes will go live.

I'll post short assignments and lab assignments under the corresponding links. Note that Short Assignment 0 is already posted and due on Wednesday.

Lecture notes are there to help you, to keep you from having to write down and type in a lot of stuff that we go over in class. However, they aren't necessarily complete, and they're also simple, sometimes terse, not necessarily grammatical, etc. They're my working notes, and are not intended to take the place of the lectures or the readings. I always reserve the right to cover material in lecture that was not in the online lecture notes.

Other instructors in CS 10 (particularly Tom Cormen) have modified and expanded my notes, and I will be using some of their writing. Additions, corrections, and suggestions are welcome, and will improve the notes for this and future classes.

Java basics

You all know a programming language. For most of you it is Python. For some of you it is Java or C. Learning a second programming language is much easier than learning a second natural language like English. The first hundred pages in the book go over what we will use in Java rather formally with a lot of detail. The on-line text introduces Java in a more conversational, interactive way.

Lectures are an awful way to communicate detail. I expect you to read these sections and come to me or course staff with questions. In lecture I will try to give you a framework on which to hang the details, and a way of approaching and thinking about the language.

Dr. Java and Eclipse

We will introduce you to two systems for programming in Java. Dr. Java has an Interaction window, which lets you type Java statements and expressions and see the results immediately. It is sort of like a Python interpreter, where you can type a single line of code and run it. It is great for experimentation. Eclipse is a much more powerful IDE, and supplies all sorts of help like code completion (you type a variable name and a dot and it shows you all of the variables and methods in the object referred to by that variable, with templates describing the parameters of methods) and refactoring (reorganizing your code, e.g., renaming a variable, where only the references in the current scope are changed). But in Eclipse, if you want to run a command you have to write a program to do it.

I will start using Dr. Java to demonstrate things interactively, but will move to Eclipse for more complicated programs. You may use either for any assignment.

A quick look at a Java class

We will start by skimming a Java class. The class that we will look at is Counter. An object of this class represents a counter that starts at 0. Each time the method tick() is called, one is added to the counter. When the counter reaches a pre-defined limit it "wraps," going back to 0. Such an object could be used as part of a digital timer or digital clock class.

In Java, all code must reside in some class. (That makes Java different from Python or C++. And it's very different from C, which doesn't have classes at all.) By convention, the name of a class starts with an uppercase letter. By rule, the name of the file containing the class is the name of the class, with ".java" appended. Hence, the class Counter is in the file Counter.java.

An object includes instance variables, which represent the object's state, and methods, which operate on the object. You've seen these concepts if you've taken CS 1 or the Computer Science AP course. If you've learned C, then you can think of instance variables like members of a struct and methods like functions, but that operate on the struct. Java doesn't have functions, only methods.

An object of this class represents a counter whose value starts at 0. A Counter object has two instance variables:

Three methods of the Counter class can change the value of the counter:

The way that the tick method works, we could use a Counter object as part of a digital timer or digital clock class.

Let's back up a little. The first thing we see in the Counter.java file is a comment. Java has three forms of comments:

Next we see the line

import java.text.DecimalFormat;

Like an import statement in Python or a #include in C, this line says that there's code elsewhere that we'll be using. In this case, it's a class named DecimalFormat in a package named java.text.

Notice that the import statement ends with a semicolon. In Java, just about all statements end with a semicolon. If you've programmed in C, you're used to this syntactic requirement. If your background is only in Python, then the end of the line (the newline character) serves the purpose that the semicolon serves in Java.

Next, we see the line

public class Counter {

This line says that we're starting to define a class named Counter, and that by being declared as public, this class will be known throughout the entire program. The left curly brace (again, familiar to C programmers but not to Pythonistas) says that we're starting the definition. The closing right curly brace at the end of the file matches this left curly brace and ends the class definition. Curly braces in Java always come in matching pairs (unless they appear in quoted string literals), and they nest.

Next we have our instance variables:

  private int myLimit;   // upper limit on the counter
  private int myValue;   // current value

Here, we declare the instance variables myLimit and myValue as ints. In Java, as in C, you have to declare the type of each variable, although in Python you do not. The keyword private says that only methods within the current class (Counter) are allowed to access these instance variables. Each Counter object that is created will have its own copy of these instance variables, so that one Counter object might have a myLimit of 12 and a myValue of 5, while another Counter object has a myLimit of 10 and a myValue of 8.

Then we have two other variables:

  private final static int DEFAULT_LIMIT = 12;   // the default counter limit
  private final static String FORMAT = "00";     // minimum number of digits displayed

These look like instance variables, but they're not. Actually, they have a couple of differences from the instance variables myLimit and myValue. First, by being declared final, we're saying that the values we give them when we declare them may never change after their initial assignments. Python doesn't supply the equivalent to final, but C's const declaration is the same as final in Java. Notice the convention that final variable names are written in all uppercase, just as in C and Python. Second, by declaring these variables as static, we're saying that instead of each Counter object having its own copy of these variables, there's just one copy of each, and that copy is shared among all the Counter objects that exist at any time. After all, since DEFAULT_LIMIT is an int with the value 12 and it's never going to change, if we gave each Counter object its own copy of DEFAULT_LIMIT, then we'd waste space. We'll see later in the course that non-final static variables have their own uses. (One is to keep a count of the number of objects of a given class that have been created.) Static variables in Java are also known as class variables, since there's just one for each class.
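As a small illustration of the non-final static use mentioned above, here is a sketch of a class that counts how many of its objects have been created. The class Widget and all of its members are hypothetical and are not part of Counter.java; getNumCreated is a static method, a notion that comes up again with main.

```java
// Hypothetical class, not part of Counter.java.
class Widget {
  private static int numCreated = 0;   // one copy, shared by all Widget objects
  private int myId;                    // one copy per Widget object

  public Widget() {
    numCreated++;          // update the shared class variable
    myId = numCreated;     // record this object's own number
  }

  public int getId() {
    return myId;
  }

  public static int getNumCreated() {
    return numCreated;
  }

  public static void main(String[] args) {
    Widget a = new Widget();
    Widget b = new Widget();
    System.out.println(a.getId() + " " + b.getId() + " " + Widget.getNumCreated());
  }
}
```

Each Widget keeps its own myId, but every constructor call updates the single shared numCreated.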

Notice also that the static variable FORMAT is declared as a String. The String class is built into Java. Like both C and Python, Java allows quoted string literals, such as "00", which is a string comprising two characters, both of which are the character 0 (not to be confused with the integer 0).

Next, we have two special methods that are called constructors.

  /**
   * Constructor for the Counter class, with initial value 0 and limit 12.
   */
  public Counter() {
    myLimit = DEFAULT_LIMIT;
    myValue = 0;
  }

  /**
   * Constructor for the Counter class, with initial value 0 and given limit.
   * @param limit - the upper bound for the counter
   */
  public Counter(int limit) {
    myLimit = limit;
    myValue = 0;
  }

The name of a constructor is always the name of its class. As we'll see, all methods other than constructors have to declare the type of what they return to their callers. Constructors do not declare any return type. They are the only methods that do not declare a return type.

Python programmers: note that curly braces surround the method bodies. Like C, Java requires curly braces around bodies. Python indicates structure by indenting, but Java does not require proper indentation. In fact, line breaks are irrelevant. The following code would mean exactly the same thing as the first of the two constructors:

  public Counter() { myLimit =
DEFAULT_LIMIT ; myValue
= 0

;}

In CS 10, however, we require proper indentation. As does the rest of the world. And we will insist that your code be properly formatted, so that we humans can read it.

Constructors are run automatically when objects are created. Python programmers have seen constructors, but C programmers have not. The purpose of a constructor is to initialize every instance variable of the object being created. Not just some of the instance variables. All the instance variables. You should make sure that whenever you create an object, a constructor is run on the object and that this constructor gives an initial value to every instance variable.

Why do we care so much about running constructors and initializing every instance variable? Because if you fail to give every instance variable an initial value, the object is in an unknown state. The unknown state is not only unreliable, but it might also be illegal. For example, suppose a Counter had myLimit of 12 but myValue of 15. That would be an illegal state. By running a constructor on a Counter object, we can ensure that its state starts out known and legal. (Exercise: Show how, with the constructors above, we can still make a Counter that starts out in an illegal state.)

Unlike C or Python, Java has overloading: you can have multiple methods with the same name. Notice that we have two constructors, both named Counter, the name of the class. They differ, however, in that the first constructor takes no parameters, but the second one takes one parameter, limit, which is declared to be an int. C programmers are familiar with this style of declaring parameters. Python programmers are not, because they don't ever declare variable types. As you can see, you declare the type of a parameter right before the parameter's name in the method header. As long as the number or types of the parameters differ, you may overload a method. (A method's name together with the number and types of its parameters is called the method's signature.)
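Overloading works for ordinary methods as well as constructors. Here is a minimal sketch; the class OverloadDemo and its describe methods are invented for illustration, and they are declared static only so that the example can call them without creating an object.

```java
// Hypothetical class used only to illustrate overloading.
class OverloadDemo {
  // Three methods share the name describe; their parameter lists differ.
  static String describe(int n) {
    return "one int: " + n;
  }

  static String describe(String s) {
    return "one String: " + s;
  }

  static String describe(int a, int b) {
    return "two ints: " + a + ", " + b;
  }

  public static void main(String[] args) {
    // Java picks the method whose parameters match the arguments.
    System.out.println(describe(5));
    System.out.println(describe("five"));
    System.out.println(describe(5, 12));
  }
}
```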

We'll see later how a constructor gets called.

Next is the parameterless method tick:

  /**
   * Increment the value of the Counter, wrapping back to 0 when it reaches
   * the limit.
   */
  public void tick() {
    myValue++;
    if (myValue == myLimit)   // has it hit the limit?
      myValue = 0;            // wrap if it has
  }

The keyword public says that this method can be called from anywhere, and void says that this method does not return a value to its caller. (C programmers, but not Python programmers, are used to seeing void.) The tick method operates on a particular Counter object, using the instance variables of the Counter object on which it is called. (We'll see later how we call a method on an object.) First, the method increments the myValue instance variable using the ++ increment operator of Java (and C, but not Python). Then, if the value of the myValue instance variable has reached the value of the myLimit instance variable, myValue is set to 0. This syntax for an if-statement is just as in C, but it differs from Python: parentheses around the test replace the colon following the test.
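To see the wrapping behavior in isolation, here is a sketch that applies the same increment-then-wrap logic to plain int values. The class WrapDemo and its methods are hypothetical. The second method is an alternative that is not in the course code: it uses the remainder operator % and gives the same result, assuming the limit is positive and the value is between 0 and limit - 1.

```java
// Hypothetical helper class; not part of Counter.java.
class WrapDemo {
  // Same logic as tick(): increment, then wrap to 0 at the limit.
  static int tickWithIf(int value, int limit) {
    value++;
    if (value == limit)
      value = 0;
    return value;
  }

  // Alternative: the remainder operator does the wrapping.
  // Assumes limit > 0 and 0 <= value < limit.
  static int tickWithRemainder(int value, int limit) {
    return (value + 1) % limit;
  }

  public static void main(String[] args) {
    int value = 0;
    for (int i = 0; i < 7; i++) {
      value = tickWithIf(value, 5);
      System.out.print(value + " ");   // prints 1 2 3 4 0 1 2
    }
    System.out.println();
  }
}
```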

Then the methods set and reset:

  /**
   * Set the value of the Counter to newValue.
   * If newValue is too large or is negative sets it to 0.
   * (We will learn better ways to handle errors later.)
   *
   * @param newValue the value to reset the counter to
   */
  public void set(int newValue) {
    if (newValue >= 0 && newValue < myLimit)
      myValue = newValue;
    else
      myValue = 0;
  }

  /**
   * Reset the value of the Counter to 0.
   */
  public void reset() {
    set(0);
  }

The set method takes a parameter, newValue, which becomes the new value of the counter, if newValue is legal. Otherwise, the counter's value is set to 0. The reset method just sets the counter's value to 0.

C programmers will recognize the legality test newValue >= 0 && newValue < myLimit, which runs the two relational operators >= and < and then combines their results with the short-circuiting logical connective &&. (Python programmers know this connective as and.) Savvy Python programmers would have written this test as 0 <= newValue < myLimit in Python. Alas, this style of stringing together relational operators is not allowed in Java. The Java compiler would interpret 0 <= newValue as giving a boolean value, which you'd then be attempting to compare with the int value myLimit, thereby generating a compile-time error.
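Here is the same range test pulled out into a small sketch. The class RangeCheckDemo and the method isLegal are made-up names; the chained Python-style form appears only in a comment because it does not compile in Java.

```java
// Hypothetical class illustrating the legality test from set().
class RangeCheckDemo {
  static boolean isLegal(int newValue, int limit) {
    // Two separate comparisons joined by &&.
    // If newValue >= 0 is false, the second comparison is never evaluated.
    return newValue >= 0 && newValue < limit;

    // The chained form below would be a compile-time error in Java,
    // because (0 <= newValue) is a boolean and cannot be compared with an int:
    //   return 0 <= newValue < limit;
  }

  public static void main(String[] args) {
    System.out.println(isLegal(4, 5));     // prints true
    System.out.println(isLegal(5, 5));     // prints false
    System.out.println(isLegal(-1, 12));   // prints false
  }
}
```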

Next is the "getter" method getValue:

  /**
   * Return the value of the Counter.
   *
   * @return the current value of the counter.
   */
  public int getValue() {
    return myValue;
  }

This method returns an int value to its caller, and so the return type in the header is int. The Javadoc documentation also includes this information in the @return part.

The next method in the Counter class is toString:

  /**
   * Return a String representation with at least 2 digits, padding with a
   * leading 0 if necessary.
   *
   * @return a String representation of the counter with at least 2 digits.
   */
  public String toString() {
    DecimalFormat fmt = new DecimalFormat(FORMAT);   // use at least 2 digits
    return fmt.format(myValue);
  }

The toString method is called automatically whenever we need the string representation of an object. Notice that it returns a String. If we don't supply the toString method in a class definition and we try to convert the object to a string, we just get the name of the object's class followed by @ and the object's hash code in hexadecimal (base 16), which is rarely useful. Here, the toString method converts the counter's value to a character string of at least two digits. It uses the class DecimalFormat that we imported up at the top of the file. The toString method first creates a DecimalFormat object with the Java keyword new. Apart from a few special cases such as string literals, the way to create a new object in Java is via the keyword new. The class variable FORMAT is passed to the constructor for DecimalFormat.
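The difference between supplying toString and leaving it out is easy to see side by side. In this sketch the classes Plain, Labeled, and ToStringDemo are hypothetical, and the @Override annotation (not covered in these notes) simply asks the compiler to check that the method really replaces the inherited one.

```java
// Hypothetical classes; neither is part of Counter.java.
class Plain {
  private int myValue = 7;
  // No toString method supplied.
}

class Labeled {
  private int myValue = 7;

  @Override
  public String toString() {
    return "Labeled(" + myValue + ")";
  }
}

class ToStringDemo {
  public static void main(String[] args) {
    // Inherited version: class name, then @, then a hexadecimal hash code.
    System.out.println(new Plain());
    // Our own version.
    System.out.println(new Labeled());   // prints Labeled(7)
  }
}
```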

The new operator does three things:

  1. It allocates memory for the object.
  2. It runs a constructor on the object being created.
  3. It returns the address of the object in memory. We call this address a reference to the object.

Let's take a look at the variable fmt. It is a local variable, which means it exists only during the call to the toString method. The variable is created when toString executes, and it ceases to exist when toString completes.

Now that you know that the new operator returns a reference to the object created, you know what is assigned to fmt: a reference to the DecimalFormat object that new created and ran a constructor on. Not surprisingly, then, fmt is declared as a DecimalFormat object. Except that it's not. It's actually declared as a reference to a DecimalFormat object. If you've seen aliasing in C or Python, where two different variables hold the address of the same object, you can appreciate the difference between an object and a reference to an object.
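A short sketch may help make the object-versus-reference distinction concrete. The classes Box and AliasDemo are invented for this example; Box leaves its instance variable non-private only to keep the sketch short.

```java
// Hypothetical classes used only to illustrate aliasing.
class Box {
  int value;

  Box(int v) {
    value = v;
  }
}

class AliasDemo {
  static int changeThroughAlias() {
    Box first = new Box(1);   // new returns a reference to a Box object
    Box second = first;       // copies the reference, not the object
    second.value = 42;        // changes the one object both variables reference
    return first.value;
  }

  public static void main(String[] args) {
    System.out.println(changeThroughAlias());   // prints 42
  }
}
```

Only one Box object is ever created here, because new runs only once; first and second are two references to it.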

Having created a DecimalFormat object and assigned its address to the local variable fmt, the toString method then calls the format method on this DecimalFormat object, with myValue as a parameter (the only parameter) to the format method. We don't know how the format method works, but that's abstraction acting in our favor. Of course, we haven't even said what the format method does, but we can worry about that later.

What's important here is how we call a method on an object. We called fmt.format(myValue). Here, fmt is a reference to a DecimalFormat object, format is a method of the DecimalFormat class, and myValue is the parameter to the call. The general form of a method call is

objectReference.methodName(parameters)

Python programmers have seen this syntax before, but C programmers have not.

I like to think of objects as "things" and methods as "actions." Indeed, most of the time, the name of a class should be a noun or a noun phrase. And the name of a method should include a verb. (With the obvious exception for constructors, which are required to have the same name as the class.)

Now, what does the format method return? You can tell from how it's used. The toString method just takes what format returns and immediately returns it to its caller. And we know that toString returns a String or, more precisely, a reference to a String. Therefore, the format method must return a reference to a String.

The last method in the Counter class has the special name main:

  /**
   * A main program to test the counter.
   * (Including such testing programs is a good idea.)
   */
  public static void main(String args[]) {
    // Create variables that can reference two Counters.
    Counter c1, c2;

    c1 = new Counter(5);   // wraps at 5
    c2 = new Counter();    // wraps at 12

    final int TIMES = 50;

    System.out.println("c1\tc2\tsum");
    // Show lots of Counter values.
    for (int i = 0; i < TIMES; i++) {
      System.out.println(c1 + "\t" + c2 + "\t" + (c1.getValue() + c2.getValue()));

      // Tick both Counters.
      c1.tick();
      c2.tick();
    }

    c1.reset();
    c2.reset();
    System.out.println("After reset:\t" + c1 + "\t" + c2);
    c1.set(4);
    c2.set(10);
    System.out.println("After set:\t" + c1 + "\t" + c2);
    c1.set(5);
    c2.set(-1);
    System.out.println("After invalid:\t" + c1 + "\t" + c2);
  }

The main method is where program execution starts. In Eclipse, we always have to say which class contains the main method in which we want to start executing the program. So we can have lots of main methods in an Eclipse project, but only one will be the main method.

The header of the main method is boilerplate: public static void main(String args[]). As we'll see, a static method is not called on any object. The main method must be static, because it's the first thing executed, and so no objects can possibly have been created at the time main starts executing. The parameter String args[] allows for command-line arguments. The [] part indicates that args is actually an array of references to String. C programmers have seen arrays. So have Python programmers, but they're called "lists" in Python.

What does the main method do? It starts by declaring local variables c1 and c2 as references to Counter objects. Then it creates two Counter objects. The first one has a limit of 5; the new operator calls the one-parameter constructor. The second object has a limit of 12; the new operator calls the parameterless constructor (often loosely called a default constructor, although in Java that term strictly means the parameterless constructor that the compiler supplies when a class declares no constructors). Remembering that new returns a reference to the object it creates, we see that c1 is assigned a reference to the Counter whose limit is 5 and c2 is assigned a reference to the Counter whose limit is 12.

Next, main declares a final local variable TIMES with the value 50. Then, main prints some output to the console by calling the method println on a built-in object referenced by System.out. The println method takes a string and prints it on the console. As in C and Python, we can include escaped characters in a string literal; here, \t denotes the tab character.

Then comes a for-loop. The syntax is exactly the same as in C, but this syntax will be new for Python programmers. Java for-loops are quite a bit more general than in Python, and they do not require lists (or arrays) to be involved. A for-loop has the form

for (initial; test; update)
  body

and it is equivalent to a while-loop with the form

initial
while (test) {
  body
  update
}

Therefore, we could have written the for-loop as the equivalent while-loop

  int i = 0;
  while (i < TIMES) {
    System.out.println(c1 + "\t" + c2 + "\t" + (c1.getValue() + c2.getValue()));

    // Tick both Counters.
    c1.tick();
    c2.tick();
    i++;
  }

Although you might find the Python loop header for i in range(TIMES): to be easier, the more general form of Java for-loops has advantages that we'll see later on.
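The for/while equivalence described above can be checked on a tiny example. In this sketch the class LoopDemo and its two methods are hypothetical; both add up the integers 0 through n - 1.

```java
// Hypothetical class comparing a for-loop with its while-loop equivalent.
class LoopDemo {
  static int sumWithFor(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {   // initial; test; update
      sum += i;                     // body
    }
    return sum;
  }

  static int sumWithWhile(int n) {
    int sum = 0;
    int i = 0;                      // initial
    while (i < n) {                 // test
      sum += i;                     // body
      i++;                          // update
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(sumWithFor(5));     // prints 10
    System.out.println(sumWithWhile(5));   // prints 10
  }
}
```

One small difference: in the for-loop version, the variable i exists only inside the loop, while in the while-loop version it remains in scope after the loop ends.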

The body of the for-loop prints one line of output and then calls the tick method on each of the two Counter objects, using the objectReference.methodName(parameters) syntax.

After the for-loop terminates, the main method calls reset on each Counter and then prints the values of the two objects. There's a lot going on in the parameter to println, so let's take a look. The parameter to println must be a String (more precisely a reference to a String). But we're "adding" the string literal "After reset:\t" and the reference c1. To whatever that addition gives back, we're adding the string literal "\t", and then we add the reference c2. What is going on?

Java assumes that if you apply the + operator and one of the operands is a String (again, really a reference to a String…do you see how a string literal's type is really "reference to a String object"?), then the other operand is converted to a String if necessary and concatenated with the String operand. And so we convert the Counter object that c1 references to a String and concatenate that String with the string literal "After reset:\t". Just how is that Counter object converted to a String? By an implicit call to its toString method! In other words, Java will automatically call toString on this Counter object and use the reference to a String that toString returned. We concatenate the two strings, and then we concatenate the string literal "\t" to that result, and then call toString implicitly on the Counter that c2 references, concatenating the string returned by toString to what we've built up so far. The println call inside the for-loop goes one step further: it concatenates another tab character, followed by an int value that is implicitly converted to a String. That int value is the sum of the values of the two Counter objects, using values returned by calls to getValue on each of the two objects. Notice how the parentheses around that sum make it plain old addition rather than concatenation. In each case, the final string is what is passed to println and printed on the console. Whew!
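The effect of the parentheses is easiest to see with plain numbers. In this sketch the class ConcatDemo and its methods are hypothetical; the + operator is applied from left to right, so what happens depends on whether a String has appeared yet.

```java
// Hypothetical class showing + as concatenation versus addition.
class ConcatDemo {
  static String withoutParens() {
    return "sum: " + 2 + 3;       // "sum: 2" first, then "sum: 23"
  }

  static String withParens() {
    return "sum: " + (2 + 3);     // 2 + 3 is int addition, giving "sum: 5"
  }

  static String numbersFirst() {
    return 2 + 3 + " is the sum"; // 2 + 3 is added first, giving "5 is the sum"
  }

  public static void main(String[] args) {
    System.out.println(withoutParens());
    System.out.println(withParens());
    System.out.println(numbersFirst());
  }
}
```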

A word about the toString method. Remember that it calls the format method of the DecimalFormat class on an object that was initialized with a string of 00. The format method in this situation will take its parameter (the myValue instance variable of a Counter) and return its string representation, but padded with leading zeros if necessary. For example, if myValue is 7, then format returns 07, but if myValue is 107 then format returns 107.
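You can try the padding behavior on its own with a few values. This is a minimal sketch; the class FormatDemo and the method pad are made-up names, and the expected results assume a default locale that uses ordinary Western digits.

```java
import java.text.DecimalFormat;

// Hypothetical class exercising DecimalFormat with the pattern "00".
class FormatDemo {
  static String pad(int value) {
    DecimalFormat fmt = new DecimalFormat("00");   // at least two digits
    return fmt.format(value);
  }

  public static void main(String[] args) {
    System.out.println(pad(7));     // prints 07
    System.out.println(pad(42));    // prints 42
    System.out.println(pad(107));   // prints 107
  }
}
```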

From here on, the main method is easy. It sets the value of the Counter referenced by c1 to 4 and the value of the Counter referenced by c2 to 10, and then it prints out the values again. Then, main attempts to set the value of the Counter referenced by c1 to 5 and the value of the Counter referenced by c2 to -1. Neither value is legal: 5 is not less than c1's limit of 5, and -1 is negative. So the set method uses the value 0 for both counters instead. Finally, main prints the new values of the two Counter objects.
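To trace these last calls by hand, here is a cut-down sketch. MiniCounter is a hypothetical, condensed stand-in for Counter that keeps only one constructor plus set and getValue, with set copied from the code above.

```java
// Hypothetical condensed version of Counter, for tracing set() only.
class MiniCounter {
  private int myLimit;   // upper limit on the counter
  private int myValue;   // current value

  public MiniCounter(int limit) {
    myLimit = limit;
    myValue = 0;
  }

  public void set(int newValue) {
    if (newValue >= 0 && newValue < myLimit)
      myValue = newValue;
    else
      myValue = 0;
  }

  public int getValue() {
    return myValue;
  }

  public static void main(String[] args) {
    MiniCounter c1 = new MiniCounter(5);
    MiniCounter c2 = new MiniCounter(12);

    c1.set(4);
    c2.set(10);
    System.out.println(c1.getValue() + " " + c2.getValue());   // prints 4 10

    c1.set(5);    // 5 is not less than the limit 5, so the value becomes 0
    c2.set(-1);   // -1 is negative, so the value becomes 0
    System.out.println(c1.getValue() + " " + c2.getValue());   // prints 0 0
  }
}
```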

Abstraction

I said that abstraction is an important concept. How does it play into this example? Because all the instance variables of a Counter object are private, no code from outside the Counter class can access them. The only way to interact with a Counter is by calling its methods. The caller knows what the methods do, but not how they do it. Granted, the implementations of the tick, set, reset, and getValue methods are pretty obvious, but the toString method could have been implemented in several different ways. The caller cares not. As long as the methods do what they're supposed to do, the caller should be happy.
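As one illustration of "several different ways," here is a sketch of two interchangeable ways to produce the two-digit string. The class TwoDigitDemo and its method names are hypothetical, and the String.format version is an alternative that the Counter class does not actually use. The two agree for the non-negative values a counter can hold; they can differ for negative numbers, which never occur here.

```java
import java.text.DecimalFormat;

// Hypothetical class comparing two implementations of the same behavior.
class TwoDigitDemo {
  // The approach used in Counter's toString method.
  static String viaDecimalFormat(int value) {
    DecimalFormat fmt = new DecimalFormat("00");
    return fmt.format(value);
  }

  // An alternative: a format string asking for width 2, padded with zeros.
  static String viaStringFormat(int value) {
    return String.format("%02d", value);
  }

  public static void main(String[] args) {
    System.out.println(viaDecimalFormat(7) + " " + viaStringFormat(7));
    System.out.println(viaDecimalFormat(107) + " " + viaStringFormat(107));
  }
}
```

A caller of toString could not tell which of the two was inside, and that is exactly the freedom that private data and a method interface buy us.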