"Problem solving via object oriented programming." Examples of problems from your experience? In this class: process images, search social networks, play games, compress files, identify clusters in data sets, solve puzzles like Sudoku, play the Kevin Bacon game on a movie database....
While those are fun (I hope), the goal of the course is to develop expertise in core programming techniques useful throughout computer science, including representation, abstraction, recursion and modularity, and concurrency. One of the most important themes in this course is abstraction:
Proper abstraction allows us to treat complex ideas as "black boxes" so that we can reason about their behavior without knowing their exact contents. For instance, we can abstract the notion of square root, giving a black box into which we feed a number and from which we receive that number's square root. We don't care what's inside the black box, as long as it preserves the input/output relationship; we could use Newton's method or another algorithm, a look-up table, or some combination.
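To make the black-box idea concrete, here is a minimal sketch in Java. The names SqrtDemo, SquareRoot, NewtonSqrt, LibrarySqrt, and hypotenuse are hypothetical, invented for this illustration; only Math.sqrt is a real library method. The Newton version uses a fixed number of iterations, which is an assumption that is adequate for moderately sized inputs but not a production-quality stopping rule.

```java
public class SqrtDemo {
    // The "black box": callers see only this contract.
    interface SquareRoot {
        double of(double x); // assumes x >= 0
    }

    // One possible inside of the box: Newton's method.
    static class NewtonSqrt implements SquareRoot {
        public double of(double x) {
            if (x == 0) return 0;
            double guess = x;
            // Assumption: 100 iterations is plenty for moderately sized x.
            for (int i = 0; i < 100; i++) {
                guess = (guess + x / guess) / 2; // average the guess with x/guess
            }
            return guess;
        }
    }

    // Another possible inside: delegate to the standard library.
    static class LibrarySqrt implements SquareRoot {
        public double of(double x) {
            return Math.sqrt(x);
        }
    }

    // Code that uses the box does not care which implementation it is given.
    static double hypotenuse(SquareRoot sqrt, double a, double b) {
        return sqrt.of(a * a + b * b);
    }

    public static void main(String[] args) {
        System.out.println(hypotenuse(new NewtonSqrt(), 3, 4));
        System.out.println(hypotenuse(new LibrarySqrt(), 3, 4));
    }
}
```

The hypotenuse method works the same way with either implementation, which is the point: the input/output relationship matters, not the contents of the box.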
You use abstraction when you drive an automobile. Do you really know what happens when you step on the accelerator? Lots of parts do their thing: accelerator cable, fuel injector, cylinders, pistons, cams, drive shaft, transmission, universal joint, and all under computer control. But you, the driver, rely on the abstraction "step on accelerator, car moves."
The same analogy applies to standard interchangeable parts. This was a revolution in military and industrial technology, which started at a gun factory in Windsor, VT, about half an hour south of here. (The building is now the home of the Precision Tools museum, which is worth seeing.) Similarly with computers — USB ports, video output, headphone mini-jacks, etc. Lots of different pieces, but with standard connections and cables, so you can hook them up as you please.
In order to define our abstractions, we need to have some kind of notation for describing them. Ideally, we want a notation that can be read by the computer as well as by humans. In this class we will use the object-oriented (OO) language Java. Simula, a simulation language, was the first OO language. The idea is to express a simulation (or other problem) in terms of interacting objects. Sometimes those objects model things in the world, like the wildebeests that stampeded in The Lion King. It would have been too complex for the animators to control them all directly, so they created an object for each wildebeest that contained rules for how it should interact with other wildebeests and with the environment. They started the objects up and let them interact. They thus simulated wildebeest behavior and interaction and put it onto film.
Often objects are more abstract. But they usually have:
Objects are a good way to perform abstraction. If the data is private (which is usually the case in Java), the only way to interact with it is through the methods. The methods become an interface between the data and the world. The exact way that the data is stored becomes invisible to the outside world, so it can be changed without breaking anything.
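As a small sketch of this idea, consider the class below. The names PrivateDataDemo and Elapsed, and the getter names, are hypothetical and invented for illustration. The class stores a length of time as a single private number of seconds; callers only ever see minutes and seconds through the methods, so the class could later switch to storing two separate fields without any caller noticing.

```java
public class PrivateDataDemo {
    // Hypothetical class: a length of time whose storage is hidden.
    static class Elapsed {
        // The representation. It is private, so it could be replaced
        // (for example by separate minutes and seconds fields) without
        // changing any code outside this class.
        private int totalSeconds;

        Elapsed(int minutes, int seconds) {
            totalSeconds = minutes * 60 + seconds;
        }

        // The methods are the only way in or out.
        int getMinutes() { return totalSeconds / 60; }
        int getSeconds() { return totalSeconds % 60; }
    }

    public static void main(String[] args) {
        Elapsed e = new Elapsed(2, 75); // 2 min 75 s is the same as 3 min 15 s
        System.out.println(e.getMinutes() + " min " + e.getSeconds() + " s");
    }
}
```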
This ability is important, because as we will see there are often a number of different data structures that can be used to represent the data, each with its advantages and drawbacks. In this class we will spend a lot of time looking at:
The idea of "data abstraction" came from a 1972 paper by Parnas entitled "On the Criteria To Be Used in Decomposing Systems into Modules." He compared two ways of decomposing into modules a program that creates a KWIC (Key Words in Context) index. The first was the standard way at the time, which was to break the program down into sequential steps (e.g., Input, Shift, Alphabetize, Output). The other gave a module to each type of data (e.g., Line Storage). He showed that changing how a type of data was represented required changing every module in the first decomposition, but only a single module in the second.
See http://www.cs.dartmouth.edu/~cs10/ for
We have an x-hour Thursday!
I need you to fill out and submit two Word document forms to me:
To make my life a little easier, please follow the instructions at the top of each form when submitting them.
You can get my planned schedule for the course by clicking the Schedule link on any of the course web pages. As I put up the lecture notes for each lecture, a link to the notes will go live.
I'll post short assignments and lab assignments under the corresponding links. Note that Short Assignment 0 is already posted and due on Wednesday.
Lecture notes are there to help you, to keep you from having to write down and type in a lot of stuff that we go over in class. However, they aren't necessarily complete, and they're also simple, sometimes terse, not necessarily grammatical, etc. They're my working notes, and are not intended to take the place of the lectures or the readings. I always reserve the right to cover material in lecture that was not in the online lecture notes.
Other instructors in CS 10 (particularly Tom Cormen) have modified and expanded my notes, and I will be using some of their writing. Additions, corrections, and suggestions are welcome, and will improve the notes for this and future classes.
You all know a programming language. For most of you it is Python. For some of you it is Java or C. Learning a second programming language is much easier than learning a second natural language like English. The first hundred pages in the book go over what we will use in Java rather formally with a lot of detail. The on-line text introduces Java in a more conversational, interactive way.
Lectures are an awful way to communicate detail. I expect you to read these sections and come to me or course staff with questions. In lecture I will try to give you a framework on which to hang the details, and a way of approaching and thinking about the language.
We will introduce you to two systems for programming in Java. DrJava has an Interactions window, which lets you type Java statements and expressions and see the results immediately. It is sort of like a Python interpreter, where you can type a single line of code and run it. It is great for experimentation. Eclipse is a much more powerful IDE, and supplies all sorts of help like code completion (you type a variable name and a dot and it shows you all of the variables and methods in the object referred to by that variable, with templates describing the parameters of methods) and refactoring (reorganizing your code, e.g., renaming a variable, where only the references in the current scope are changed). But in Eclipse, if you want to run a command you have to write a program to do it.
I will start using DrJava to demonstrate things interactively, but will move to Eclipse for more complicated programs. You may use either for any assignment.
We will start by skimming a Java class.
The class that we will look at is Counter. An object of this class represents a counter that starts at 0. Each time the method tick() is called, one is added to the counter. When the counter reaches a pre-defined limit it "wraps," going back to 0. Such an object could be used as part of a digital timer or digital clock class.
In Java, all code must reside in some class. (That makes Java different from Python or C++. And it's very different from C, which doesn't have classes at all.) By convention, the name of a class starts with an uppercase letter. By rule, the name of the file containing the class is the name of the class, with ".java" appended. Hence, the class Counter is in the file Counter.java.
An object includes instance variables, which represent the object's state, and methods, which operate on the object. You've seen these concepts if you've taken CS 1 or the Computer Science AP course. If you've learned C, then you can think of instance variables as being like the members of a struct, and methods as being like functions that operate on the struct. Java doesn't have functions, only methods.
An object of this class represents a counter whose value starts at 0. A Counter object has two instance variables:
myValue gives the current value of the counter.
myLimit gives a value that, once the counter's value reaches it, the counter's value immediately wraps around back to 0. For example, if a Counter object has myLimit of 5, then it counts 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, …
Three methods of the Counter class can change the value of the counter:
tick increments the counter's value, wrapping around to 0 if the value reaches the limit.
set takes a parameter that becomes the counter's new value, but only if this new value is legal, meaning that it's at least 0 and strictly less than myLimit.
reset sets the counter's value to 0, which is of course always legal.
The way that the tick method works, we could use a Counter object as part of a digital timer or digital clock class.
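To give a feel for how counters might be combined into a clock, here is a rough sketch. The Counter inside it is a trimmed-down copy of the class described in these notes. The Clock class, its fields, and the method names tickMinute and time are hypothetical, invented for illustration; they are not part of the course code.

```java
public class ClockDemo {
    // Trimmed-down version of the Counter class from these notes.
    static class Counter {
        private int myLimit; // upper limit on the counter
        private int myValue; // current value

        Counter(int limit) {
            myLimit = limit;
            myValue = 0;
        }

        void tick() {
            myValue++;
            if (myValue == myLimit) // has it hit the limit?
                myValue = 0;        // wrap if it has
        }

        int getValue() {
            return myValue;
        }
    }

    // Hypothetical 24-hour clock built from two counters.
    static class Clock {
        private Counter minutes = new Counter(60);
        private Counter hours = new Counter(24);

        // Advance one minute; when the minutes wrap to 0, advance the hour.
        void tickMinute() {
            minutes.tick();
            if (minutes.getValue() == 0)
                hours.tick();
        }

        String time() {
            return String.format("%02d:%02d", hours.getValue(), minutes.getValue());
        }
    }

    public static void main(String[] args) {
        Clock clock = new Clock();
        for (int i = 0; i < 61; i++)
            clock.tickMinute();
        System.out.println(clock.time()); // 61 minutes after 00:00, so 01:01
    }
}
```

The wrapping behavior of tick does most of the work: the clock only has to notice when the minutes counter has wrapped.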
Let's back up a little. The first thing we see in the Counter.java file is a comment. Java has three forms of comments:
// begins a comment that runs to the end of the line: everything to the right of the //, up to the end of the line, is a comment. Python has the same thing, but using # instead of //.
/* begins a comment that may span multiple lines, until a closing */ appears. C comments have the same form.
/** begins a special kind of comment, designed to generate documentation automatically, through a program called Javadoc. Although you might guess that you end such a comment with **/, just */ will do. The Javadoc for this class appears in Counter.html. Not everything in a Javadoc comment will necessarily appear in the generated documentation, but you should certainly indicate the purpose of the program and who wrote it, even if this information doesn't make it into the Javadoc documentation.
Next we see the line
import java.text.DecimalFormat;
Like an import statement in Python or a #include in C, this line says that there's code elsewhere that we'll be using. In this case, it's a class named DecimalFormat in a package named java.text.
Notice that the import statement ends with a semicolon. In Java, just about all statements end with a semicolon. If you've programmed in C, you're used to this syntactic requirement. If your background is only in Python, note that there the end of the line (the newline character) serves the purpose that the semicolon serves in Java.
Next, we see the line
public class Counter {
This line says that we're starting to define a class named Counter, and that by being declared as public, this class will be known throughout the entire program. The left curly brace (again, familiar to C programmers but not to Pythonistas) says that we're starting the definition. The closing right curly brace at the end of the file matches this left curly brace and ends the class definition. Curly braces in Java always come in matching pairs (unless they appear in quoted string literals), and they nest.
Next we have our instance variables:
private int myLimit; // upper limit on the counter
private int myValue; // current value
Here, we declare the instance variables myLimit and myValue as ints. In Java, as in C, you have to declare the type of each variable, although in Python you do not. The keyword private says that only methods within the current class (Counter) are allowed to access these instance variables. Each Counter object that is created will have its own copy of these instance variables, so that one Counter object might have a myLimit of 12 and a myValue of 5, while another Counter object has a myLimit of 10 and a myValue of 8.
Then we have two other variables:
private final static int DEFAULT_LIMIT = 12; // the default counter limit
private final static String FORMAT = "00"; // minimum number of digits displayed
These look like instance variables, but they're not. Actually, they have a couple of differences from the instance variables myLimit and myValue. First, by being declared final, we're saying that the values we give them when we declare them may never change after their initial assignments. Python doesn't supply the equivalent to final, but C's const declaration is the same as final in Java. Notice the convention that final variable names are written in all uppercase, just as in C and Python. Second, by declaring these variables as static, we're saying that instead of each Counter object having its own copy of these variables, there's just one copy of each, and that copy is shared among all the Counter objects that exist at any time. After all, since DEFAULT_LIMIT is an int with the value 12 and it's never going to change, if we gave each Counter object its own copy of DEFAULT_LIMIT, then we'd waste space. We'll see later in the course that non-final static variables have their own uses. (One is to keep a count of the number of objects of a given class that have been created.) Static variables in Java are also known as class variables, since there's just one for each class.
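The parenthetical remark about counting objects can be sketched as follows. The class names StaticDemo and Ticket, and every field and method name in them, are hypothetical and exist only for this illustration. The sketch contrasts a final static constant, a non-final static variable shared by all objects, and an ordinary instance variable.

```java
public class StaticDemo {
    // Hypothetical class used only to illustrate static variables.
    static class Ticket {
        private static final String PREFIX = "T-"; // one shared copy that never changes
        private static int created = 0;            // one shared copy, updated by every constructor call
        private int number;                        // each Ticket object has its own copy

        Ticket() {
            created++;        // count this object
            number = created; // remember which one it was
        }

        String label() {
            return PREFIX + number;
        }

        // A static method: called on the class, not on any one object.
        static int getCreated() {
            return created;
        }
    }

    public static void main(String[] args) {
        Ticket a = new Ticket();
        Ticket b = new Ticket();
        System.out.println(a.label() + " " + b.label() + " created=" + Ticket.getCreated());
    }
}
```

Because created is static, both constructor calls update the same variable, while number differs from object to object.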
Notice also that the static variable FORMAT is declared as a String. The String class is built into Java. Like both C and Python, Java allows quoted string literals, such as "00", which is a string comprising two characters, both of which are the character 0 (not to be confused with the integer 0).
Next, we have two special methods that are called constructors.
/**
 * Constructor for the Counter class, with initial value 0 and limit 12.
 */
public Counter() {
    myLimit = DEFAULT_LIMIT;
    myValue = 0;
}

/**
 * Constructor for the Counter class, with initial value 0 and given limit.
 * @param limit - the upper bound for the counter
 */
public Counter(int limit) {
    myLimit = limit;
    myValue = 0;
}
The name of a constructor is always the name of its class. As we'll see, all methods other than constructors have to declare the type of what they return to their callers. Constructors do not declare any return type. They are the only methods that do not declare a return type.
Python programmers: note that curly braces surround the method bodies. Like C, Java requires curly braces around bodies. Python indicates structure by indenting, but Java does not require proper indentation. In fact, line breaks are irrelevant. The following code would mean exactly the same thing as the first of the two constructors:
public Counter() { myLimit =
DEFAULT_LIMIT ; myValue
= 0
;}
In CS 10, however, we require proper indentation. As does the rest of the world. And we will insist that your code be properly formatted, so that we humans can read it.
Constructors are run automatically when objects are created. Python programmers have seen constructors, but C programmers have not. The purpose of a constructor is to initialize every instance variable of the object being created. Not just some of the instance variables. All the instance variables. You should make sure that whenever you create an object, a constructor is run on the object and that this constructor gives an initial value to every instance variable.
Why do we care so much about running constructors and initializing every instance variable? Because if you fail to give every instance variable an initial value, the object is in an unknown state. The unknown state is not only unreliable, but it might also be illegal. For example, suppose a Counter had myLimit of 12 but myValue of 15. That would be an illegal state. By running a constructor on a Counter object, we can ensure that its state starts out known and legal. (Exercise: Show how, with the constructors above, we can still make a Counter that starts out in an illegal state.)
Unlike C or Python, Java has overloading: you can have multiple methods with the same name. Notice that we have two constructors, both with the name of the class, Counter. They differ, however, in that the first constructor takes no parameters, but the second one takes one parameter, limit, which is declared to be an int. C programmers are familiar with this style of declaring parameters. Python programmers are not, because they don't ever declare variable types. As you can see, you declare the type of a parameter right before the parameter's name in the method header. As long as the number or types of the parameters differ, you may overload a method. (A method's name together with the number and types of its parameters is called the method's signature.)
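Overloading is not limited to constructors. As a small sketch, the class below declares three methods that share one name; the names OverloadDemo and describe are hypothetical and chosen only for this example. Java selects which method to run from the number and types of the arguments in each call.

```java
public class OverloadDemo {
    // Three methods share the name "describe". They differ in the number
    // or types of their parameters, so Java can tell them apart.
    static String describe() {
        return "nothing";
    }

    static String describe(int n) {
        return "int " + n;
    }

    static String describe(String s) {
        return "String " + s;
    }

    public static void main(String[] args) {
        System.out.println(describe());       // no arguments: the first method
        System.out.println(describe(5));      // an int argument: the second method
        System.out.println(describe("five")); // a String argument: the third method
    }
}
```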
We'll see later how a constructor gets called.
Next is the parameterless method tick:
/**
 * Increment the value of the Counter, wrapping back to 0 when it reaches
 * the limit.
 */
public void tick() {
    myValue++;
    if (myValue == myLimit) // has it hit the limit?
        myValue = 0;        // wrap if it has
}
The keyword public says that this method can be called from anywhere, and void says that this method does not return a value to its caller. (C programmers, but not Python programmers, are used to seeing void.) The tick method operates on a particular Counter object, using the instance variables of the Counter object on which it is called. (We'll see later how we call a method on an object.) First, the method increments the myValue instance variable using the ++ increment operator of Java (and C, but not Python). Then, if the value of the myValue instance variable has reached the value of the myLimit instance variable, myValue is set to 0. This syntax for an if-statement is just as in C, but it differs from Python: parentheses around the test replace the colon following the test.
Then the methods set and reset:
/**
 * Set the value of the Counter to newValue.
 * If newValue is too large or is negative, sets the value to 0.
 * (We will learn better ways to handle errors later.)
 *
 * @param newValue the value to reset the counter to
 */
public void set(int newValue) {
    if (newValue >= 0 && newValue < myLimit)
        myValue = newValue;
    else
        myValue = 0;
}

/**
 * Reset the value of the Counter to 0.
 */
public void reset() {
    set(0);
}
The set method takes a parameter, newValue, which becomes the new value of the counter, if newValue is legal. Otherwise, the counter's value is set to 0. The reset method just sets the counter's value to 0.
Both C and Python programmers will recognize the legality test newValue >= 0 && newValue < myLimit, which runs the two relational operators >= and < and then combines their results with the short-circuiting logical connective &&. Savvy Python programmers would have written this test as 0 <= newValue < myLimit in Python. Alas, this style of stringing together relational operators is not allowed in Java. The Java compiler would interpret 0 <= newValue as giving a boolean value, which you'd then be attempting to compare with the int value myLimit, thereby generating a compile-time error.
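Here is a short sketch of the range test on its own, together with one example of why short-circuiting is useful. The names RangeCheckDemo, isLegal, and isPositiveAt are hypothetical and not part of the Counter class. The chained form is shown only inside a comment, because it would not compile.

```java
public class RangeCheckDemo {
    // Returns true if value is a legal counter value for the given limit.
    static boolean isLegal(int value, int limit) {
        // return 0 <= value < limit;   // would not compile: compares a boolean with an int
        return value >= 0 && value < limit;
    }

    // Short-circuiting: each test to the right of && runs only if everything
    // to its left was true, so data[i] is never evaluated with a bad index.
    static boolean isPositiveAt(int[] data, int i) {
        return i >= 0 && i < data.length && data[i] > 0;
    }

    public static void main(String[] args) {
        System.out.println(isLegal(4, 5));   // true
        System.out.println(isLegal(5, 5));   // false: must be strictly less than the limit
        System.out.println(isLegal(-1, 5));  // false: negative
        System.out.println(isPositiveAt(new int[] {3, -2}, -1)); // false, and no exception
    }
}
```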
Next is the "getter" method getValue:
/**
 * Return the value of the Counter.
 *
 * @return the current value of the counter.
 */
public int getValue() {
    return myValue;
}
This method returns an int value to its caller, and so the return type in the header is int. The Javadoc documentation also includes this information in the @return part.
The next method in the Counter class is toString:
/**
 * Return a String representation with at least 2 digits, padding with a
 * leading 0 if necessary.
 *
 * @return a String representation of the counter with at least 2 digits.
 */
public String toString() {
    DecimalFormat fmt = new DecimalFormat(FORMAT); // use at least 2 digits
    return fmt.format(myValue);
}
The toString method is called automatically whenever we need the string representation of an object. Notice that it returns a String. If we don't supply the toString method in a class definition and we try to convert the object to a string, we just get the name of the object's class followed by @ and the object's hash code in hexadecimal (base 16), which is rarely useful. Here, the toString method converts the counter's value to a character string of at least two digits. It uses the class DecimalFormat that we imported up at the top of the file. The toString method first creates a DecimalFormat object with the Java keyword new. Apart from a few special cases such as string literals, the way to create a new object in Java is via the keyword new. The class variable FORMAT is passed to the constructor for DecimalFormat.
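To see the difference a toString method makes, here is a small sketch. The class names ToStringDemo, Plain, and Named are hypothetical. Plain supplies no toString, so it falls back on the version inherited from Java's Object class, which produces the class name, an @ sign, and a hexadecimal hash code; the exact digits vary from run to run, so no particular output is claimed for that line.

```java
public class ToStringDemo {
    // No toString method here: Java falls back on the one inherited from Object.
    static class Plain {
    }

    // This class supplies its own toString.
    static class Named {
        public String toString() {
            return "a Named object";
        }
    }

    public static void main(String[] args) {
        System.out.println(new Plain()); // class name, then @, then a hexadecimal hash code
        System.out.println(new Named()); // prints: a Named object
    }
}
```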
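The new operator does three things: it allocates memory for the new object, it runs a constructor on that object, and it returns a reference to the object.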
Let's take a look at the variable fmt. It is a local variable, which means it exists only during the call to the toString method. The variable is created when toString executes, and it ceases to exist when toString completes.
Now that you know that the new operator returns a reference to the object created, you know what is assigned to fmt: a reference to the DecimalFormat object that new created and ran a constructor on. Not surprisingly, then, fmt is declared as a DecimalFormat object. Except that it's not. It's actually declared as a reference to a DecimalFormat object. If you've seen aliasing in C or Python, where two different variables hold the address of the same object, you can appreciate the difference between an object and a reference to an object.
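A quick way to see that variables hold references rather than objects is to make two variables refer to one object and change it through one of them. The sketch below uses the standard StringBuilder class because its contents can be modified; the names AliasDemo and sharedAfterAppend are hypothetical.

```java
public class AliasDemo {
    // Builds one object, reaches it through two variables, and reports
    // what the first variable sees after a change made through the second.
    static String sharedAfterAppend() {
        StringBuilder a = new StringBuilder("tick");
        StringBuilder b = a;   // copies the reference, not the object
        b.append("-tock");     // changes the one shared object
        return a.toString();   // the change is visible through a as well
    }

    public static void main(String[] args) {
        System.out.println(sharedAfterAppend()); // tick-tock

        StringBuilder x = new StringBuilder("tick");
        StringBuilder y = x;                         // second reference to the same object
        StringBuilder z = new StringBuilder("tick"); // a different object with equal contents
        System.out.println(x == y); // true: same object
        System.out.println(x == z); // false: == on references compares identity, not contents
    }
}
```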
Having created a DecimalFormat object and assigned its address to the local variable fmt, the toString method then calls the format method on this DecimalFormat object, with myValue as a parameter (the only parameter) to the format method. We don't know how the format method works, but that's abstraction acting in our favor. Of course, we haven't even said what the format method does, but we can worry about that later.
What's important here is how we call a method on an object. We called fmt.format(myValue). Here, fmt is a reference to a DecimalFormat object, format is a method of the DecimalFormat class, and myValue is the parameter to the call. The general form of a method call is
objectReference.methodName(parameters)
Python programmers have seen this syntax before, but C programmers have not.
I like to think of objects as "things" and methods as "actions." Indeed, most of the time, the name of a class should be a noun or a noun phrase. And the name of a method should include a verb. (With the obvious exception for constructors, which are required to have the same name as the class.)
Now, what does the format method return? You can tell from how it's used. The toString method just takes what format returns and immediately returns it to its caller. And we know that toString returns a String or, more precisely, a reference to a String. Therefore, the format method must return a reference to a String.
The last method in the Counter class has the special name main:
/**
 * A main program to test the counter.
 * (Including such testing programs is a good idea.)
 */
public static void main(String args[]) {
    // Create variables that can reference two Counters.
    Counter c1, c2;

    c1 = new Counter(5); // wraps at 5
    c2 = new Counter();  // wraps at 12

    final int TIMES = 50;

    System.out.println("c1\tc2\tsum");

    // Show lots of Counter values.
    for (int i = 0; i < TIMES; i++) {
        System.out.println(c1 + "\t" + c2 + "\t" + (c1.getValue() + c2.getValue()));

        // Tick both Counters.
        c1.tick();
        c2.tick();
    }

    c1.reset();
    c2.reset();
    System.out.println("After reset:\t" + c1 + "\t" + c2);

    c1.set(4);
    c2.set(10);
    System.out.println("After set:\t" + c1 + "\t" + c2);

    c1.set(5);
    c2.set(-1);
    System.out.println("After invalid:\t" + c1 + "\t" + c2);
}
The main method is where program execution starts. In Eclipse, we always have to say which class contains the main method in which we want to start executing the program. So we can have lots of main methods in an Eclipse project, but only one of them will be the starting point for any given run.
The header of the main method is boilerplate: public static void main(String args[]). As we'll see, a static method is not called on any object. The main method must be static, because it's the first thing executed, and so no objects can possibly have been created at the time main starts executing. The parameter String args[] allows for command-line arguments. The [] part indicates that args is actually an array of references to String. C programmers have seen arrays. So have Python programmers, but they're called "lists" in Python.
What does the main method do? It starts by declaring local variables c1 and c2 as references to Counter objects. Then it creates two Counter objects. The first one has a limit of 5; the new operator calls the one-parameter constructor. The second object has a limit of 12; the new operator calls the parameterless constructor (which is also called a default constructor). Remembering that new returns a reference to the object it creates, we see that c1 is assigned a reference to the Counter whose limit is 5 and c2 is assigned a reference to the Counter whose limit is 12.
Next, main declares a final local variable TIMES with the value 50.
Then, main prints some output to the console by calling the method println on a built-in object referenced by System.out. The println method takes a string and prints it on the console. As in C and Python, we can include escaped characters in a string literal; here, \t denotes the tab character.
Then comes a for-loop. The syntax is exactly the same as in C, but this syntax will be new for Python programmers. Java for-loops are quite a bit more general than in Python, and they do not require lists (or arrays) to be involved. A for-loop has the form
for (initial; test; update)
    body
and it is equivalent to a while-loop with the form
initial
while (test) {
    body
    update
}
Therefore, we could have written the for-loop as the equivalent while-loop
int i = 0;
while (i < TIMES) {
    System.out.println(c1 + "\t" + c2 + "\t" + (c1.getValue() + c2.getValue()));

    // Tick both Counters.
    c1.tick();
    c2.tick();
    i++;
}
Although you might find the Python loop header for i in range(TIMES): to be easier, the more general form of Java for-loops has advantages that we'll see later on.
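The for/while equivalence can be checked with a small self-contained sketch. The names LoopDemo, sumWithFor, sumWithWhile, and countdown are hypothetical. The first two methods compute the same sum with the two loop forms; the third hints at the extra generality of the for header, counting down by twos with no list or array involved.

```java
public class LoopDemo {
    // Sum of 0, 1, ..., times - 1 using a for-loop.
    static int sumWithFor(int times) {
        int sum = 0;
        for (int i = 0; i < times; i++)
            sum += i;
        return sum;
    }

    // The same computation written as the equivalent while-loop.
    static int sumWithWhile(int times) {
        int sum = 0;
        int i = 0;            // initial
        while (i < times) {   // test
            sum += i;         // body
            i++;              // update
        }
        return sum;
    }

    // The three header parts are general: start high, stop at 0, step by -2.
    static String countdown(int start) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i > 0; i -= 2)
            sb.append(i).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(sumWithFor(5));   // 0 + 1 + 2 + 3 + 4 = 10
        System.out.println(sumWithWhile(5)); // 10 again
        System.out.println(countdown(7));    // 7 5 3 1
    }
}
```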
The body of the for-loop calls the tick method on each of the two Counter objects, using the objectReference.methodName(parameters) syntax.
After the for-loop terminates, the main method calls reset on each Counter and then prints the values of the two objects. There's a lot going on in the parameter to println, so let's take a look. The parameter to println must be a String (more precisely a reference to a String). But we're "adding" the string literal "After reset:\t" and the reference c1. To whatever that addition gives back, we're adding the string literal "\t", and then we add the reference c2. What is going on?
Java assumes that if you apply the + operator and one of the operands is a String (again, really a reference to a String…do you see how a string literal's type is really "reference to a String object"?), then the other operand is converted to a String if necessary and concatenated with the String operand. And so we convert the Counter object that c1 references to a String and concatenate that String with the string literal "After reset:\t".
Just how is that Counter object converted to a String? By an implicit call to its toString method! In other words, Java will automatically call toString on this Counter object and use the reference to a String that toString returned. We concatenate the two strings, and then we concatenate the string literal "\t" to that result, and then call toString implicitly on the Counter that c2 references, concatenating the string returned by toString to what we've built up so far. The println call inside the for-loop goes one step further: there we concatenate another tab character, followed by an int value that is implicitly converted to a String. That int value is the sum of the values of the two Counter objects, using values returned by calls to getValue on each of the two objects. Notice how the parentheses around that last sum make it plain old addition rather than concatenation. The resulting string is what is passed to println and printed on the console. Whew!
A word about the toString method. Remember that it calls the format method of the DecimalFormat class on an object that was initialized with a string of 00. The format method in this situation will take its parameter (the myValue instance variable of a Counter) and return its string representation, but padded with leading zeros if necessary. For example, if myValue is 7, then format returns 07, but if myValue is 107 then format returns 107.
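The padding behavior can be tried out directly. This sketch wraps the same two lines that toString uses in a small helper; the names FormatDemo and twoDigits are hypothetical, while DecimalFormat and its format method are the real library class and method used in the Counter class. The expected results assume a locale that uses the ordinary digits 0 through 9.

```java
import java.text.DecimalFormat;

public class FormatDemo {
    // Formats value with at least two digits, as Counter's toString does.
    static String twoDigits(int value) {
        DecimalFormat fmt = new DecimalFormat("00"); // "00" means: at least 2 digits
        return fmt.format(value);
    }

    public static void main(String[] args) {
        System.out.println(twoDigits(7));   // 07
        System.out.println(twoDigits(12));  // 12
        System.out.println(twoDigits(107)); // 107
    }
}
```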
From here on, the main method is easy. It sets the value of the Counter referenced by c1 to 4 and the value of the Counter referenced by c2 to 10, and then it prints out the values again. Then, main attempts to set the value of the Counter referenced by c1 to 5 and the value of the Counter referenced by c2 to -1. But 5 is not a legal value for a counter whose limit is 5, and -1 is not a legal value for any counter, and so the value 0 will be used in both cases. Finally, main prints the new values of the two Counter objects.
I said that abstraction is an important concept. How does it play into this example? Because all the instance variables of a Counter object are private, no code from outside the Counter class can access them. The only way to interact with a Counter is by calling its methods. The caller knows what the methods do, but not how they do it. Granted, the implementations of the tick, set, reset, and getValue methods are pretty obvious, but the toString method could have been implemented in several different ways. The caller cares not. As long as the methods do what they're supposed to do, the caller should be happy.
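As a sketch of what "several different ways" might look like, here are three interchangeable ways to produce the two-digit string. The first mirrors the approach in the Counter class; the other two are alternatives offered only for illustration, and the names TwoWaysDemo, viaDecimalFormat, viaStringFormat, and byHand are hypothetical. They agree for the values a counter can hold, under the assumption that the value is never negative; for negative numbers the three would differ.

```java
import java.text.DecimalFormat;

public class TwoWaysDemo {
    // The approach used by toString in the Counter class.
    static String viaDecimalFormat(int value) {
        return new DecimalFormat("00").format(value);
    }

    // An alternative using String.format with a zero-padded width of 2.
    static String viaStringFormat(int value) {
        return String.format("%02d", value);
    }

    // Another alternative, done by hand. Assumes value >= 0.
    static String byHand(int value) {
        return (value < 10 ? "0" : "") + value;
    }

    public static void main(String[] args) {
        // A caller cannot tell the three apart for legal counter values.
        for (int v = 0; v < 12; v++) {
            System.out.println(viaDecimalFormat(v) + " " + viaStringFormat(v) + " " + byHand(v));
        }
    }
}
```

Swapping one of these for another inside toString would change nothing for code that uses a Counter, which is exactly the benefit of keeping the implementation behind the method.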