Welcome to CS50! Ready to hack ...
In this lecture, we will discuss the aim and schedule of the course, and take a brief look at Unix, C compilation, and computer architecture.
OK. Let’s get started.
The aim of this course is to develop the necessary systems programming skills in C and Linux as a foundation to tackle the design, implementation, and integration of a large software project working in small teams. The challenge of the course is to quickly get people up to speed so there is sufficient time to get into the details of a complex software design project. The first part of the course serves to develop design, programming, and other systems skills such as source code management, testing, and debugging. The second part of the course is all about the project and teamwork. Good teamwork will lead to success. That’s the message.
The syllabus in a nutshell:
The course includes weekly programming assignments for the first part of the course. The last part (approx. 2 weeks) is held over for project work. There are no lectures in the last part of the course but the projects are run with a formal design review, code review, and demo as well as periodic progress meetings where the team can brainstorm problems and come up with solutions. There will be a common project goal for all teams based on a “robotic treasure hunt” but students are free to develop their own ideas beyond this common goal - we want you to be entrepreneurial.
There is a significant amount of programming in this course requiring a significant time commitment on the part of the student. You will need to be well organized to complete all the programming assignments and project. It may not be all plain sailing but we hope it will be fun - you will certainly learn a set of new skills that will be very useful in the software industry.
There is no midterm or final exam in this class :) There are, however, a number of challenging programming exercises, informative reading assignments, and exciting group projects.
The grading for the course is as follows:
10% - Class and Lab contribution. Active involvement in class and lab discussions will help toward this part of the grade. Being a good CS50 citizen.
60% - Laboratory exercises. There are 7 weekly laboratory assignments over the first 8 weeks. These labs are designed to help you learn the languages, tools, and design skills you will need for your final project. Only six labs are graded - 10% for each lab. Some labs are much harder than others but we have a flat grading scheme across all labs. These assignments are to be done individually. The schedule is online - plan ahead.
30% - Team project. The project is made up of a small team (three students) and requires strong collaboration and a problem solving mindset to get the job done. The instructor will put together the teams (to balance skill sets) with each member being responsible to deliver against a part of the overall system design, implementation, testing, and integration. The goal of this activity is to help you develop the confidence, skills, and habits necessary to write large computer programs while part of a multi-person team. You will become conversant in software engineering paradigms, and be exposed to various public-domain and open source tools that make the software development process easier. In addition, you will develop vital skills in self-directed learning, problem solving, and communication. The project will have a design and code review as well as the demo. A project report that captures the design and implementation will be submitted as part of the assessment. The project report will be written using a text editor, the LaTeX language, and Linux latex commands.
See the course webpage for other details on the course. Please check the webpage frequently for updates. The webpage is http://www.cs.dartmouth.edu/~campbell/cs50/
Please also check out the late submission policy on the webpage. We provide two free passes for 48 hour extensions with no penalty, if needed. Try not to use them if you can avoid it, since you may fall behind, but they are there if you really need them; that’s the message.
We plan to learn the following in today’s lecture:
In the first four lectures we will cover Linux, the shell, and shell programming. We will also cover a few advanced topics (e.g., processes, sockets, threads) that we will come across while programming. This is not meant to be a detailed presentation of the Linux OS and its programming tools. No, we’d need a complete course to cover that material. We need to know enough about Linux and its tools to be able to navigate our way around the system, write some basic shell scripts, and use its programming tools.
It is important that you use these notes as a starting point; like any budding hacker, you need to do some experimenting, get online, and read up on the details. If you have gaps in your knowledge, go on the web and find information first, then come see the instructor for help. There are a number of references dotted through the notes, and at the end of the notes, that give more detailed information.
In this lecture, we plan to cover what will become a familiar process to you: logging on to a Linux machine, writing a C program, running it, and logging off. We’ll also delve into the compilation process and discuss the program from its C origins to an executable running on a microprocessor.
Caveat: Please take note that lecture notes will not always be detailed. You will need to augment these notes with your own comments and by reading the references and reading assignments so you can dive deeper into the topic.
OK. Let’s get started.
First, let’s log on to one of the Linux machines in the Sudikoff Lab 001. You can take a tour of the lab and its Linux machines: Sudikoff Lab 001 tour.
I’ll log in from my Mac using the Secure SHell (ssh) command to remotely log into “spruce” (a Linux machine) or, to give it its full host name, spruce.cs.dartmouth.edu. The ssh command replaces telnet (remote communications with another computer) and rlogin (remote login), both of which lack security. The ssh command is the standard choice these days because your password, like the rest of the session, is encrypted when it’s transmitted over the network, not sent in clear text.
I’ll log on using the standard Terminal application on Mac OSX - I’m a Mac person, which means Unix under the hood, sweet! There are two ways to remotely log on to a machine: using the standard term on my Mac, or using an xterm (X Window terminal). The nice thing about using an xterm (and therefore an underlying piece of software called the X Window System, or X11) is that it makes remote apps look like they are local. I’ll show you how to access a remote machine using a term and an xterm and you can decide which you prefer. I like the standard term.
Using the standard term:
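The listing that belongs here did not survive in these notes. A minimal sketch of the login from a standard terminal, assuming the login name campbell and the host spruce.cs.dartmouth.edu used elsewhere in the notes (substitute your own login name). The command is printed rather than run, so the snippet is safe to paste anywhere:

```shell
# Assumed values taken from the notes; replace campbell with your own login name.
user=campbell
host=spruce.cs.dartmouth.edu

# Dry run: print the command that would be executed.
# Remove the leading "echo" to open the connection for real.
echo ssh "$user@$host"
```

Once connected, ssh asks for your password and then drops you into a shell on the remote machine.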
Using an xterm:
SSH provides a secure way for a user to access a remote computer and run commands on that computer. There are a large number of Linux commands - hundreds. We will learn a small set of commands that will be useful for the course and project. Commands are entered directly at the console or a remote terminal (terminal, xterm, PuTTY - we will talk about this later today). Each Linux command has a short abbreviated name (e.g., ssh for Secure SHell) and an associated syntax that typically includes various arguments and options/switches; typically, these options (or switches as they are also known) are a single letter preceded by a hyphen (e.g., -Y). For example,
The switch “-Y” enables trusted X11 forwarding from the remote machine. The other switch, “-l”, informs the ssh command that the username of the user logging in is campbell. SSH establishes a secure channel and uses public-key cryptography to authenticate the remote computer and the user.
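The two switches just described combine into a command along these lines. This is a sketch rather than the original listing; the login name and host are the ones used throughout the notes, and the command is again printed instead of executed:

```shell
# Assumed values taken from the notes.
user=campbell
host=spruce.cs.dartmouth.edu

# -Y        enable trusted X11 forwarding
# -l NAME   log in on the remote machine as NAME
# Dry run: remove the leading "echo" to connect.
echo ssh -Y -l "$user" "$host"
```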
From now on the notes assume you are using a standard term (not an xterm).
If you want the detailed syntax of a Linux command you can use the manual command followed by the command:
This is just a snippet of the man ssh output (man is short for manual). The manual output includes all the nitty-gritty details on the command and its options. For most commands you can use the common option “--help” (two hyphens) to get a brief breakdown of the command and its switches. This doesn’t work for all commands, but in that case the command usually rejects --help as an invalid option and prints a list of its options anyway - so same result.
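As a quick sketch of both ways of getting help, the snippet below shows the start of the ssh manual page and then the built-in summary of a command. Using ls for the second step is our choice for illustration, not something the notes prescribe:

```shell
# First 20 lines of the ssh manual page; run plain "man ssh" to page through all of it
# (press q to leave the pager).
man ssh | head -n 20

# Short usage summary printed by the command itself (shown here for ls).
ls --help | head -n 5
```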
BTW, the shell is a very nifty program and acts as the command interpreter for Linux.
The online documentation for Linux (commonly called the Linux manuals) is divided into a number of sections that you can specify on the command line for “man”. These include:
The manuals locate information by searching in the order:
1, n, l, 6, 8, 2, 3, 4, 5 and 7.
There are different levels of information associated with a searched item depending on its context. For example, information on sockets can be found in the system calls section (man 2) and in the networking section (man n). Similarly, there is manual information on wait both as a standard utility (aka command) and as a system call. Selecting a section depends on what you are looking for. You can use “man -k keyword” to search through the manual pages for matches on a keyword.
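A short sketch of selecting a section and searching by keyword. We use printf as a stand-in example here because it exists both as a command (section 1) and as a C library function (section 3); this assumes the usual manual pages are installed on your machine:

```shell
# The same name documented in two different sections.
man 1 printf | head -n 8    # printf, the command
man 3 printf | head -n 8    # printf, the C library function

# Keyword search across the one-line descriptions of all manual pages.
man -k socket | head -n 8
```

If a page is missing on your system, man simply reports that there is no entry in that section.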
Each user has a home directory. After you have logged in using ssh you are in your home directory.
We can look at our home directory “path” using the pwd (print working directory) command. Use the man and info commands to get information if you like.
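A minimal sketch of finding out where you are; nothing here is specific to the course machines:

```shell
cd             # with no argument, cd returns to your home directory
pwd            # print the absolute path of the current (working) directory
echo "$HOME"   # the shell also keeps the home directory path in HOME
```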
Let’s take a look at the contents of my home directory (using the -l switch which means long format):
The Linux model for files is simply a linear stream of bytes. Files can be plain files, directories or special files (we will talk about this in the next lecture). We can see that each file has file permissions and other data associated with it; for example:
drwxr-xr-x 4 campbell ug 728 Oct 26 2006 public_html
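To see how a line like this comes about, here is a small sketch that builds a directory with the same permissions in a scratch location and lists it in long format. The scratch directory is our own addition so that nothing in your home directory is touched:

```shell
# Work in a temporary directory.
scratch=$(mktemp -d)
mkdir "$scratch/public_html"
chmod 755 "$scratch/public_html"     # rwx for the owner, r-x for the group, r-x for others

# -l gives the long format; -d lists the directory itself rather than its contents.
ls -ld "$scratch/public_html"

# Reading the example line above field by field:
#   d            file type (d = directory, - = plain file)
#   rwxr-xr-x    permissions for owner, group, and everyone else
#   4            number of links
#   campbell     owner
#   ug           group
#   728          size in bytes
#   Oct 26 2006  last modification time
#   public_html  name
perms=$(ls -ld "$scratch/public_html" | cut -c1-10)
echo "$perms"

rm -r "$scratch"                     # clean up
```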
You can traverse directory trees assuming you have the appropriate permission.
Linux supports a number of shells (command line interpreters). If we use the echo command we can look at the environment variable that tells us which shell is running. For this course we will use the bash (Bourne Again SHell) shell.
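The environment variable in question is SHELL. A minimal sketch of inspecting it (the exact path printed will depend on how your account is set up):

```shell
echo "$SHELL"          # path of your login shell, for example /bin/bash
echo "$BASH_VERSION"   # non-empty only when the shell you are typing into is bash
```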
Again, the shell is the command line interpreter in Linux.
More on the bash shell later.
SSH is a cool command because it also allows you to execute commands remotely without logging in to the remote computer interactively. If you wanted to execute a Linux command such as “ls -l” then do this
If you use ssh and a command is specified, the command is executed on the remote host instead of a login shell; it’s the equivalent of logging in, executing ls -l, and then logging out - all on one command line - isn’t that cool!
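A sketch of such a one-line remote command, again with the login name and host from the notes and printed as a dry run:

```shell
# Assumed values taken from the notes.
user=campbell
host=spruce.cs.dartmouth.edu

# Anything after the host name is run on the remote machine instead of a login shell.
# Dry run: remove the leading "echo" to execute it.
echo ssh "$user@$host" ls -l
```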
Another useful command is scp (secure copy), a remote file copy program for copying files between machines.
I’ll remove the class notes for today’s lecture which are stored remotely on the Linux system. Then I’ll put them back from my laptop using scp.
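A sketch of the copy step. The file name and the remote directory below are hypothetical placeholders, since the original listing is not available; only the login name and host come from the notes:

```shell
user=campbell
host=spruce.cs.dartmouth.edu
notes=lecture1.html        # hypothetical local file name
dest=public_html/cs50/     # hypothetical directory, relative to the remote home directory

# scp takes a source and a destination; a remote side is written user@host:path.
# Dry run: remove the leading "echo" to perform the copy.
echo scp "$notes" "${user}@${host}:${dest}"
```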
That file is a little light on content!
You can install lots of fancy programs that provide nice GUIs on top of these basic commands, such as PuTTY (for Windows machines), or MacSSHPPC, NiftyTelnet, or Fugu (my favorite), but it’s nice to see what’s under the hood - which, in essence, is the command line.
Assuming you use xterm to log in: the nice thing about using an xterm (aka X11) and the “-Y” switch is that you can use applications on the remote machine as if they were running on your local machine (e.g., laptop, desktop, PDA). For example, if you have X11 running on your local machine (more on this later) then try running a remote app over the ssh session: e.g.,
If you wait a tad you should see the firefox app run.
OK, let’s log back in using ssh but from a standard terminal (using the OSX Terminal application - it looks very much like an xterm, but it’s different). No need for the -Y switch because this is not X11 now.
First, we need to write the source code by opening and editing a file.
We would like you to use a text editor to write code and documentation - the emacs editor is a good editor for developing software and supports context-sensitive editing for C, shell programming, and LaTeX (you will be using LaTeX to write your report). If you know “vi” that is fine too. If you are not familiar with emacs, check out this short Emacs Tutorial. If you open the emacs editor (as below) and look under “help” you will find a bundled tutorial.
In this course we stick to the command line as much as possible. We are not using an IDE (Integrated Development Environment) such as the Eclipse IDE in the first part of the course. Later, we may think about using Eclipse during the project phase but for now, no IDEs, we want to run all the tools from the command line interface. Once you know what’s under the hood then we can hide the details and move up the abstraction tool chain to an IDE (note, you can’t become a good architect until you know what bricks and mortar are and how they get used, right? Same thing here.)
So let’s open the emacs editor and write the code:
Here is your first program:
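The program listing did not survive in these notes. Below is a minimal sketch of a first C program, written to hello.c with a shell here-document so that it can be pasted straight into the terminal instead of being typed into emacs. The message text is an assumption; the call to puts matches the later discussion of the assembly output:

```shell
# Create hello.c in the current directory. Quoting 'EOF' stops the shell
# from expanding anything inside the C source.
cat > hello.c <<'EOF'
/* hello.c - a first C program (hypothetical reconstruction) */
#include <stdio.h>

int main(void)
{
    puts("hello world");   /* write the string and a newline to standard output */
    return 0;              /* 0 tells the shell that the program succeeded */
}
EOF

cat hello.c    # show what was written
```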
OK. Now that we have written our first C program, let’s compile and execute it. We will use the GNU (a recursive acronym for GNU’s Not Unix) tool chain and its gcc compiler. We will make better use of the gcc compiler later in the course, but for now the compiler produces an “executable” called a.out (meaning assembler output). a.out is the default output executable created by gcc when no output name is specified on the command line, as is the case below.
C is a “not so high level language” and can be coded to take advantage of the microprocessor it executes on in order to write high performance code (e.g., C’s register storage-class keyword). In this Java and OO world we live in we can easily forget about the underlying processor. Java abstracts you away from the processor and the OS, which is both good and bad.
So let’s take a quick peek at the compilation process - the tool chain. There are three phases to producing the executable: the compiler (man gcc), the assembler (man as), and the linker (man ld). You can use the “man gcc” command or type “info gcc” (GNU info system) for all the details on the compiler. The GNU gcc compiler supports a large number of microprocessor families including the Intel “x86” family which is commonly used on Linux machines.
Let’s play with the assembler by getting gcc to produce assembly code from the hello.c example. We will look at the assembler code, then get the assembler program to take the assembler code and produce object code - also called machine code, the binary code that the processor executes. The microprocessor doesn’t execute assembler, C, shell script, Java, etc. - only machine code. So we had better translate the program into the language that the “machine” can understand. We use the “-S” option when compiling hello.c. This gets gcc to translate the C code found in hello.c into assembler code in hello.s.
GCC translates hello.c (C code) into assembler code in hello.s. Let’s take a look at the assembler code. We won’t analyze it in detail, but we’ll make some observations; you do not need to understand every line of code to see what is happening here in the Intel assembly language. The code is broken into sections using the .section directive. The .string directive (in the data section of the code) holds the string to be written out to the display. The global function name “main” is visible in the .text section (code section). The GNU gcc compiler is identified (GNU 4.0.2 and the Linux version). There is some work on the stack on entry to and exit from the main function.
If you want to know more about the x86, stack frames, etc., then check out this short note on Intel x86 Function-call Conventions - Assembly View.
The compiler strips all comments from the hello.c code. The comments in the assembly listing were added not by the compiler but by the instructor, just to add a little context to the code. We can see how hello.c and hello.s relate or translate. Note that the “puts” library code is not part of this assembly code. It is added in the next phase of the compilation. The “linker” pulls in the necessary libraries that are needed to create the machine executable code. In essence, the GNU linker pulls in the standard libraries and resolves the call to the puts function by linking in the real object code for that function, among other things.
The linker “ld” takes as input the object code that the assembler produces from hello.s, together with the libraries, and produces a runnable executable file (machine code).
The linker creates an executable in a container format understood by the target Linux system. The Executable and Linking Format (ELF) is used by the GNU linker to produce the executable object file. Typically, programmers don’t call the linker directly. GCC takes care of that by passing the code through the various compilation stages.
The output of the objdump tool is designed to make it easier to understand the contents of an executable. You can use the od (octal dump) command, “od -x a.out”, to simply dump the object code in hexadecimal form (the “-x” switch). If you try to look at the executable using emacs you will not be able to understand its binary representation (try it). In contrast, objdump can display all the headers within the a.out binary. It essentially performs a reverse engineering job (as a disassembler) on the binary code (meaning the 1s and 0s of the machine code) - disassembling the executable sections of the binary code.
The output of the objdump tool looks like this:
OK, we are ready to logout from our session on spruce.cs.dartmouth.edu.
It’s been a good start. We have covered a number of important issues that we will revisit in the course.
Follow the pseudo code:
Do you have a department Linux account? If you do, that is great - you can use that account for this course. If you don’t have an account, make sure you give me your name and two preferences for a Linux login name (8 characters or less, all lower case alphanumeric and - no _ other punctuation). Wayne will set up the accounts within 24 hours. He will blitz you your new account information with an initial password. You can change this initial password using the Linux command passwd (man passwd). To do that you will have to ssh onto galehead.cs.dartmouth.edu and type passwd.
How many asynchronous tasks (i.e., processes) are there? What does each task do? How do they communicate?
In order to obtain access to Sudikoff after hours, and to get into Sudikoff’s Lab 001, you will need to have your Dartmouth ID card activated for the appropriate access. To do this, stop by 101 Sudikoff on a weekday between 8:30am-12:00pm, or 1:00-4:00pm, and bring your Dartmouth ID card. Inform the staff that you are taking CS50, and require access to Lab 001. You will have to fill out and sign a form stating that you understand the various policies about access to the labs in Sudikoff.
Keep in mind that it may take 24 hours for access to be activated, so please plan ahead!
Please note that the exterior doors of Sudikoff are automatically locked after 6:00pm weekdays, and also every weekend and holiday. In addition, the laboratory doors are locked at all times. You will need your access card to pass through locked doors.
If you have a Linux laptop you are all set. Mac comes with Unix under the hood so development is easy. For Windows you can install a VM. See below for Windows and Mac users.
Windows: Students wishing to connect to the CS Linux machines from their Windows laptop can use PuTTY and X servers for Windows. See the following PuTTY tutorial.
The tutorial also covers Cygwin, which is a Linux-like environment for Windows that consists of a DLL (cygwin1.dll) that offers a Linux API emulation layer. Cygwin is not Linux, and while it is convenient to run C and Linux-like commands on your laptop, you would be better off running your code on the Linux machines directly using PuTTY.
You can also install virtual machines on your Windows box. Windows 8 users can use the free VirtualBox software to install a Linux virtual machine. We have run the lab solutions on the virtual machine and it looks fine. Users of other Windows versions can install Ubuntu Linux. Today’s Ubuntu Linux installer is very easy to use. Just double click the installer in Windows and you will have a Windows-Linux dual-boot environment. Simple and cool.
Mac: Mac OSX is a Unix that conforms to POSIX specifications for the C API, shell utilities, and threads, and can compile and run your existing code. This is really exciting for Unix/Linux development.
You can also use the ssh command to remotely log on to computers, as discussed in the lecture. You can use the Terminal application. You have to install the gcc toolchain on your Mac to develop C code. It is simple. You first need to install Xcode if you have not already (if so, skip step 1) and then enable gcc using Xcode.
If your Linux account is set up by the X-hour then do the following. Read through these lecture notes and execute the commands and code as best you can - write a simple program using the emacs editor (read the emacs tutorial). Check out the gcc options to look at the assembler code.
For information about the lab and the Linux machines see: Sudikoff Lab 001 tour.
We plan to show you a demo of one of the robots, so you can get a feel for the project.
Make sure you do the reading for the next class. Typically we have reading for Wednesday and Friday classes.
Go through the material in these lecture notes: execute all the commands; write the hello.c program using an editor like emacs; run through the compilation process and look at the assembler code hello.s and use objdump to view the machine code. We will be handing out Lab1 next class so get this done before then.
Here are some useful links cited in the notes. Please read them. I’ll only put links to material you need to read.
Intel x86 Function-call Conventions - Assembly View