CS 50 Software Design and Implementation

Lecture 2

The Linux Shell and Commands

In the last lecture we wrote our first C program and looked at the compiler (gcc) tool chain. In this lecture, we will discuss the Linux shell and its commands. The shell is a command line interpreter that asks the kernel to execute commands on your behalf. It can also be used as a scripting language to design your own utilities. We will discuss scripting as part of a future lecture on shell programming.

Since we do not recommend that you buy a Linux book, here are some very good references and freely accessible online books - see resources.

If you need help with the meaning or syntax of any Unix shell command, you can use the manual (man) pages or a web reference of Unix commands.

Goals

We plan to learn the following from today’s lecture:

OK. Let’s get started.

The shell

Commands, switches, arguments

The shell is the Linux command line interpreter. It provides an interface between the user and the kernel and executes programs called commands. For example, if a user enters ls then the shell executes the ls command. The shell can also execute other programs such as applications, scripts, and user programs (e.g., written in C or the shell programming language).

You will get by in the course by becoming familiar with a subset of the Linux commands.

Linux has often been criticized for being very terse (it’s rumored that its designers were bad typists). Many commands have short, cryptic names - vowels are a rarity:

      awk, cat, cp, cd, chmod, grep, find, ls, mv, ps, rm

We will learn to use all of these commands and more.

Linux command output is also very terse - the default action on success is silence. Only errors are reported. Linux strongly supports the philosophy that each command should do one task and do it well. Linux commands are often termed tools or utilities - familiarity with the “top 20” will be a great help.

Instructions entered in response to the shell prompt have the following syntax:

       command [arg1] [arg2] .. [argn]

The brackets [] indicate that the arguments are optional. Many commands can be executed with or without arguments. Others require arguments (e.g., cp sort.c anothersort.c) to work correctly; if the arguments are missing, they return an error message. To avoid an explosion in the number of commands, most commands support switches (i.e., options) that modify the actions of the command.

For example, let’s use the ls command with the -l option switch to list the file c.tex in long format (ls -l c.tex). Switches are often single characters preceded by a hyphen (e.g., -l). Most commands accept switches in any order, though they must appear before all “true” arguments (usually filenames). For ls, the command arguments take the form [options] filename[s]. Options therefore modify the operation of the command and are interpreted by the program called by the shell, not by the shell itself.

In fact, the command itself (ls) is argument 0, the -l switch or option is argument 1, and the filename is argument 2. Some commands also accept their switches grouped together. For example, the following two invocations of ls are equivalent:

[campbell@galehead lectures]$ ls -rtl  
...  
[campbell@galehead lectures]$ ls -l -r -t

The shell parses the words or tokens (command name, options, filename[s]) and, assuming the syntax is good, gets the kernel to execute the command.

Typically, the shell processes the complete line once a carriage return (cr) is entered and finds the program that the command line asks it to execute. The shell looks for the command either in the specified directory, if one is given (./mycommand), or by searching through the list of directories in your $PATH variable.
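As a small check of the claim that grouped and separate switches are equivalent, the following sketch runs ls both ways in a scratch directory and compares the listings. The scratch directory and its file names are placeholders chosen for illustration; they are not part of the lecture's own examples.

```bash
# Create a scratch directory with a few empty files (hypothetical names).
scratch=$(mktemp -d)
touch "$scratch/a.c" "$scratch/b.c" "$scratch/c.tex"

# Run ls with the switches grouped, then with each switch given separately.
grouped=$(ls -rtl "$scratch")
separate=$(ls -l -r -t "$scratch")

if [ "$grouped" = "$separate" ]; then
    echo "ls -rtl and ls -l -r -t produce the same listing"
fi

rm -r "$scratch"
```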
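The directory search described above can be sketched in a few lines of shell. This is only an illustration of the idea, not how bash is implemented internally (bash also caches locations and checks builtins and aliases first); the function name find_in_path is made up for this example.

```bash
# Walk the directories in $PATH from left to right and print the first
# executable file with the given name. The body runs in a subshell (...)
# so that changing IFS and disabling globbing do not leak into the caller.
find_in_path() (
    name=$1
    IFS=:        # split $PATH on colons
    set -f       # no filename expansion while splitting
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.     # an empty entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            exit 0
        fi
    done
    exit 1       # not found in any directory
)

find_in_path ls
```

Because the search stops at the first match, the order of the directories in $PATH decides which program runs when two directories hold a program of the same name.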

Path environment

You will need to look at your $PATH variable and update it from time to time to make sure the path to the code you want to execute is there. Typically, the startup files that execute when you log in (.bash_profile) or each time you start a new interactive shell (.bashrc) are the place to set up environment variables such as $PATH.

echo $PATH  
/sw/bin:/sw/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/texbin:/sw/bin:/usr/X11R6/bin
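One common way to update $PATH from a startup file is sketched below. It assumes you keep your own programs in $HOME/bin, which is an assumption made for this example; substitute whatever directory you actually use. The case test avoids adding the same directory again every time the file is read.

```bash
# Prepend $HOME/bin to PATH unless it is already listed.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;                 # already present, nothing to do
    *) PATH="$HOME/bin:$PATH" ;;        # not present, add it at the front
esac
export PATH
```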

So where does the ls command executed above reside in the Linux directory hierarchy? Let’s use another command to find out.

[atc@Macintosh-7 atc]$ which ls  
ls is /bin/ls  
[atc@Macintosh-7 atc]$ whereis ls  
/bin/ls  
[atc@Macintosh-7 atc]$

The whereis and which commands are a useful sanity check if you want to know for sure which ls command is executed. For example, I could have written a program called ls and placed it in my working directory - probably not a good idea, but it could happen. In that case, if I entered ls, which command would the shell execute?

We can see from the $PATH variable that /bin is in the path. Hence the shell can track down the ls binary to execute. For the shell to run it, the file /bin/ls must also have the correct permissions set so that it is executable by all. We can check this.

[atc@Macintosh-7 atc]$ ls -l /bin/ls  
-r-xr-xr-x   1 root  wheel  60972 Oct 17  2006 /bin/ls

Indeed it is. The file is owned by “root” and is executable by all.

If I set my $PATH variable to “.” (i.e., only the current working directory) and execute ls, then the shell would not be able to find the correct program to execute (assuming I don’t have an ls binary with the correct permissions in my current directory). This is akin to not having my path set up correctly. Here is what would happen.

[atc@Macintosh-7 teaching]$ which ls  
ls is /bin/ls  
[atc@Macintosh-7 teaching]$ PATH=.  
[atc@Macintosh-7 teaching]$ ls  
-bash: ls: command not found  
[atc@Macintosh-7 teaching]$ which ls  
-bash: type: ls: not found  
[atc@Macintosh-7 teaching]$

You may come across this error (“-bash: ls: command not found” or variants thereof) when you install code or try and execute new programs you want to use or have written.
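The failure can be reproduced safely by confining the PATH change to a subshell, so the login shell keeps its real $PATH. The sketch below does this in an empty scratch directory (a placeholder created with mktemp) and records the exit status; shells report 127 when a command cannot be found.

```bash
empty_dir=$(mktemp -d)
status=0

# The parentheses start a subshell: PATH=. applies only inside them.
( cd "$empty_dir" && PATH=. && ls ) 2>/dev/null || status=$?

echo "ls with PATH=. exited with status $status"
rmdir "$empty_dir"     # works, because the outer PATH was never changed
```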

There are shells, shells and more shells

There are a number of shells available to a Linux user - so which one do you select? These are:

Changing the Shell to bash

We will be using the bash shell in this course since it is standard fare with Linux machines. Let’s check what shell has been configured for your login account. If it’s not the bash shell then let’s change the shell using the change shell chsh command. To find out what shell is running, log in and look at the $SHELL environment variable. We will use the echo command which is akin to the print (e.g., printf in C) statement in many languages.

[campbell@spruce ~]$ echo $SHELL  
/bin/tcsh  
 
[campbell@spruce ~]$ chsh  
please login to galehead and run /usr/bin/chsh from there.  
 
[campbell@spruce ~]$ ssh galehead  
campbell@galehead’s password:  
Last login: Sun Dec 23 22:58:58 2007 from spruce.cs.dartmouth.edu  
 
[campbell@galehead ~]$ chsh -l  
/bin/bash  
/bin/sh  
/bin/ash  
/bin/bsh  
/bin/tcsh  
/bin/csh  
/usr/local/bin/bash  
/usr/local/bin/tcsh  
/bin/bash2  
/bin/zsh

The tcsh shell is set. So we need to change the shell to bash. Note, that galehead is the only machine that you can change user configuration information on such as passwords and shells. So we ssh to galehead and list all permissible shells supported by our local Linux machines using the “chsh -l” switch.

Now, let’s change the shell to bash. Note, that our first attempt fails because the shell wants the full path name. We will discuss full and relative path names later in this lecture. Our second attempt using the full path name from above is successful.

[campbell@galehead ~]$ chsh -s bash  
Changing shell for campbell.  
Password:  
chsh: shell must be a full path name.  
 
[campbell@galehead ~]$ chsh -s /bin/bash  
Changing shell for campbell.  
Password:  
Shell changed.  
 
[campbell@galehead ~]$ ps  
  PID TTY          TIME CMD  
22271 pts/1    00:00:00 bash  
22345 pts/1    00:00:00 ps

Another way to see what shell is running is to use the process status (ps) command. We can see that the bash shell is running. The process ID and other status information of the process is displayed.

Note that most commands executed by the shell start a new “process”. (The exceptions are what are called builtins.) We will discuss processes in a future lecture.

The basic shell operation is as follows. The shell parses the command line and finds the program to execute. It passes any options and arguments to the program as part of a new process for the command, such as ps above. While the process (ps in this case) is running, the shell waits for it to complete; the shell is in a sleep state. When the program completes, it passes its exit status back to the shell. At this point the shell wakes up and prompts the user for another command. Note that it is the program being executed (ps in this case) that checks whether the arguments passed to it by the shell are correct. It is not the job of the shell to do that level of parsing or error checking. If there are problems with the syntax (e.g., a wrong switch) then it is the program itself that informs the user via an error message.
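The exit status that a program hands back to the shell is available in the special variable $?. The sketch below wraps this in a small helper; the name run_and_report is made up for this example. By convention a status of 0 means success and anything else means failure.

```bash
# Run a command, discard its output, and report the status it returned.
run_and_report() {
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    echo "$1 exited with status $status"
}

run_and_report true                     # true always succeeds: status 0
run_and_report false                    # false always fails: status 1
run_and_report ls /no/such/directory    # ls itself detects and reports the error
```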

A note on shell builtins

The terms command and utility are used synonymously in these notes. The shell has a number of utilities built into it, called builtins. When a builtin runs, the shell does not fork a process; that is, it does not create a process specifically to execute the command. Builtins therefore run more efficiently, in the context of the existing process, without the cost of creating a new process. Typically, users are not aware of whether a command runs as a builtin or as a standard forked command. The echo command exists both as a shell builtin and as a separate utility in /bin/echo. As a rule, the shell will always execute a builtin before trying to find a command of the same name to fork. Bash supports a number of builtins including bg, fg, cd, kill, pwd, and read, among others.
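The type builtin reports how the shell would run a given name. The sketch below also hints at why cd has to be a builtin: a directory change made in a child process (here a subshell) is lost when that process exits.

```bash
# Ask the shell how it would run each name.
type cd
type echo
type ls

# A subshell is a child process. Its cd does not move the parent shell.
before=$(pwd)
( cd / && pwd )        # the subshell prints /
after=$(pwd)
[ "$before" = "$after" ] && echo "parent shell is still in $after"
```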

Linux file system

The Linux file system is a hierarchical file system. The file system consists of a very small number of different file types. These include text files, directories, character special files (e.g., terminals) and block special files (e.g., disks and tapes).

A directory is just a special type of file. A directory (akin to a Macintosh folder) contains the names and locations of all files and directories below it. A directory always contains two special files ’.’ (dot) and ’..’ (dot dot). Every file has a filename (up to 255 characters on most Linux file systems), typically made up of characters from ’A-Z a-z 0-9 _ .’, and an inode which uniquely identifies the file in the file system.

Directory names are separated by a slash ’/’, forming pathnames.

/usr/bin/emacs  
/etc/passwd

Files are accessed by referring to their relative or absolute pathnames.
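To make the difference concrete, the sketch below names one file three ways (absolute, relative, and relative through “.” and “..”) and prints its inode number each time. The directory layout is built in a scratch directory and only mimics the names used in these notes.

```bash
start_dir=$(pwd)
root=$(mktemp -d)
mkdir -p "$root/cs50/lectures"
touch "$root/cs50/lectures/shell.tex"

cd "$root/cs50"
# ls -i prints the inode number in front of the name.
abs=$(ls -i "$root/cs50/lectures/shell.tex" | awk '{print $1}')
rel=$(ls -i lectures/shell.tex | awk '{print $1}')
dots=$(ls -i ../cs50/./lectures/shell.tex | awk '{print $1}')
echo "inode numbers: $abs $rel $dots"

cd "$start_dir"
rm -r "$root"
```

All three pathnames lead to the same inode, so they refer to the same file.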

Home directory

Each account has a home directory. After you have logged in your shell will be executing in your home directory. So let’s log in and use the pwd command to find out where we are - we will be in the home directory of course.

[atc@Macintosh-7 l2]$ ssh -Y -l campbell spruce.cs.dartmouth.edu  
campbell@spruce.cs.dartmouth.edu’s password:  
Last login: Mon Dec 24 11:37:01 2007 from c-75-69-130-98.hsd1.nh.comcast.net  
 
[campbell@spruce ~]$ pwd  
/net/nusers/campbell

Let’s list the contents of the home directory.

[campbell@spruce ~]$ ls -l  
total 434  
drwx------  2 campbell faculty     48 Dec 22 15:29 bin  
drwxr--r--  5 campbell faculty    128 Dec 24 14:33 cs50  
drwx------  2 campbell faculty     48 Dec 22 15:30 lib  
drwx------  3 campbell faculty   1368 Dec 24 11:25 mail  
drwx------  3 campbell faculty    104 Nov  6 12:01 papers  
drwxr-xr-x  4 campbell ug         728 Oct 26  2006 public_html  
-rw-------  1 campbell faculty 435438 Dec 14  2006 Sent  
-rw-------  1 campbell faculty   1017 Mar 22  2007 Sent Messages  
drwx------  3 campbell faculty     72 Dec 11 15:14 teaching

Recall that the “d” in “drwx------” indicates that this file is in fact a directory. So we can move to that directory, assuming we have the relevant permission - which we do in all cases. So let’s move around.

File and directory permission

All files and directories have certain access permissions which constrain access to only those users having the correct permission. Let’s consider a couple of typical examples from above:

drwxr--r--  5 campbell faculty    128 Dec 24 14:33 cs50  
 
-rw-------  1 campbell faculty   1017 Mar 22  2007 Sent Messages

The first character of the access permissions indicates what “type of file” is being displayed.

Following the first (file type) character, the next three triples (i.e., groups of three characters) from left to right represent the file permissions: the read, write, and execute permissions for, respectively, the owner (campbell in this case), the file’s group (faculty in this case), and the “rest-of-the-world”. To determine which group a particular file is in, enter the “ls -lg” command.

What do these permissions mean?

After the file permissions comes the number of links to the file (e.g., 5), followed by the owner (campbell), the group (faculty), the size of the file in bytes (e.g., 128), the date and time of modification (e.g., Dec 24 14:33), and the filename (e.g., cs50).

Note that shell scripts (which we will discuss in a future lecture) must have both read and execute permission - bash or any of the shells must be able to both read the shell script and execute it. Program binaries, on the other hand, only need execute permission since they are not read but executed (recall that when we tried to view a.out with more we could not read it because it was an executable in machine code).

Changing permission

The permissions on files and directories may be changed with the chmod (change mode) command. When you own a file or directory you can use chmod to change the access permissions to the file or directory.

Only the three permission triplets may be changed - a directory cannot be changed into a plain file nor vice-versa.

Permissions given to chmod are either absolute or relative (i.e., symbolic).

Each triplet is represented by a single octal digit, the sum of 4 (read), 2 (write), and 1 (execute), read from left to right. For example, rwx is represented by 7, rw- by 6, and r-- by 4, and so on. The absolute octal values used with chmod are as follows:


Octal Value           Protection mechanism
   400                   Read by owner
   200                   Write (delete) by owner
   100                   Execute (search in directory) by owner
   040                   Read by group
   020                   Write (delete) by group
   010                   Execute (search) by group
   004                   Read by others (i.e., rest of the world)
   002                   Write (delete) by others (dangerous!)
   001                   Execute (search) by others

The complete permission on any file is the sum of the values. For example, home directories are typically 700 which provides the owner with read, write, and execute permission but denies all access to others.
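The arithmetic can be sketched as a small shell function that turns a nine-character permission string, as printed by ls -l without the leading file type character, into its octal value. The function name perm_to_octal is made up for this example.

```bash
# Convert a permission string such as rwxr-xr-x into octal (755).
perm_to_octal() {
    perms=$1
    result=""
    for first in 1 4 7; do
        # Cut out one triple: characters 1-3, 4-6, or 7-9.
        triple=$(printf '%s' "$perms" | cut -c"$first"-"$((first + 2))")
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac   # read
        case $triple in ?w?) digit=$((digit + 2)) ;; esac   # write
        case $triple in ??x) digit=$((digit + 1)) ;; esac   # execute
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

perm_to_octal rwxr-xr-x    # prints 755
perm_to_octal rwx------    # prints 700
perm_to_octal rw-rw-r--    # prints 664
```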


drwxr-xr-x 6 campbell faculty    152 Dec 31 20:40 cs50
...
[campbell@galehead ~]$ chmod 700 cs50
...
drwx------ 6 campbell faculty    152 Dec 31 20:40 cs50

If you wish others to be able to read one of your files - in this case a file named funny - then the command would be:

[campbell@galehead ~]$ chmod 664 funny  
...  
-rw-rw-r-- 1 campbell faculty      0 Jan  1 15:50 funny

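Mode 664 lets the owner and the group read and write the file and lets everyone else only read it. No execute bits are set because funny is a plain file (not a binary or shell script). If the file is only to be read by others, including the group, and not written to, then 644 (rw-r--r--) makes good sense.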
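The difference between 664 and 644 is easy to see on a scratch file. In this sketch the file made by mktemp stands in for funny, and cut keeps only the first ten characters of the ls -l line, which hold the file type and the three permission triples.

```bash
f=$(mktemp)                              # scratch file standing in for "funny"

chmod 664 "$f"
mode_664=$(ls -l "$f" | cut -c1-10)      # -rw-rw-r--

chmod 644 "$f"
mode_644=$(ls -l "$f" | cut -c1-10)      # -rw-r--r--

echo "664 gives $mode_664"
echo "644 gives $mode_644"
rm -f "$f"
```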

Use the manual pages to read how chmod can be used in a relative or symbolic mode; for example, what would the following two commands do?

[atc@Macintosh-7 notes]$ chmod u=wrx,og-rwx cs50  
[atc@Macintosh-7 notes]$ chmod o+wrx cs50

These symbolic arguments need to be used carefully. Here “o” means others, “g” means group, and “u” means owner or user.
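A safe way to explore symbolic modes is to try them on a scratch directory rather than on cs50 itself. The sketch below applies the two symbolic arguments discussed here to a directory made by mktemp; the order of the permission letters (wrx or rwx) makes no difference to chmod.

```bash
d=$(mktemp -d)                           # scratch directory standing in for cs50
chmod 755 "$d"                           # start from drwxr-xr-x

chmod u=rwx,og-rwx "$d"                  # owner gets exactly rwx, group and others lose everything
mode_private=$(ls -ld "$d" | cut -c1-10) # drwx------

chmod o+rwx "$d"                         # add rwx for others only (rarely a good idea)
mode_open=$(ls -ld "$d" | cut -c1-10)    # drwx---rwx

echo "$mode_private then $mode_open"
rmdir "$d"
```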

Moving around the file system

The change directory command (cd) allows us to move around the Linux directory hierarchy. Let’s combine pwd, ls, and cd to move around my local directories, which are rooted at /net/nusers/campbell

[campbell@spruce ~]$ cd cs50  
 
[campbell@spruce cs50]$ ls  
assignments  code  lectures  
 
[campbell@spruce cs50]$ pwd  
/net/nusers/campbell/cs50  
 
[campbell@spruce cs50]$ cd lectures/  
 
[campbell@spruce lectures]$ ls  
bash-programming.tex  design.tex         se.tex     start.tex  
c.tex                 linux-advance.tex  shell.tex  
 
[campbell@spruce lectures]$ pwd  
/net/nusers/campbell/cs50/lectures

There are also a number of special characters that can be used with cd as shorthand.

Moving to the parent directory:

[campbell@spruce lectures]$ cd ..  
 
[campbell@spruce cs50]$ pwd  
/net/nusers/campbell/cs50

Moving back to where we came from:

[campbell@spruce cs50]$ cd -  
/net/nusers/campbell/cs50/lectures  
 
[campbell@spruce lectures]$ pwd  
/net/nusers/campbell/cs50/lectures

Moving to our home directory:

[campbell@spruce lectures]$ cd ~  
 
[campbell@spruce ~]$ pwd  
/net/nusers/campbell

and back:

[campbell@spruce ~]$ cd -  
/net/nusers/campbell/cs50/lectures
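The “..” and “-” shortcuts can be tried without touching your real files. This sketch builds a throwaway cs50/lectures tree under a scratch directory (the names only mirror the ones used above) and records where each cd lands.

```bash
start_dir=$(pwd)
base=$(mktemp -d)
mkdir -p "$base/cs50/lectures"

cd "$base/cs50/lectures"
cd ..                                # up to the parent directory
parent=$(basename "$(pwd)")          # cs50
cd - >/dev/null                      # back to the previous directory
previous=$(basename "$(pwd)")        # lectures

echo "cd .. went to $parent, cd - went back to $previous"

cd "$start_dir"
rm -r "$base"
```

The shell remembers the previous directory in the variable $OLDPWD, which is what “cd -” uses.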

Listing and globbing files

[campbell@spruce lectures]$ ls -l  
total 0  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:22 bash-programming.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:22 c.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:23 design.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:22 linux-advance.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:23 se.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:21 shell.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:21 start.tex  
[campbell@spruce lectures]$

Here is a popular set of switches you can use with ls:

-l list in long format (as above)  
-t sort by modification time (latest first)  
-a list all entries (including ’dot’ files)  
-r list in reverse order (alphabetical or time)  
-R list the directory and its subdirectories recursively

We can use a number of special characters to look at the files in a smart and efficient manner:

[campbell@spruce lectures]$ ls -l s*  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:23 se.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:21 shell.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:21 start.tex  
[campbell@spruce lectures]$

[campbell@spruce lectures]$ ls -l *s*  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:22 bash-programming.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:23 design.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:23 se.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:21 shell.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:21 start.tex  
[campbell@spruce lectures]$

[campbell@spruce lectures]$ ls -l *.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:22 bash-programming.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:22 c.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:23 design.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:22 linux-advance.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:23 se.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:21 shell.tex  
-rw-r--r--  1 campbell faculty 0 Dec 24 12:21 start.tex  
[campbell@spruce lectures]$  
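It is the shell, not ls, that expands patterns such as s* into a list of matching filenames before the command runs. The sketch below recreates the same (empty) files in a scratch directory and uses echo to show what each pattern expands to; the ? pattern, which matches exactly one character, is an extra not used above.

```bash
start_dir=$(pwd)
lectures=$(mktemp -d)
cd "$lectures"
touch bash-programming.tex c.tex design.tex linux-advance.tex \
      se.tex shell.tex start.tex

starts_with_s=$(echo s*)       # names beginning with s
contains_s=$(echo *s*)         # names containing an s anywhere
one_char=$(echo ?.tex)         # exactly one character before .tex

echo "s*    -> $starts_with_s"
echo "*s*   -> $contains_s"
echo "?.tex -> $one_char"

cd "$start_dir"
rm -r "$lectures"
```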

Some cool tricks - an example

If I wanted to list just the directories or just the plain files (i.e., non-directory files) in a directory, how would I do that? Use ls, right? Sorry, ls does not have a single option that lists only directories or only plain files. But we can use a combination of commands to do this!

We can write our own commands to do this job - we can use a combination of ls and grep to list directory names only or plain file names only.

First let’s just list all the files in the home directory - it includes one plain file and the rest are directories.

 
[campbell@bear ~]$ cd ~  
[campbell@bear ~]$ ls -l  
total 40  
drwx------  2 campbell faculty 4096 2007-12-22 15:29 bin  
drwx------ 15 campbell faculty 4096 2010-01-03 21:16 cs50  
drwx------  5 campbell faculty 4096 2008-05-11 15:18 cs60  
drwx------  2 campbell faculty 4096 2007-12-22 15:30 lib  
drwx------  3 campbell faculty 4096 2009-12-29 08:12 mail  
drwx------  2 campbell faculty 4096 2009-06-23 02:43 Mail  
drwx------  3 campbell faculty 4096 2007-12-24 23:23 papers  
-rw-r--r--  1 campbell faculty    0 2010-01-04 21:43 plainfile  
drwxr-xr-x  6 campbell ug      4096 2009-12-02 12:41 public_html  
drwx------  3 campbell faculty 4096 2007-12-11 15:14 teaching  
drwxr-xr-x  2 campbell faculty 4096 2009-04-08 09:48 trash  

Now we use a combination of ls and grep joined by a pipe (|). More on this in a later lecture, but now we begin to see the power of the shell.

Let’s just list plain files:

 
[campbell@bear ~]$ ls -l | grep -v ^d  
total 40  
-rw-r--r--  1 campbell faculty    0 2010-01-04 21:43 plainfile  

Now, let’s use a modification on the above to just list directories:

[campbell@bear ~]$ ls -l | grep ^d  
drwx------  2 campbell faculty 4096 2007-12-22 15:29 bin  
drwx------ 15 campbell faculty 4096 2010-01-03 21:16 cs50  
drwx------  5 campbell faculty 4096 2008-05-11 15:18 cs60  
drwx------  2 campbell faculty 4096 2007-12-22 15:30 lib  
drwx------  3 campbell faculty 4096 2009-12-29 08:12 mail  
drwx------  2 campbell faculty 4096 2009-06-23 02:43 Mail  
drwx------  3 campbell faculty 4096 2007-12-24 23:23 papers  
drwxr-xr-x  6 campbell ug      4096 2009-12-02 12:41 public_html  
drwx------  3 campbell faculty 4096 2007-12-11 15:14 teaching  
drwxr-xr-x  2 campbell faculty 4096 2009-04-08 09:48 trash  

If you don’t know any of the above switches then use the man command. We can also use the -F switch to show which files are directories. Check it out.

[campbell@bear ~]$ ls -lF  
total 140  
drwx------  2 campbell faculty  4096 Dec 22  2007 bin/  
drwx------ 31 campbell faculty  4096 Jan 16  2013 cs50/  
drwx------  5 campbell faculty  4096 Apr 19  2012 cs60/  
-rw-------  1 campbell faculty   532 Jun 22  2011 Drafts  
drwx------  2 campbell faculty  4096 Dec 22  2007 lib/  
drwx------  3 campbell faculty  4096 May 19  2013 mail/  
drwx------  2 campbell faculty  4096 Jun 23  2009 Mail/  
drwx------  3 campbell faculty  4096 Jan  2  2012 misc/  
-rwxrwxrwx  1 campbell faculty     0 Jan  8 10:32 myls*  
drwx------  3 campbell faculty  4096 Dec 24  2007 papers/  
drwxr-xr-x  8 campbell ug      12288 Jan  8 16:55 public_html/  
-rw-------  1 campbell faculty 81979 Jun 22  2011 Sent Messages  
drwx------  2 campbell faculty  4096 Jan  8 10:55 solutions/  
drwx------  3 campbell faculty  4096 Dec 11  2007 teaching/  

It is handy to be able to list just the directories when moving around the file system. So we’ll add these commands to our bash files in the next lecture - we’ll create aliases of these commands so we can use them any time.

Well, there is an ls option to list directories, and indeed there are many ways to do this; for example:

 
[campbell@bear ~]$ ls -d */  
bin/  cs50/  cs60/  lib/  mail/  Mail/  papers/  public_html/  teaching/  trash/  
 
[campbell@bear ~]$ echo */  
bin/ cs50/ cs60/ lib/ mail/ Mail/ papers/ public_html/ teaching/ trash/  
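The ls and grep pipelines can be wrapped up as small shell functions so they are easy to reuse. The names lsd and lsf are made up for this sketch. Matching ^- for plain files, instead of inverting the match with -v ^d, also drops the “total” line from the output.

```bash
# List only directories, or only plain files, in long format.
lsd() { ls -l "$@" | grep '^d'; }
lsf() { ls -l "$@" | grep '^-'; }

# Try them on a scratch directory (hypothetical contents).
demo=$(mktemp -d)
mkdir "$demo/bin" "$demo/cs50"
touch "$demo/plainfile"

lsd "$demo"      # shows bin and cs50
lsf "$demo"      # shows plainfile

rm -r "$demo"
```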

Creating and deleting directories and files

In the following sequence we will create a new directory, create two new files (using touch), move one file to another directory, delete the other file and remove the directory.

 
[campbell@spruce cs50]$ pwd  
/net/nusers/campbell/cs50  
 
[campbell@spruce cs50]$ mkdir project  
 
[campbell@spruce cs50]$ cd project  
 
[campbell@spruce project]$ touch socket.c transport.c  
 
[campbell@spruce project]$ ls  
socket.c     transport.c  
 
[campbell@spruce project]$ mv transport.c ~/.  
 
[campbell@spruce project]$ alias rm  
alias rm='rm -i'  
 
[campbell@spruce project]$ alias rm=rm  
 
[campbell@spruce project]$ rm socket.c  
 
[campbell@spruce project]$ ls  
 
[campbell@spruce project]$ cd ..  
 
[campbell@spruce cs50]$ ls  
assignments  code  lectures  project  
 
[campbell@spruce cs50]$ rmdir project

In the sequence above we reset the alias for rm, which is set up in .bashrc. When you use the “rm -i” option, rm will ask you to confirm that you really want to delete each file. It is worth setting up this alias in your .bashrc file. It is easy to type “rm” and accidentally delete files, so the “-i” (interactive) option is a life saver. For example,

[campbell@spruce project]$ rm -i socket.c  
rm: remove regular empty file ‘socket.c’? y
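The effect of the prompt can be seen in a script by feeding the answer to rm on standard input. This sketch uses a scratch directory and hides the prompt text itself, which rm writes to the error stream.

```bash
scratch=$(mktemp -d)
touch "$scratch/socket.c"

# Answer "n": rm -i leaves the file alone.
echo n | rm -i "$scratch/socket.c" 2>/dev/null
kept=no
[ -e "$scratch/socket.c" ] && kept=yes

# Answer "y": the file is removed.
echo y | rm -i "$scratch/socket.c" 2>/dev/null
removed=no
[ -e "$scratch/socket.c" ] || removed=yes

echo "kept after n: $kept, removed after y: $removed"
rmdir "$scratch"
```

With the alias in place, typing \rm or command rm runs the plain rm for a single command without removing the alias.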

Hidden files

In the home directory there are a number of interesting “hidden” files. Using the “-a” switch lists all files, including those that begin with a dot (aka the hidden files).

[campbell@spruce ~]$ ls -al  
total 899  
drwxr-xr-x  21 campbell faculty   1448 Dec 24 14:58 .  
drwxr-xr-x  25 root     root       624 May 31  2007 ..  
-rw-r--r--   1 campbell faculty      0 Dec 23 18:45 .addressbook  
-rw-------   1 campbell faculty   2285 Dec 23 18:45 .addressbook.lu  
drwxr-xr-x   3 campbell faculty     72 Nov  6 22:57 .adobe  
-rw-------   1 campbell faculty   4978 Dec 24 13:39 .bash_history  
-rw-r--r--   1 campbell ug         882 Jun 24  1997 .bash_logout  
-rw-r--r--   1 campbell faculty   1707 Dec 22 18:52 .bash_profile  
-rw-r--r--   1 campbell faculty   1411 Dec 22 19:18 .bashrc  
... (snip)  
-rw-------   1 campbell faculty    864 Dec 23 22:42 .Xauthority  
drwx------   2 campbell faculty    136 Nov  5 21:04 .xemacs

But a simple ls will only show:

[campbell@spruce ~]$ ls  
bin  cs50  lib  mail  papers  public_html  Sent  Sent Messages  teaching
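The same contrast can be reproduced in a scratch directory. The file names below are placeholders; the -A switch, not used above, is like -a but leaves out the “.” and “..” entries.

```bash
fake_home=$(mktemp -d)
touch "$fake_home/.bashrc" "$fake_home/.bash_profile" "$fake_home/notes.txt"

plain=$(ls "$fake_home")        # visible files only
all=$(ls -a "$fake_home")       # everything, including . and ..
almost=$(ls -A "$fake_home")    # everything except . and ..

echo "ls:";    echo "$plain"
echo "ls -a:"; echo "$all"
echo "ls -A:"; echo "$almost"

rm -r "$fake_home"
```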

Reading material for the next class

Make sure you do the reading for the next class. Typically we have reading for Wednesday and Friday classes.