Shell Programming

Take time for this week’s reading; it’s short, important, and useful.

We are now familiar with the shell and a few commands.

In this lecture, we discuss shell programming using bash. The main goal is to write your own scripts. But what are scripts? And what are they useful for?

Goals

We learn the following today:

Understanding shell script syntax and constructs
Writing simple interactive scripts
Writing and executing your first shell script
Understanding more advanced constructs through examples

We will do this activity in the class.

Interactive mode and shell scripts

The shell can be used in two different ways:

interactive mode, which allows you to enter more than one command interactively to the shell. We have been doing this already.
shell scripts, in which the shell reads a series of commands (or complex programs) from a text file.

The interactive mode is fine for entering a handful of commands but it becomes cumbersome for the user to keep re-entering these commands interactively. It is better to store the commands in a text file called a shell script, or script for short, and execute the script when needed. In this way, the script is preserved so you and other users can use it again.

In addition to calling Unix commands (e.g., grep, cd, rm), shell scripts can also invoke compiled programs (e.g., C programs) and other shell scripts. Shell programming also includes control-flow commands to test conditions (if...then) or to do a task repeatedly (for...do). These control structures, found in many other languages (such as C and Python), allow the programmer to quickly write fairly sophisticated shell programs to do a number of different tasks.

Like Python, and unlike C or Java, shell scripts are not compiled; rather, they are interpreted and executed by the shell itself.

Shell scripts are used for many reasons - building and configuring systems or environments, prototyping code, or an array of repetitive tasks that programmers do. Shell programming is mainly built on the Unix shell commands and utilities; reuse of these existing programs enables programmers to simply build new programs to tackle fairly complex jobs.

Separating groups of commands using ‘;’

Let’s start to build up our knowledge of how scripts work by first looking at some basic operations of the shell. The Unix shell allows for the unconditional execution of commands and allows for related commands to be kept adjacent as a command sequence using the semicolon character as shown below:

$ echo Directory listing; date; ls
Directory listing
Fri Mar 25 13:33:40 EDT 2022
Archive/  data/  dotfiles/  examples@  private/      tse/
build/    demo/  example/   labs/      public_html/
$

Exit status - who cares?

When using the shell interactively it is often clear when we have made a mistake - the shell warns about incorrect syntax, and complains about invalid switches or missing files. These warnings and complaints can come from the shell’s parser and from the program being run (for example, from ls).

Error messages provide visual clues that something is wrong, allowing us to adjust the command to get it right.

Commands also inform the shell explicitly whether the command has terminated successfully or unsuccessfully due to some error. Commands do this by returning an exit status, which is represented as an integer value made available to the shell and other commands, programs, and scripts.

The shell understands an exit status of 0 to indicate successful execution, and any other value (an integer from 1 to 255) to indicate failure of some sort.

The special shell variable $? is updated each time a command exits.

What do we mean by that?

$ echo April Fool
April Fool
$ echo $?
0
$ ls April Fool
ls: cannot access April: No such file or directory
ls: cannot access Fool: No such file or directory
$ echo $?
2
$ 

Conditional sequences - basic constructs

Why do we need to use the exit status?

Often we want to execute a command based on the success or failure of an earlier command. For example, we may only wish to remove files if we are in the correct directory, or perhaps we want to be careful to only append info to a file if we know it already exists.

The shell provides both conjunction (and) and disjunction (or) based on previous commands. These are useful constructs for writing decision-making scripts. Take a look at the example below in which we make three directories, then try to remove the first:

$ mkdir labs && mkdir labs/lab1 labs/lab2
$ rmdir labs || echo whoops!
rmdir: failed to remove `labs': Directory not empty
whoops!
$     

In the first command, && (typed with no space between the two ampersands) specifies that the second command should be executed only if the first command succeeds (with an exit status of 0) - i.e., we only make the sub-directories if we can make the top directory.

In the second command, || (again with no space between the two bars) requests that the second command be executed only if the first command fails (with an exit status other than 0).
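
As a minimal sketch of the same idea in script form, the following repeats the two commands inside a scratch directory (made with mktemp -d, so none of your own files are touched) and records what happened. The variable names workdir and outcome are just illustrative choices, not part of the example above.

```shell
#!/bin/bash
# Work in a throwaway directory so the example cannot clobber real files.
workdir=$(mktemp -d)

# && : make the sub-directories only if the top directory was created.
mkdir "$workdir/labs" && mkdir "$workdir/labs/lab1" "$workdir/labs/lab2"

# || : the assignment on the right runs only if rmdir fails,
# which it does here because labs is not empty.
outcome="removed"
rmdir "$workdir/labs" 2>/dev/null || outcome="whoops"
echo "rmdir outcome: $outcome"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Here 2>/dev/null simply hides rmdir's complaint so that only our own message appears.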

Conditional execution using if, then, else

There are many situations when we need to execute commands based on the outcome of an earlier command.

if command0; then
	command1
	command2
fi

Here command1 and command2 (and any other commands that might be input) will be executed if and only if command0 returns a successful or true value (i.e., its exit status is 0).

The fact that 0 means true is confusing for many people! (In many high-level languages - like C - zero means false and non-zero means true; technology isn’t always consistent.) The reason Unix uses 0 for success is that there is only one 0, but there are many non-zero numbers; thus, 0 implies ‘all is well’ whereas non-zero implies ‘something went wrong’, and the specific non-zero value can convey information about what went wrong.

Similarly, we may have commands to execute if the conditional fails.

if command0; then
	command1
	command2
else
	command3
	command4
fi

Here command3 and command4 will be executed if and only if command0 fails.
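
To make the template concrete, here is a small sketch in which command0 is a pipeline ending in grep -q, which prints nothing and only reports (through its exit status) whether it found a match. The variables line and verdict are made up for this illustration.

```shell
#!/bin/bash
# A sample line of text to search; any string would do.
line="Fri Mar 25 13:33:40 EDT 2022"

# grep -q exits with status 0 if the pattern is found, non-zero otherwise.
if echo "$line" | grep -q "Mar"; then
    verdict="March"
else
    verdict="some other month"
fi

echo "The date is in $verdict"
```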

First Interactive Shell Program

Entering interactive scripts - that is, a tiny sequence of commands, typed at the keyboard in an interactive shell - is an easy way to get the sense of a new scripting language or to try out a set of commands. During an interactive session the shell simply allows you to enter a ‘one-command’ interactive program at the command line and then executes it.

$ if cp students.txt students.bak
> then
> echo "$? copy succeeded!"
> else
> echo "$? copy failed!"
> fi
0 copy succeeded!
$

The > character is the secondary prompt, issued by the shell indicating that more input is expected.

The exit status of the cp command is used by the shell to decide whether to execute the then clause or the else clause. Just for yucks, I had echo show us the exit status $?; the above example confirms that 0 status means ‘true’ and triggered the then clause.

We can invert the conditional test by preceding it with !, as in many programming languages:

$ if ! cp students.txt students.bak
> then
> echo "copy failed!"
> fi
$ 

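Astute readers might note that I did not escape the ! in the echo commands, even though an interactive bash normally treats ! as the start of a history expansion, even inside double quotes. I’ve noticed that the ! is not special if it comes last - that is, when it is followed by whitespace or the end of the line or (in recent versions of bash) the closing double quote - which is handy for writing interjections!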

The command0 can actually be a sequence or pipeline. The exit status of the last command is used to determine the conditional outcome.

$ if mkdir backup && cp students.txt backup/students
> then
> echo "backup success"
> else
> echo "backup failed"
> fi
backup success
$ 

In the above example, then was on the next line instead of at the end of the if line. That’s a stylistic choice; if you want it on the if line you simply need to put a semicolon (;) after the if condition and before the word then, as seen in the earlier examples.

The test, aka [ ] command

The command0 providing the exit status need not be an external command. We can test for several conditions using the built-in test or (interchangeably) the [ ] command. We use both below but we recommend you use the [ ] version of the test command because (a) it is more readable and (b) it’s more commonly used. Suppose I want to back up students.txt only if it exists; the -f switch tests whether the following filename names an existing ordinary file.

$ if test -f students.txt
> then
> mkdir backup && cp students.txt backup/students.txt || echo "backup failed"
> fi
$ 

Rewritten with [ ],

$ if [ -f students.txt ]
> then
> mkdir backup && cp students.txt backup/students.txt || echo "backup failed"
> fi
$ 

More commonly, the if and then are written on the same line, using semicolon:

$ if [ -f students.txt ]; then
> mkdir backup && cp students.txt backup/students.txt || echo "backup failed"
> fi
$ 

Note, it’s important that you leave spaces around the brackets or you will get syntax errors. There are other options that can be used with the [ ] command.

   Option       Meaning
     -e         does the file exist?
     -d         does the directory exist?
     -f         does the file exist and is it an ordinary file (not a directory)?
     -r         does the file exist and is it readable?
     -s         does the file exist and have a size greater than 0 bytes
     -w         does the file exist and is it writeable?
     -x         does the file exist and is it executable?

To learn even more about the test command, run man test.
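
As a rough sketch of how several of these options behave, the script below creates a small file in a scratch directory and asks a handful of questions about it. The file name notes.txt and the variable names are invented for the illustration.

```shell
#!/bin/bash
# Create a small, non-empty file in a throwaway directory.
workdir=$(mktemp -d)
target="$workdir/notes.txt"
echo "hello" > "$target"

# Ask one question per test and remember each answer.
if [ -e "$target" ]; then exists=yes;    else exists=no;    fi
if [ -d "$target" ]; then is_dir=yes;    else is_dir=no;    fi
if [ -f "$target" ]; then is_file=yes;   else is_file=no;   fi
if [ -s "$target" ]; then non_empty=yes; else non_empty=no; fi

echo "exists=$exists directory=$is_dir ordinary=$is_file non-empty=$non_empty"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Note the quotes around "$target": they keep the test working even if the pathname happens to contain a space.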

Loops for lists

Many commands accept a list of files on the command line and perform actions on each file in turn. However, what if we need to perform a sequence of commands on each file in the list of files? Some commands can only handle one file (or argument) per invocation so we need to invoke the command many times.

The shell supports a simple iteration over lists of values - typically over lists of filenames. In the following example, we make a ‘back up’ copy of each of our C files by appending the .bak extension. (Again, this extension is just a naming convention - Unix doesn’t care, nor does the shell.)

$ ls
hash.c	hash.c.date  makefile  output.data  queue.c  README  sort.c
$ for i in *.c
> do
> echo "back up $i"
> cp "$i" "$i.bak"
> done
back up hash.c
back up queue.c
back up sort.c
$ ls
hash.c	    hash.c.date  output.data  queue.c.bak  sort.c
hash.c.bak  makefile	 queue.c      README	   sort.c.bak
$ 

Notice that the variable i is instantiated, one at a time, with the value of each argument in the list provided after in, and that value is substituted wherever $i occurs. The double quotes around "$i" and "$i.bak" ensure that each filename is passed to cp as a single argument, even if the filename has a space inside it.

As expected we may place as many commands as we want inside the body of a loop. We can use any combination of other if/else tests and nested loops, just like in traditional languages such as C.
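
For instance, here is a sketch of a loop whose body contains an if/else: it backs up each C file only when no backup exists yet. It runs in a scratch directory with made-up file names (one of them containing a space), and the counter copied is a hypothetical addition that uses the shell's $((...)) arithmetic.

```shell
#!/bin/bash
# Set up a throwaway directory with three C files, one already backed up.
workdir=$(mktemp -d)
touch "$workdir/hash.c" "$workdir/queue.c" "$workdir/my sort.c"
touch "$workdir/queue.c.bak"

copied=0
for i in "$workdir"/*.c
do
    if [ -e "$i.bak" ]; then
        echo "skip $i (backup already exists)"
    else
        cp "$i" "$i.bak"
        copied=$((copied + 1))     # count the copies we make
    fi
done
echo "made $copied new backups"

# Clean up the scratch directory.
rm -rf "$workdir"
```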

We are not limited to file names (as generated by filename expansion) in our list:

$ for house in Allen "East Wheelock" "North Park" School South West LLC
> do
> echo $house is the best house!
> done
Allen is the best house!
East Wheelock is the best house!
North Park is the best house!
School is the best house!
South is the best house!
West is the best house!
LLC is the best house!
$ 

We can use the contents of a file to provide the list used by for:

$ cat LFlist
Jack.A.McMahon.23@dartmouth.edu
Cleo.M.De.Rocco.24@dartmouth.edu
Marvin.Escobar.Barajas.25@dartmouth.edu
Andrea.S.Robang.24@dartmouth.edu
Samuel.R.Barton.25@dartmouth.edu
Rehoboth.K.Okorie.23@dartmouth.edu
$ for i in $(<LFlist); do echo hello "$i"; done
hello Jack.A.McMahon.23@dartmouth.edu
hello Cleo.M.De.Rocco.24@dartmouth.edu
hello Marvin.Escobar.Barajas.25@dartmouth.edu
hello Andrea.S.Robang.24@dartmouth.edu
hello Samuel.R.Barton.25@dartmouth.edu
hello Rehoboth.K.Okorie.23@dartmouth.edu
$ 

Notice the special shell syntax $(<filename), which means to substitute the contents of filename. Any spaces or newlines in the file will cause the shell to delineate words that become arguments to for.
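
The word-splitting behavior is easy to see with a file whose lines contain spaces. In this sketch (the file contents and the variable names are invented for the illustration), a three-line file turns into four loop iterations because "East Wheelock" is split into two words.

```shell
#!/bin/bash
# Write three lines to a temporary file; the second line has a space in it.
listfile=$(mktemp)
printf '%s\n' "Allen" "East Wheelock" "South" > "$listfile"

words=0
for w in $(<"$listfile")
do
    echo "word: $w"
    words=$((words + 1))
done
echo "3 lines became $words words"

# Remove the temporary file.
rm -f "$listfile"
```

So this form is best suited to files, like LFlist, in which no line contains a space.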

The example also demonstrates how one can use semicolons to write a simple loop all on one line!

In fact, if you type a multi-line if or for statement, then execute it, and later use up-arrow (or ctrl-P) to have the shell retrieve your earlier command, you’ll see that it formats it this way.

We can even use the output of a command to provide the list used by for:

$ for i in $(sed 's/\..*/!/' LFlist | sort); do echo hello $i; done
hello Andrea!
hello Cleo!
hello Jack!
hello Marvin!
hello Rehoboth!
hello Samuel!
$ 

Indeed, in this case, we’ve used a pipeline of two commands to produce the list of arguments to for.

You may see old scripts (or old people!) using the old-fashioned syntax in which the command is surrounded by back-quotes, `command`, instead of $(command); the latter is arguably more readable and is much easier to nest.
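
To see why nesting matters, consider this small sketch, which finds the name of the directory containing a file by nesting dirname inside basename. The pathname is hypothetical; nothing needs to exist at that location because both commands only manipulate the text of the path.

```shell
#!/bin/bash
path=/usr/local/bin/tool     # hypothetical pathname, used only as text

# Modern syntax: the inner $(...) nests without any special effort.
parent_new=$(basename "$(dirname "$path")")

# Old syntax: the inner back-quotes must be escaped with backslashes.
parent_old=`basename \`dirname $path\``

echo "new syntax: $parent_new"
echo "old syntax: $parent_old"
```

Both forms print bin, but the second is much harder to read and to get right.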

First Shell Script

Up until now we have entered scripts interactively into the shell. It is a pain to have to keep re-entering scripts interactively. It is better to store the script commands in a text file and then execute the script when we need it. So how do we do that?

Simple! Write the commands in a file, and ask bash to read commands from the file instead of from the keyboard.

For example, we can put our simple backup script into a file called backup.sh:

$ cat > backup.sh
for i in *.c
do
  echo "back up $i"
  cp "$i" "$i.bak"
done
$ bash backup.sh
back up hash.c
back up queue.c
back up sort.c
$ 

Here I’ve typed it at the keyboard (ending the input with ctrl-D), but for more complex scripts, you would of course want to use a text editor.

Indeed, we can go further, and make the file into a command executable at the shell prompt; to do so, you should

add a special string #!/bin/bash to the first line,
make it executable (with chmod), and
type its pathname (for example, ./backup.sh) at the command line.

So, for backup.sh, it looks like this:

$ cat backup.sh
#!/bin/bash
for i in *.c
do
  echo "back up $i"
  cp "$i" "$i.bak"
done
$ chmod +x backup.sh
$ ls -l backup.sh
-rwxr-xr-x 1 f001cxb thayerusers 60 Jun 28 21:48 backup.sh*
$ ./backup.sh
back up hash.c
back up queue.c
back up sort.c
$

There are a couple of things to note about this example.

First, there is the #!/bin/bash line. What does this mean? Typically, the # in the first column of a file denotes the start of a comment until the end of the line. Indeed, in this case, this line is treated as a comment by bash. Unix, however, reads that line when you execute the file and uses it to determine which command should be fed this file; thus, in effect, Unix will execute /bin/bash ./backup.sh. Then bash reads the file and interprets its commands. The #!/bin/bash must be the first line of the file, with #! as the very first two characters - nothing, not even a space or a blank line, may come before it.

Second, there is chmod +x, which sets the ‘execute’ permission on the file. (Notice the ‘x’ characters in the file permissions displayed by ls.) Unix will not execute files that do not have ‘execute’ permission, and the shell won’t even try.

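Third, we typed the name of the shell script along with its path, ./backup.sh. The leading ./ tells the shell to run the file in the current directory; a bare backup.sh would be looked up only in the directories listed in $PATH, which usually do not include the current directory.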

Fourth, this script has no comments. We really should improve it; see backup.sh.

#!/bin/bash
#
# backup.sh - make a backup copy of all the .c files in current directory
#
# usage: backup.sh
# (no arguments)
#
# input: none
# output: a line of confirmation for each file backed up
#
# CS50, Spring 2022

for i in *.c
do
  echo "back up $i"
  cp "$i" "$i.bak"
done

exit 0

It is good practice to identify the program, how its command line should be used, and what it does (if anything) with stdin and stdout, and to list the author name(s) and date.

Notice the script returns the exit status 0, which can be viewed using the echo $? command, as discussed earlier. The return status is typically not checked when scripts are run from the command line. However, when a script is called by another script the return status is typically checked - so it is important to return a meaningful exit status.

We could continue to improve this script - for example, to catch errors from cp and do something intelligent, but let’s move on.
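
As one possible sketch of such an improvement (not the course's official version), the loop below is wrapped in a hypothetical function backup_c_files that takes a directory as its argument, reports any failed copy on stderr, and returns a non-zero status if anything went wrong. It also skips the case where no .c file exists, in which the shell leaves the pattern *.c unexpanded.

```shell
#!/bin/bash
# backup_c_files DIR - copy each DIR/*.c to DIR/*.c.bak
# Returns 0 if every copy succeeded, 1 if any copy failed.
backup_c_files() {
    failures=0
    for i in "$1"/*.c
    do
        # If nothing matches, the pattern stays as literal text; skip it.
        if [ -e "$i" ]; then
            if cp "$i" "$i.bak"; then
                echo "back up $i"
            else
                echo 1>&2 "could not back up $i"   # report on stderr
                failures=$((failures + 1))
            fi
        fi
    done
    if [ "$failures" -gt 0 ]; then
        return 1
    fi
    return 0
}

# Try it on a scratch directory holding two (empty) C files.
workdir=$(mktemp -d)
touch "$workdir/hash.c" "$workdir/queue.c"
if backup_c_files "$workdir"; then
    backup_status="ok"
else
    backup_status="failed"
fi
echo "backup status: $backup_status"
rm -rf "$workdir"
```

In a stand-alone script you would likely finish with exit 1 rather than return 1, so that callers of the script see the failure.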

Variables

Variables are typically not declared before they are used in scripts.

$ a=5
$ message="good morning"
$ echo $a
5
$ echo $message
good morning
$ 
$ echo ${message}
good morning

Above we create two variables (a and message). The later commands show the ${varname} syntax for variable substitution; this is the general form whereas $varname is a shorthand that works for simple cases; note that ${message} is identical to $message.
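
One case where the shorthand is not enough is when the variable is immediately followed by characters that could be part of a name. A small sketch, with made-up variable names:

```shell
#!/bin/bash
file=report

# The braces mark exactly where the variable name ends.
backup="${file}_old.txt"
echo "$backup"

# Without braces, "$file_old.txt" would look up a variable named
# file_old (which does not exist here) and then append .txt.
```

This prints report_old.txt.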

Repetition: the while Command

The ‘for-loop’ construct is good for looping through a series of strings but not that useful when you do not know how many times the loop needs to run. The while...do command is perfect for this.

The contents of guessprime.sh use the ‘while-do’ construct. The script allows the user to guess a prime between 1-100.

#!/bin/bash
#
# File: guessprime.sh
# 
# Description: The user tries to guess a prime between 1-100 
# This is not a good program. There is no check on what the
# user enters; it may not be a prime, or might be outside the range.
# Heck - it might not even be a number and might be empty!
# Some defensive programming would check the input.
# 
# Input: The user guesses a prime and enters it
#
# Output: Status on the guess

# Program defines a variable called prime and sets it to a value.

prime=31

echo -n "Enter a prime between 1-100: "
read guess

while [ $guess != $prime ]; do
    echo "Wrong! try again"
    echo -n "Enter a prime between 1-100: "
    read guess
done
exit 0

This script uses the user-defined variables prime and guess. It introduces the read command, which pauses and waits for user input, placing that user input into the named variable. The -n switch to echo removes the newline usually produced by echo. Finally, note the semicolon after the while command and before the do command. As with the if command and its then branch, we could have put do on the next line if we prefer that style.

$ ./guessprime.sh 
Enter a prime between 1-100: 33
Wrong! try again
Enter a prime between 1-100: 2
Wrong! try again
Enter a prime between 1-100: 9
Wrong! try again
Enter a prime between 1-100: 31
$ 
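
A while loop does not have to wait for a user. As a non-interactive sketch of a loop whose number of iterations is not known in advance, the script below keeps doubling a number until it reaches 100, using the numeric ‘less than’ operator (-lt) of the [ ] command. The variable names n and steps are invented for the illustration.

```shell
#!/bin/bash
n=1
steps=0

# Keep doubling n until it is no longer less than 100.
while [ "$n" -lt 100 ]; do
    n=$((n * 2))
    steps=$((steps + 1))
done

echo "reached $n after $steps doublings"
```

This prints "reached 128 after 7 doublings".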

The shell’s variables

The shell maintains a number of important variables that are useful in writing scripts. We have come across some of them already.

  Variable            Description
     $USER              username of current user
     $HOME              pathname for the home directory of current user
     $PATH              a list of directories to search for commands
     $#                 number of parameters passed to the script
     $0                 name of the shell script
     $1, $2, ...        arguments given to the script (there are $# of them)
     $*                 a list of all the parameters; "$*" expands to a single word
     $@                 a list of all the parameters; "$@" keeps each parameter as a separate word
     $$                 process ID of the shell script when running

The variable $# tells you how many arguments were on the command line; if there were three arguments, for example, they would be available as $1, $2, and $3. In the command line myscript.sh a b c, then, $#=3, $0=myscript.sh, $1=a, $2=b, and $3=c.

The two variables $* and $@ both provide the list of command-line arguments, but with subtle differences; try the following script, args.sh, to see the difference.

#!/bin/bash

echo $# arguments to $0

# loop through all the arguments, in four different ways
echo 'for arg in $*'
for arg in $*; do echo "$arg"; done

echo
echo 'for arg in "$*"'
for arg in "$*"; do echo "$arg"; done

echo
echo 'for arg in $@'
for arg in $@; do echo "$arg"; done

echo
echo 'for arg in "$@"'
for arg in "$@"; do echo "$arg"; done

exit 0

Let’s try it on a command with four arguments; the fourth argument has an embedded space.

$ ./args.sh one two three "and more"
4 arguments to ./args.sh
for arg in $*
one
two
three
and
more

for arg in "$*"
one two three and more

for arg in $@
one
two
three
and
more

for arg in "$@"
one
two
three
and more
$ 

Study the difference of each case. You should use "$@" to process command-line arguments, nearly always, because it retains the structure of those arguments.

As a shorthand, for arg is equivalent to for arg in "$@".

My choice of the variable name arg is immaterial to the shell.
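
Another way to see the difference is to count how many arguments each form produces. In this sketch, count_args is a hypothetical helper function that just prints its own argument count, and set -- is used to give the script the same four arguments as in the run above.

```shell
#!/bin/bash
# Print the number of arguments this function received.
count_args() {
    echo $#
}

# Pretend the script was run with these four arguments.
set -- one two three "and more"

quoted_at=$(count_args "$@")       # each argument kept intact
unquoted_at=$(count_args $@)       # "and more" is split in two
quoted_star=$(count_args "$*")     # everything joined into one word

echo "\"\$@\" gives $quoted_at, \$@ gives $unquoted_at, \"\$*\" gives $quoted_star"
```

The three counts are 4, 5, and 1, respectively.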

Printing error messages

You might need to inform the user of an error; in this example, the 2nd argument is supposed to be a directory and the script found that it is not:

echo 1>&2  "Error: $2 should be a directory"

Here we see how to push the output of echo, normally to stdout (1), to the stderr (2) instead, by redirecting the stdout to the stderr using the confusing but useful redirect 1>&2, which means ‘make the stdout go to the same place as the stderr’.
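
To check that the message really goes to stderr, this sketch wraps the echo in a hypothetical function called warn and then captures its stdout two ways with command substitution: once with stderr discarded, and once with stderr deliberately merged into stdout.

```shell
#!/bin/bash
# Print an error message on stderr rather than stdout.
warn() {
    echo 1>&2 "Error: $*"
}

# $(...) captures stdout only, so with stderr discarded nothing is captured.
on_stdout=$(warn "disk full" 2>/dev/null)

# With stderr sent to the same place as stdout, the message is captured.
on_stderr=$(warn "disk full" 2>&1)

echo "stdout saw: '$on_stdout'"
echo "stderr saw: '$on_stderr'"
```

Keeping error messages on stderr means they still reach the screen even when the script's normal output is redirected to a file or a pipe.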

Checking arguments

When writing scripts it is important to write defensive code that checks whether the input arguments are correct. Below, the program verifies that the command has exactly three arguments, using the ‘not equal to’ operator (-ne).

if [ $# -ne 3 ]; then
   echo 1>&2 "Usage: incorrect argument input"
   exit 1
fi

Notice also that the script then exits with a non-zero status.
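
Putting the last two sections together, here is a sketch of an argument checker written as a function so it can be tried out without ending your shell. The function name check_args, the usage text, and the specific status values 1 and 2 are assumptions made for this example; a real script would typically use exit where this sketch uses return.

```shell
#!/bin/bash
# check_args ARG1 DIR ARG3 - validate a hypothetical three-argument command line.
# Returns 0 if all is well, 1 for a wrong argument count, 2 if DIR is not a directory.
check_args() {
    if [ $# -ne 3 ]; then
        echo 1>&2 "usage: myscript.sh source directory label"
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo 1>&2 "Error: $2 should be a directory"
        return 2
    fi
    return 0
}

# Record the status of three sample calls (error messages hidden).
status_count=0; check_args one two 2>/dev/null                    || status_count=$?
status_dir=0;   check_args one /no/such/dir three 2>/dev/null     || status_dir=$?
status_good=0;  check_args one / three                            || status_good=$?

echo "wrong count: $status_count, bad directory: $status_dir, good: $status_good"
```

Using a different non-zero status for each kind of failure lets a calling script tell them apart.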

Finally

From this week’s reading assignments:

Comments should clarify the code, not obscure it.
They should enlighten, not impress.
If you used a special algorithm or text, mention it and give a reference!
Don’t just add noise or chitchat.
Say in comments what the code cannot.

Don’t forget there are some good bash references on the Resources page.

Other stuff

There’s never enough time to show you all the good stuff in class.

Simple debugging tips

When you run a script you can use printf or echo to print debugging information to the screen. I found it helpful to define a function debugPrint so I can turn on and off all my debug statements in one place:

# print the arguments for debugging; comment-out 'echo' line to turn it off.
function debugPrint() {
#    echo "$@"
    return
}
...
debugPrint starting to process arguments...
for arg; do
	debugPrint processing "$arg"
	...
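
A variation on the same idea, offered only as a sketch: instead of commenting a line in and out, the function can consult a variable. The variable name DEBUG and the "debug:" prefix are my own choices here, and the messages go to stderr so they do not mix with the script's normal output.

```shell
#!/bin/bash
# Set DEBUG to 1 to see debugging messages, 0 to silence them.
DEBUG=0

# print the arguments for debugging, but only when DEBUG is 1.
function debugPrint() {
    if [ "$DEBUG" = "1" ]; then
        echo 1>&2 "debug: $*"
    fi
}

debugPrint "this message is not printed"
DEBUG=1
debugPrint processing "file.c"
```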

Arrays

Like variables, arrays are typically not declared before they are used in scripts.

$ colors=(red orange yellow green blue indigo violet)
$ echo $colors
red
$ echo ${colors[1]}
orange
$ echo ${colors[6]}
violet
$ echo ${colors[7]}
$ 

Above we create one array (colors). Notice that $colors implicitly substitutes the first element, with index 0 (computer scientists like counting from zero); that is, $colors is equivalent to ${colors[0]}. To subscript an array variable, you must use the full ${varname[index]} syntax, as in ${colors[1]}; the $varname shorthand does not work with subscripts. Finally, note that ${colors[7]} is empty because it was not defined.

Even cooler, the array can be used in combination with file substitution $(<filename) and command substitution $(command):

$ cat LFlist
Jack.A.McMahon.23@dartmouth.edu
Cleo.M.De.Rocco.24@dartmouth.edu
Marvin.Escobar.Barajas.25@dartmouth.edu
Andrea.S.Robang.24@dartmouth.edu
Samuel.R.Barton.25@dartmouth.edu
Rehoboth.K.Okorie.23@dartmouth.edu
$ lfs=($(<LFlist))
$ echo ${lfs[2]}
Marvin.Escobar.Barajas.25@dartmouth.edu
$ sophomores=($(grep .24. LFlist))
$ echo ${sophomores[0]}
Cleo.M.De.Rocco.24@dartmouth.edu
$ echo ${lfs[*]}
Jack.A.McMahon.23@dartmouth.edu Cleo.M.De.Rocco.24@dartmouth.edu Marvin.Escobar.Barajas.25@dartmouth.edu Andrea.S.Robang.24@dartmouth.edu Samuel.R.Barton.25@dartmouth.edu Rehoboth.K.Okorie.23@dartmouth.edu
$ 

The last line demonstrates how you can substitute all values of the array, with the [*] index.
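
Two more array forms are worth knowing, shown here as a brief sketch: ${#colors[@]} gives the number of elements, and "${colors[@]}" (with quotes) expands to one word per element, which makes it a natural fit for a for loop. The variables count, last, and list are invented for the illustration.

```shell
#!/bin/bash
colors=(red orange yellow green blue indigo violet)

# Number of elements in the array.
count=${#colors[@]}

# The subscript is evaluated arithmetically, so count-1 is the last index.
last=${colors[count-1]}

# Loop over every element, one word per element.
list=""
for c in "${colors[@]}"
do
    list="$list[$c]"
done

echo "$count colors, ending with $last"
echo "$list"
```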

Arithmetic

The let command carries out arithmetic operations on variables.

$ let a=1
$ let b=2
$ let c = a + b
-bash: let: =: syntax error: operand expected (error token is "=")

# ... note, the let command is sensitive to spaces.

$ let c=a+b
$ echo $c
3
$ let a*=10  # equivalent to  let a=a*10
$ echo $a
10

Another way to do arithmetic in the shell is to enclose the arithmetic expression in double parentheses, ((...)). You can also write $((...)) to substitute the result of an arithmetic expression, for example to assign it to another variable.

$ a=1
$ b=1
$ ((c=a+b))
$ echo $c
2
$ d=$((c*2))
$ echo $d
4
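
Double parentheses can also serve as the condition of a while loop, because ((...)) succeeds when its expression is true (non-zero). As a small sketch with made-up variable names, this adds up the numbers 1 through 10 and then computes their average; note that shell arithmetic works on integers only, so the division truncates.

```shell
#!/bin/bash
total=0
i=1

# Loop while i is at most 10, accumulating the sum.
while ((i <= 10)); do
    ((total += i))
    ((i += 1))
done

# Integer division: 55 / 10 gives 5, not 5.5.
average=$((total / 10))

echo "total=$total average=$average"
```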

Functions

Like most procedural languages, shell scripts have structure and function support. Typically, it is a good idea to use functions to make scripts more readable and structured. In what follows, we simply add a function to guessprime to create guessprimefunction.sh:

#!/bin/bash
#
# File: guessprimefunction.sh (variant of guessprime.sh)
# 
# Description: The user tries to guess a prime between 1-100 
# This is not a good program. There is no check on what the
# user enters; it may not be a prime, or might be outside the range.
# Heck - it might not even be a number and might be empty!
# Some defensive programming would check the input.
# 
# Input: The user guesses a prime and enters it
#
# Output: Status on the guess

# Ask the user to guess, and fill global variable $guess with result.
# usage: askguess low high
#   where [low, high] is the range of numbers in which they should guess.
function askguess() {
    echo -n "Enter a prime between $1-$2: "
    read guess
}

# Program defines a variable called prime and sets it to a value.

prime=31

# ask them once
askguess 1 100

while [ $guess != $prime ]; do
    # ask again
    askguess 1 100
done
exit 0

Notice that defining a function effectively adds a new command to the shell, in this case, askguess. And that command can have arguments! And those arguments are available within the function as if they were command-line arguments $1, $2, and so forth. All other variables are treated as ‘global’ variables, like guess in this example, unless the function declares them with the local keyword.
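
Here is a brief sketch of how variables behave inside functions. The function names are hypothetical. The first function changes a global variable; the second declares its variable with bash's local keyword, so the global is left alone; the third ‘returns’ a value by printing it, which the caller collects with command substitution.

```shell
#!/bin/bash
# Assigns to the global variable answer.
function set_global() {
    answer=$1
}

# Assigns to a local variable that happens to share the name answer.
function set_local() {
    local answer=$1
}

# Prints twice its argument; callers capture the output with $(...).
function double() {
    echo $(($1 * 2))
}

answer="unset"
set_global 31      # answer is now 31
set_local 99       # answer is still 31
doubled=$(double 21)

echo "answer=$answer doubled=$doubled"
```

This prints answer=31 doubled=42.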

Try this script; it’s very fragile. See what happens when you enter nothing - just hit return at the prompt for a guess. Why does that happen?