Shell Programming
Take time for this week’s reading; it’s short, important, and useful.
We are now familiar with the shell and a few commands.
In this lecture, we discuss shell programming using bash. The main goal is to write your own scripts. But what are scripts? And what are they useful for?
Goals
We learn the following today:
- Understanding shell script syntax and constructs
- Writing simple interactive scripts
- Writing and executing your first shell script
- Understanding more advanced constructs through examples
We will do this activity in class.
Interactive mode and shell scripts
The shell can be used in two different ways:
- interactive mode, in which you type commands at the shell prompt and the shell executes each one as you enter it. We have been doing this already.
- shell scripts, in which the shell reads a series of commands (or complex programs) from a text file.
The interactive mode is fine for entering a handful of commands but it becomes cumbersome for the user to keep re-entering these commands interactively. It is better to store the commands in a text file called a shell script, or script for short, and execute the script when needed. In this way, the script is preserved so you and other users can use it again.
In addition to calling Unix commands (e.g., grep, cd, rm), shell scripts can also invoke compiled programs (e.g., C programs) and other shell scripts.
Shell programming also includes control-flow commands to test conditions (if...then) or to do a task repeatedly (for...do).
These control structures, found in many other languages (such as C and Python), allow the programmer to quickly write fairly sophisticated shell programs to do a number of different tasks.
Like Python, and unlike C or Java, shell scripts are not compiled; rather, they are interpreted and executed by the shell itself.
Shell scripts are used for many reasons - building and configuring systems or environments, prototyping code, or the many repetitive tasks that programmers do. Shell programming is mainly built on the Unix shell commands and utilities; reusing these existing programs lets programmers build new programs that tackle fairly complex jobs with little code.
Separating groups of commands using ‘;’
Let’s start to build up our knowledge of how scripts work by first looking at some basic operations of the shell. The Unix shell allows for the unconditional execution of commands and allows for related commands to be kept adjacent as a command sequence using the semicolon character as shown below:
$ echo Directory listing; date; ls
Directory listing
Fri Mar 25 13:33:40 EDT 2022
Archive/ data/ dotfiles/ examples@ private/ tse/
build/ demo/ example/ labs/ public_html/
$
Exit status - who cares?
When using the shell interactively it is often clear when we have made a mistake - the shell warns about incorrect syntax, and complains about invalid switches or missing files.
These warnings and complaints can come from the shell’s parser and from the program being run (for example, from ls).
Error messages provide visual clues that something is wrong, allowing us to adjust the command to get it right.
Commands also inform the shell explicitly whether the command has terminated successfully or unsuccessfully due to some error. Commands do this by returning an exit status, which is represented as an integer value made available to the shell and other commands, programs, and scripts.
The shell interprets an exit status of 0 as successful execution, and any other value (an integer from 1 to 255) as failure of some sort.
The special shell variable $? holds the exit status of the most recently executed command; it is updated each time a command exits.
What do we mean by that?
$ echo April Fool
April Fool
$ echo $?
0
$ ls April Fool
ls: cannot access April: No such file or directory
ls: cannot access Fool: No such file or directory
$ echo $?
2
$
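Because $? is overwritten by every command, including the echo that prints it, a script that needs an exit status later should save it right away. A minimal sketch; /no/such/file is an assumed nonexistent path, and the exact non-zero status ls returns may differ between systems:

```shell
#!/bin/bash
# Save $? immediately: the very next command overwrites it.

ls /no/such/file 2> /dev/null     # fails; its error message is discarded
status=$?                         # save the exit status of ls right away

echo "ls exited with status $status"
echo "the previous echo exited with status $?"   # $? now describes that echo: 0
```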
Conditional sequences - basic constructs
Why do we need to use the exit status?
Often we want to execute a command based on the success or failure of an earlier command. For example, we may only wish to remove files if we are in the correct directory, or perhaps we want to be careful to only append info to a file if we know it already exists.
The shell provides both conjunction (and) and disjunction (or) based on previous commands. These are useful constructs for writing decision-making scripts. Take a look at the example below in which we make three directories, then try to remove the first:
$ mkdir labs && mkdir labs/lab1 labs/lab2
$ rmdir labs || echo whoops!
rmdir: failed to remove `labs': Directory not empty
whoops!
$
In the first example, && (two ampersands, with no space between them) specifies that the second command should be executed only if the first command succeeds (with an exit status of 0) - i.e., we only make the sub-directories if we can make the top directory.
In the second example, || (two vertical bars, with no space between them) requests that the second command be executed only if the first command fails (with an exit status other than 0).
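The two operators can also be chained, as in the backup one-liners used with test and [ ] later in these notes. A small sketch, using the standard true and false commands (which do nothing but exit with status 0 and 1, respectively), shows that the command after || runs if either of the earlier commands fails:

```shell
#!/bin/bash
# 'cmd1 && cmd2 || cmd3': cmd3 runs if EITHER cmd1 or cmd2 fails.

true  && echo "first command succeeded" || echo "something failed"
false && echo "first command succeeded" || echo "something failed"
true  && false                          || echo "something failed"
```

So a && b || c is not an exact replacement for if a; then b; else c; fi: in the if form, c runs only when a fails.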
Conditional execution using if, then, else
There are many situations when we need to execute commands based on the outcome of an earlier command.
if command0; then
    command1
    command2
fi
Here command1 and command2 (and any other commands that might be input) will be executed if and only if command0 returns a successful or true value (i.e., its exit status is 0).
The fact that 0 means true is confusing for many people! (In many high-level languages - like C - zero means false and non-zero means true; technology isn’t always consistent.) The reason Unix uses 0 for success is that there is only one 0, but there are many non-zero numbers; thus, 0 implies ‘all is well’ whereas non-zero implies ‘something went wrong’, and the specific non-zero value can convey information about what went wrong.
Similarly, we may have commands to execute if the conditional fails.
if command0; then
    command1
    command2
else
    command3
    command4
fi
Here command3 and command4 will be executed if and only if command0 fails.
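Since if looks only at an exit status, any command can serve as the condition. A minimal sketch, assuming a made-up sample file roster.txt created in a scratch directory, uses grep -q, which prints nothing and exits 0 only if it finds a match:

```shell
#!/bin/bash
# grep -q prints nothing; it exits 0 if it finds a match and 1 if it does not.

tmpdir=$(mktemp -d)                              # scratch directory, so no real files are touched
printf 'alice\nbob\n' > "$tmpdir/roster.txt"     # roster.txt is a made-up sample file

if grep -q bob "$tmpdir/roster.txt"; then
    found=yes
    echo "bob is on the roster"
else
    found=no
    echo "bob is not on the roster"
fi

rm -r "$tmpdir"                                  # clean up the scratch directory
```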
First Interactive Shell Program
Entering interactive scripts - that is, a tiny sequence of commands typed at the keyboard in an interactive shell - is an easy way to get a sense of a new scripting language or to try out a set of commands. During an interactive session the shell simply lets you enter a small ‘one-command’ program at the command line and then executes it.
$ if cp students.txt students.bak
> then
> echo "$? copy succeeded!"
> else
> echo "$? copy failed!"
> fi
0 copy succeeded!
$
The > character is the secondary prompt, issued by the shell to indicate that more input is expected.
The exit status of the cp command is used by the shell to decide whether to execute the then clause or the else clause.
Just for yucks, I had echo show us the exit status $?; the above example confirms that a 0 status means ‘true’ and triggered the then clause.
We can invert the conditional test by preceding it with !, as in many programming languages:
$ if ! cp students.txt students.bak
> then
> echo "copy failed!"
> fi
$
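Astute readers might note that I did not escape the ! in the echo commands, even though an interactive bash normally treats ! as a history-expansion character (even inside double quotes). Bash leaves ! alone when it is followed by a space or the end of the line, or when it comes just before the closing double quote, so a ! that comes last is not special, which is handy for writing interjections!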
The command0 can actually be a sequence or pipeline.
The exit status of the last command executed is used to determine the conditional outcome.
$ if mkdir backup && cp students.txt backup/students
> then
> echo "backup success"
> else
> echo "backup failed"
> fi
backup success
$
In the above example, then was on the next line instead of at the end of the if line.
That’s a stylistic choice; if you want it on the if line you simply need to put a semicolon (;) after the if condition and before the word then, as seen in the earlier examples.
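The rule that the last command's exit status decides the outcome applies to pipelines as well. A small sketch with the true and false commands; it also shows bash's pipefail option, which makes a pipeline fail if any command in it fails:

```shell
#!/bin/bash
# A pipeline's exit status is the exit status of its LAST command.

if false | true; then
    echo "false | true counts as success"          # true ran last
fi

if true | false; then
    echo "unexpected"
else
    echo "true | false counts as failure"          # false ran last
fi

set -o pipefail                                    # bash option: any failure fails the pipeline
if false | true; then
    echo "unexpected"
else
    echo "with pipefail, false | true counts as failure"
fi
set +o pipefail                                    # restore the default behavior
```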
The test, aka [ ] command
The command0 providing the exit status need not be an external command.
We can test for several conditions using the built-in test or (interchangeably) the [ ] command.
We use both below, but we recommend you use the [ ] version of the test command because (a) it is more readable and (b) it’s more commonly used.
Suppose I want to back up students.txt only if it exists; the -f switch tests whether the following filename names an existing ordinary file.
$ if test -f students.txt
> then
> mkdir backup && cp students.txt backup/students.txt || echo "backup failed"
> fi
$
Rewritten with [ ]:
$ if [ -f students.txt ]
> then
> mkdir backup && cp students.txt backup/students.txt || echo "backup failed"
> fi
$
More commonly, the if and then are written on the same line, separated by a semicolon:
$ if [ -f students.txt ]; then
> mkdir backup && cp students.txt backup/students.txt || echo "backup failed"
> fi
$
Note that you must leave spaces inside the brackets, around the condition, or the command will fail: [ is itself a command, and ] must be its last separate argument.
There are other options that can be used with the [ ] command.
Option   Meaning
-e       does the file exist?
-d       does the file exist and is it a directory?
-f       does the file exist and is it an ordinary file (not a directory)?
-r       does the file exist and is it readable?
-s       does the file exist and have a size greater than 0 bytes?
-w       does the file exist and is it writeable?
-x       does the file exist and is it executable?
To learn even more about the test command, run man test (in bash, help test describes the built-in version).
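A minimal sketch that exercises several of these options; describe is a hypothetical helper name, and the paths are only examples:

```shell
#!/bin/bash
# describe: report what some of the [ ] file tests say about a path.
# (describe is a made-up name for this sketch.)
describe() {
    local path="$1"                     # local: visible only inside this function
    if [ ! -e "$path" ]; then
        echo "$path: does not exist"
        return 1
    fi
    if [ -d "$path" ]; then
        echo "$path: is a directory"
    elif [ -f "$path" ]; then
        echo "$path: is an ordinary file"
    fi
    if [ -r "$path" ]; then echo "$path: is readable"; fi
    if [ -w "$path" ]; then echo "$path: is writeable"; fi
    if [ -x "$path" ]; then echo "$path: is executable"; fi
    if [ ! -s "$path" ]; then echo "$path: is empty (size 0)"; fi
    return 0
}

describe /tmp
describe /no/such/file
```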
Loops for lists
Many commands accept a list of files on the command line and perform actions on each file in turn. However, what if we need to perform a sequence of commands on each file in the list of files? Some commands can only handle one file (or argument) per invocation so we need to invoke the command many times.
The shell supports a simple iteration over lists of values - typically over lists of filenames.
In the following example, we make a ‘back up’ copy of each of our C files by appending the .bak extension.
(Again, this extension is just a naming convention - Unix doesn’t care, nor does the shell.)
$ ls
hash.c hash.c.date makefile output.data queue.c README sort.c
$ for i in *.c
> do
> echo "back up $i"
> cp "$i" "$i.bak"
> done
back up hash.c
back up queue.c
back up sort.c
$ ls
hash.c hash.c.date output.data queue.c.bak sort.c
hash.c.bak makefile queue.c README sort.c.bak
$
Notice that the variable i is instantiated, one at a time, with the value of each argument in the list provided after in, and that value is substituted wherever $i occurs. The double quotes around "$i" keep each filename as a single argument, even if it contains a space.
As expected we may place as many commands as we want inside the body of a loop. We can use any combination of other if/else tests and nested loops, just like in traditional languages such as C.
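For instance, a minimal sketch (assuming sample files created in a scratch directory) nests an if inside the backup loop so that a file is skipped when its .bak copy already exists:

```shell
#!/bin/bash
# Back up each .c file, skipping any that already has a .bak copy.
# Work in a fresh scratch directory so real files are not touched.
cd "$(mktemp -d)" || exit 1
touch hash.c queue.c sort.c queue.c.bak     # sample files; queue.c is already backed up

for i in *.c; do
    if [ -f "$i.bak" ]; then
        echo "skip $i (backup already exists)"
    else
        echo "back up $i"
        cp "$i" "$i.bak"
    fi
done
```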
We are not limited to file names (as generated by filename expansion) in our list:
$ for house in Allen "East Wheelock" "North Park" School South West LLC
> do
> echo $house is the best house!
> done
Allen is the best house!
East Wheelock is the best house!
North Park is the best house!
School is the best house!
South is the best house!
West is the best house!
LLC is the best house!
$
We can use the contents of a file to provide the list used by for:
$ cat LFlist
Jack.A.McMahon.23@dartmouth.edu
Cleo.M.De.Rocco.24@dartmouth.edu
Marvin.Escobar.Barajas.25@dartmouth.edu
Andrea.S.Robang.24@dartmouth.edu
Samuel.R.Barton.25@dartmouth.edu
Rehoboth.K.Okorie.23@dartmouth.edu
$ for i in $(<LFlist); do echo hello "$i"; done
hello Jack.A.McMahon.23@dartmouth.edu
hello Cleo.M.De.Rocco.24@dartmouth.edu
hello Marvin.Escobar.Barajas.25@dartmouth.edu
hello Andrea.S.Robang.24@dartmouth.edu
hello Samuel.R.Barton.25@dartmouth.edu
hello Rehoboth.K.Okorie.23@dartmouth.edu
$
Notice the special shell syntax $(<filename), which substitutes the contents of filename.
The shell splits those contents into words at spaces, tabs, and newlines, and each word becomes one item in the list given to for.
The example also demonstrates how one can use semicolons to write a simple loop all on one line!
In fact, if you type a multi-line if or for statement, execute it, and later use up-arrow (or ctrl-P) to have the shell retrieve your earlier command, you’ll see that it formats it this way.
We can even use the output of a command to provide the list used by for:
$ for i in $(sed 's/\..*/!/' LFlist | sort); do echo hello $i; done
hello Andrea!
hello Cleo!
hello Jack!
hello Marvin!
hello Rehoboth!
hello Samuel!
$
Indeed, in this case, we’ve used a pipeline of two commands to produce the list of arguments to for.
You may see old scripts (or old people!) using the old-fashioned syntax in which the command is surrounded by back-quotes, `command`, instead of $(command); the latter is arguably more readable and nests easily, whereas nested back-quotes must be escaped.
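A small sketch of the nesting difference, using a made-up path: with $( ), the inner substitution is simply written inside the outer one, whereas with back-quotes the inner pair must be escaped with backslashes. Both lines compute the name of the directory that contains $p:

```shell
#!/bin/bash
# Name of the directory containing $p, computed two ways.
p=/home/user/cs50/labs                     # an example path; it need not exist

new=$(basename "$(dirname "$p")")          # $( ) nests directly
old=`basename \`dirname $p\``              # back-quotes: the inner pair must be escaped

echo "$new"                                # cs50
echo "$old"                                # cs50
```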
First Shell Script
Up until now we have entered scripts interactively into the shell. It is a pain to have to keep re-entering scripts interactively. It is better to store the script commands in a text file and then execute the script when we need it. So how do we do that?
Simple! Write the commands in a file, and ask bash to read commands from the file instead of from the keyboard.
For example, we can put our simple backup script into a file called backup.sh:
$ cat > backup.sh
for i in *.c
do
echo "back up $i"
cp "$i" "$i.bak"
done
$ bash backup.sh
back up hash.c
back up queue.c
back up sort.c
$
Here I’ve typed it at the keyboard, ending the input with ctrl-D, but for more complex scripts you would of course want to use a text editor.
Indeed, we can go further and make the file into a command executable at the shell prompt; to do so, you should
- add the special line #!/bin/bash as the very first line of the file,
- make the file executable (with chmod), and
- run it by typing its pathname (for example, ./backup.sh) at the command line.
So, for backup.sh, it looks like this:
$ cat backup.sh
#!/bin/bash
for i in *.c
do
echo "back up $i"
cp "$i" "$i.bak"
done
$ chmod +x backup.sh
$ ls -l backup.sh
-rwxr-xr-x 1 f001cxb thayerusers 60 Jun 28 21:48 backup.sh*
$ ./backup.sh
back up hash.c
back up queue.c
back up sort.c
$
There are several things to note about this example.
First, there is the #!/bin/bash line.
What does this mean?
Typically, a # in the first column of a file denotes the start of a comment that runs to the end of the line.
Indeed, in this case, this line is treated as a comment by bash.
Unix, however, reads that line when you execute the file and uses it to determine which command should be fed this file; thus, in effect, Unix will execute /bin/bash ./backup.sh.
Then bash reads the file and interprets its commands.
The #!/bin/bash must be the first line of the file, exactly like that, with #! as the very first two characters (no leading spaces).
Second, there is chmod +x, which sets the ‘execute’ permission on the file.
(Notice the ‘x’ characters in the file permissions displayed by ls.) Unix will not execute files that do not have ‘execute’ permission, and the shell won’t even try.
Third, we typed the name of the shell script along with its path, ./backup.sh.
The ./ is needed because the shell looks for commands only in the directories listed in $PATH, which usually does not include the current directory.
Fourth, this script has no comments. We really should improve it; see backup.sh.
#!/bin/bash
#
# backup.sh - make a backup copy of all the .c files in current directory
#
# usage: backup.sh
# (no arguments)
#
# input: none
# output: a line of confirmation for each file backed up
#
# CS50, Spring 2022
for i in *.c
do
    echo "back up $i"
    cp "$i" "$i.bak"
done
exit 0
It is good practice to identify the program, describe how its command line should be used and what it does (if anything) with stdin and stdout, and list the author name(s) and date.
Notice the script returns the exit status 0, which can be viewed using the echo $? command, as discussed earlier.
The return status is typically not checked when scripts are run from the command line.
However, when a script is called by another script the return status is typically checked - so it is important to return a meaningful exit status.
We could continue to improve this script - for example, to catch errors from cp and do something intelligent, but let’s move on.
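One possible improvement, sketched under some assumptions: each cp is checked, failures are reported on stderr, and the script's exit status is 1 if any copy failed. The function name backup_c_files is hypothetical, and the sketch uses the bash option nullglob so that a directory with no .c files is not treated as an error:

```shell
#!/bin/bash
#
# backup.sh (sketch) - back up all .c files, reporting any cp failures
#
# usage: backup.sh
# output: one confirmation line per file on stdout; errors on stderr
# exit status: 0 if every copy succeeded, 1 if any copy failed

shopt -s nullglob     # bash option: if no file matches, *.c expands to nothing

backup_c_files() {
    local i
    local status=0                        # assume success until some cp fails
    for i in *.c; do
        if cp "$i" "$i.bak"; then
            echo "back up $i"
        else
            echo 1>&2 "backup.sh: could not back up $i"
            status=1                      # remember the failure, but keep going
        fi
    done
    return $status
}

backup_c_files        # the script's exit status is that of this last command
```

Because the function call is the last command, its return value becomes the script's exit status, so a calling script could test it with if ./backup.sh; then echo ok; fi.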
Variables
Variables are typically not declared before they are used in scripts; a variable is created simply by assigning to it, with no spaces around the = sign.
$ a=5
$ message="good morning"
$ echo $a
5
$ echo $message
good morning
$
$ echo ${message}
good morning
Above we create two variables (a and message).
The later commands show the ${varname} syntax for variable substitution; this is the general form, whereas $varname is a shorthand that works for simple cases; note that ${message} is identical to $message.
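A short sketch (the variable names are arbitrary) shows when the shorthand $varname stops working: braces mark where a variable name ends. It also shows that double quotes preserve the spacing inside a value:

```shell
#!/bin/bash
# No spaces around '=': 'a = 5' would make bash run a command named 'a'.
a=5
file=report

echo "$file_2022"         # empty line: the shell looks up a variable named file_2022
echo "${file}_2022"       # report_2022: the braces end the name at 'file'

message="good   morning"  # three spaces inside the value
echo $message             # good morning (unquoted: the shell re-splits the words)
echo "$message"           # good   morning (quoted: spacing preserved)
echo "a is $a"
```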
Repetition: the while Command
The ‘for-loop’ construct is good for looping through a series of strings but not that useful when you do not know how many times the loop needs to run.
The while-do command is perfect for this.
The script guessprime.sh uses the ‘while-do’ construct; it lets the user guess a prime between 1 and 100.
#!/bin/bash
#
# File: guessprime.sh
#
# Description: The user tries to guess a prime between 1-100
# This is not a good program. There is no check on what the
# user enters; it may not be a prime, or might be outside the range.
# Heck - it might not even be a number and might be empty!
# Some defensive programming would check the input.
#
# Input: The user guesses a prime and enters it
#
# Output: Status on the guess
# Program defines a variable called prime and sets it to a value.
prime=31
echo -n "Enter a prime between 1-100: "
read guess
while [ $guess != $prime ]; do
    echo "Wrong! try again"
    echo -n "Enter a prime between 1-100: "
    read guess
done
exit 0
This script uses the user-defined variables prime and guess.
It introduces the read command, which pauses and waits for user input, placing that input into the named variable.
The -n switch to echo suppresses the newline that echo usually prints at the end.
Finally, note the semicolon after the while condition and before the do keyword.
As with the if command and its then branch, we could have put do on the next line if we prefer that style.
$ ./guessprime.sh
Enter a prime between 1-100: 33
Wrong! try again
Enter a prime between 1-100: 2
Wrong! try again
Enter a prime between 1-100: 9
Wrong! try again
Enter a prime between 1-100: 31
$
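As the script's own comments admit, nothing checks the input. Below is a minimal sketch of such a check. It uses a hypothetical valid_guess function that succeeds only for a whole number from 1 to 100. It relies on bash's [[ =~ ]] regular-expression test and (( )) arithmetic, and it tries a few sample strings instead of reading from the keyboard:

```shell
#!/bin/bash
# valid_guess: succeed (exit status 0) only if $1 is a whole number from 1 to 100.
# valid_guess is a hypothetical helper; guessprime.sh does not define it.
valid_guess() {
    [[ "$1" =~ ^[0-9]+$ ]] || return 1      # digits only; also rejects the empty string
    (( 10#$1 >= 1 && 10#$1 <= 100 ))        # range check; 10# reads "08" as decimal, not octal
}

for g in 31 abc "" 150 08; do
    if valid_guess "$g"; then
        echo "'$g' is acceptable"
    else
        echo "'$g' is rejected"
    fi
done
```

In guessprime.sh, one could call valid_guess right after each read guess and ask again whenever it fails.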
The shell’s variables
The shell maintains a number of important variables that are useful in writing scripts. We have come across some of them already.
Variable      Description
$USER         username of current user
$HOME         pathname for the home directory of current user
$PATH         a list of directories to search for commands
$#            number of arguments passed to the script
$0            name of the shell script
$1, $2, ...   the first, second, ... arguments given to the script
$*            all the arguments; quoted as "$*", they are joined into a single word
$@            all the arguments; quoted as "$@", each argument stays a separate word
$$            process ID of the shell running the script
The variable $# tells you how many arguments were on the command line; if there were three arguments, for example, they would be available as $1, $2, and $3.
In the command line myscript.sh a b c, then, $#=3, $0=myscript.sh, $1=a, $2=b, and $3=c.
The two variables $* and $@ both provide the list of command-line arguments, but with subtle differences; try the following script, args.sh, to see the difference.
#!/bin/bash
echo $# arguments to $0
# loop through all the arguments, in four different ways
echo 'for arg in $*'
for arg in $*; do echo "$arg"; done
echo
echo 'for arg in "$*"'
for arg in "$*"; do echo "$arg"; done
echo
echo 'for arg in $@'
for arg in $@; do echo "$arg"; done
echo
echo 'for arg in "$@"'
for arg in "$@"; do echo "$arg"; done
exit 0
Let’s try it on a command with four arguments; the fourth argument has an embedded space.
$ ./args.sh one two three "and more"
4 arguments to ./args.sh
for arg in $*
one
two
three
and
more
for arg in "$*"
one two three and more
for arg in $@
one
two
three
and
more
for arg in "$@"
one
two
three
and more
$
Study the differences among the four cases.
You should nearly always use "$@" to process command-line arguments, because it retains the structure of those arguments.
As a shorthand, for arg is equivalent to for arg in "$@".
My choice of the variable name arg is immaterial to the shell.
Printing error messages
You might need to inform the user of an error; in this example, the 2nd argument is supposed to be a directory and the script found that it is not:
echo 1>&2 "Error: $2 should be a directory"
Here we see how to send the output of echo, which normally goes to stdout (file descriptor 1), to stderr (file descriptor 2) instead, using the confusing but useful redirect 1>&2, which means ‘make the stdout go to the same place as the stderr’.
Checking arguments
When writing scripts it is important to write defensive code that checks whether the input arguments are correct. Below, the program verifies that the command has exactly three arguments, using -ne, the numeric ‘not equal to’ operator.
if [ $# -ne 3 ]; then
    echo 1>&2 "Usage: incorrect argument input"
    exit 1
fi
Notice also that the script then exits with a non-zero status.
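Putting the pieces together, here is a minimal sketch for a hypothetical script that expects exactly two arguments, the second of which must be a directory. The check lives in a function, check_args (an illustrative name). It returns a non-zero status instead of exiting, so it is easy to try interactively:

```shell
#!/bin/bash
# check_args: validate the arguments of a hypothetical script that expects
# exactly two arguments, the second of which must be an existing directory.
# Error messages go to stderr; the return status is 0 (ok), 1 (wrong count),
# or 2 (second argument is not a directory).
check_args() {
    if [ $# -ne 2 ]; then
        echo 1>&2 "usage: $0 file directory"
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo 1>&2 "Error: $2 should be a directory"
        return 2
    fi
    return 0
}

check_args one || echo "check_args failed with status $?"
check_args notes.txt /tmp && echo "arguments look fine"
```

A script would typically call it near the top with check_args "$@" || exit $?, so that the script stops with the function's status.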
Finally
From this week’s reading assignments:
- Comments should clarify the code, not obscure it.
- They should enlighten, not impress.
- If you used a special algorithm or text, mention it and give a reference!
- Don’t just add noise or chitchat.
- Say in comments what the code cannot.
Don’t forget there are some good bash references on the Resources page.
Other stuff
There’s never enough time to show you all the good stuff in class.
Simple debugging tips
When you run a script you can use printf or echo to print debugging information to the screen.
I found it helpful to define a function debugPrint so I can turn all my debug statements on and off in one place:
# print the arguments (to stderr) for debugging; the 'echo' line is commented out
# here, which turns debugging off; uncomment it to turn debugging on.
function debugPrint() {
    # echo 1>&2 "$@"
    return
}
...
debugPrint starting to process arguments...
for arg; do
debugPrint processing "$arg"
...
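A variation, sketched here as an assumption rather than a standard convention: let an environment variable decide whether debugPrint prints, so no editing is needed. DEBUG is a made-up name. The messages go to stderr so they do not mix with the script's normal output:

```shell
#!/bin/bash
# debugPrint: print its arguments to stderr, but only when DEBUG is non-empty.
# DEBUG is a hypothetical variable name chosen for this sketch.
debugPrint() {
    if [ -n "$DEBUG" ]; then
        echo 1>&2 "debug: $*"
    fi
}

debugPrint starting to process arguments
for arg; do
    debugPrint processing "$arg"
    echo "$arg"
done
```

Run it as DEBUG=1 ./myscript.sh to see the debug lines, where myscript.sh stands for whatever your script is called. Bash can also trace every command it executes: bash -x myscript.sh prints each command, after expansion, before running it.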
Arrays
Like variables, arrays are typically not declared before they are used in scripts.
$ colors=(red orange yellow green blue indigo violet)
$ echo $colors
red
$ echo ${colors[1]}
orange
$ echo ${colors[6]}
violet
$ echo ${colors[7]}
$
Above we create one array (colors).
Notice that $colors implicitly substitutes the first element, with index 0 (computer scientists like counting from zero); that is, $colors is equivalent to ${colors[0]}.
When subscripting an array variable, you must use the full ${varname} syntax, as in ${colors[1]}; without the braces, $colors[1] is read as $colors followed by the literal text [1].
Finally, note that ${colors[7]} is empty because it was not defined.
Even cooler, an array can be filled from the contents of a file, with $(<filename), or from the output of a command, with command substitution $(command):
$ cat LFlist
Jack.A.McMahon.23@dartmouth.edu
Cleo.M.De.Rocco.24@dartmouth.edu
Marvin.Escobar.Barajas.25@dartmouth.edu
Andrea.S.Robang.24@dartmouth.edu
Samuel.R.Barton.25@dartmouth.edu
Rehoboth.K.Okorie.23@dartmouth.edu
$ lfs=($(<LFlist))
$ echo ${lfs[2]}
Marvin.Escobar.Barajas.25@dartmouth.edu
$ sophomores=($(grep .24. LFlist))
$ echo ${sophomores[0]}
Cleo.M.De.Rocco.24@dartmouth.edu
$ echo ${lfs[*]}
Jack.A.McMahon.23@dartmouth.edu Cleo.M.De.Rocco.24@dartmouth.edu Marvin.Escobar.Barajas.25@dartmouth.edu Andrea.S.Robang.24@dartmouth.edu Samuel.R.Barton.25@dartmouth.edu Rehoboth.K.Okorie.23@dartmouth.edu
$
The last line demonstrates how you can substitute all values of the array, with the [*] index.
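A few more array operations, in a short sketch. ${#name[@]} gives the number of elements, and += appends. "${name[@]}", like "$@", expands to one word per element:

```shell
#!/bin/bash
colors=(red orange yellow green blue indigo violet)

echo "${#colors[@]} colors"        # number of elements: 7
colors+=(ultraviolet)              # append one element at the end
echo "${#colors[@]} colors"        # now 8

# Quoted "${colors[@]}" yields one word per element, just as "$@" does for arguments.
for c in "${colors[@]}"; do
    echo "$c"
done

# "${!colors[@]}" yields the indices 0, 1, 2, ...
for i in "${!colors[@]}"; do
    echo "$i: ${colors[$i]}"
done
```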
Arithmetic
The let command carries out arithmetic operations on variables.
$ let a=1
$ let b=2
$ let c = a + b
-bash: let: =: syntax error: operand expected (error token is "=")
# ... note: let treats each space-separated word as a separate expression, so write it without spaces.
$ let c=a+b
$ echo $c
3
$ let a*=10 # equivalent to let a=a*10
$ echo $a
10
Another way to do arithmetic in the shell is to put the arithmetic expression inside double parentheses, ((...)). You can also put a $ in front, as in $((...)), to substitute the result of an arithmetic expression, for example to assign it to another variable.
$ a=1
$ b=1
$ ((c=a+b))
$ echo $c
2
$ d=$((c*2))
$ echo $d
4
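The two forms combine naturally with while. The minimal sketch below sums the integers from 1 to n. Used as a condition, (( )) succeeds (exit status 0) when the arithmetic expression is non-zero, that is, true in the C sense, so it reads naturally as a loop test:

```shell
#!/bin/bash
# Sum the integers 1..n using (( )) as the loop test and $(( )) for the arithmetic.
n=10
sum=0
i=1
while (( i <= n )); do
    sum=$((sum + i))      # inside $(( )), variables need no leading $
    ((i++))
done
echo "sum of 1..$n is $sum"    # 55
```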
Functions
Like most procedural languages, bash supports functions. It is a good idea to use functions to make scripts more readable and structured. In what follows, we simply add a function to guessprime.sh to create guessprimefunction.sh:
#!/bin/bash
#
# File: guessprimefunction.sh (variant of guessprime.sh)
#
# Description: The user tries to guess a prime between 1-100
# This is not a good program. There is no check on what the
# user enters; it may not be a prime, or might be outside the range.
# Heck - it might not even be a number and might be empty!
# Some defensive programming would check the input.
#
# Input: The user guesses a prime and enters it
#
# Output: Status on the guess
# Ask the user to guess, and fill global variable $guess with result.
# usage: askguess low high
# where [low, high] is the range of numbers in which they should guess.
function askguess() {
    echo -n "Enter a prime between $1-$2: "
    read guess
}
# Program defines a variable called prime and sets it to a value.
prime=31
# ask them once
askguess 1 100
while [ $guess != $prime ]; do
    # ask again
    askguess 1 100
done
exit 0
Notice that defining a function effectively adds a new command to the shell, in this case, askguess.
And that command can have arguments!
Those arguments are available within the function as if they were command-line arguments $1, $2, and so forth.
Variables assigned inside a function are global by default (unless declared with local), like guess in this example; that is how askguess hands the user's answer back to the loop.
Try this script; it’s very fragile. See what happens when you enter nothing - just hit return at the prompt for a guess. Why does that happen?
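The remark about global variables can be checked directly. The small sketch below uses made-up function names, setglobal and setlocal. It shows that an assignment inside a function is global by default, and that bash's local keyword confines a variable to the function:

```shell
#!/bin/bash
# Variables assigned in a function are global unless declared 'local'.
setglobal() {
    guess=42              # global: still visible after the function returns
}
setlocal() {
    local guess=7         # local: hides the global only while this function runs
    echo "inside setlocal, guess=$guess"
}

setglobal
echo "after setglobal, guess=$guess"    # 42
setlocal                                # prints guess=7
echo "after setlocal, guess=$guess"     # still 42
```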