Lab 1 - Bash
Due Sunday, April 3 at 10pm
This first lab should get you up to speed working with the command line, basic shell commands, an editor, and a small bash program.
Preparation
Log in to the Thayer plank server (plank.thayer.dartmouth.edu) with your NetID and, if you have not already done so, set up a directory for your work in this course:
[MacBook ~]$ ssh cs50
[plank ~]$ mkdir -p cs50/labs/
[plank ~]$ chmod -R go-rwx cs50
[plank ~]$ cd cs50/labs
These commands create a directory ~/cs50/labs, prevent others from peeking at your work, and change the working directory to labs so you’re ready for the work below.
Clone the starter kit: visit GitHub Classroom, accept the assignment, and clone the repository to your labs directory. It will look something like this, assuming your GitHub username is XXXXX:
$ git clone git@github.com:cs50-spring-2022/lab1-XXXXX.git
Cloning into 'lab1-XXXXX'...
The clone step will create a new directory ~/cs50/labs/lab1-XXXXX.
If you would prefer to work out the initial solutions on your laptop, repeat the above git clone command on your local laptop (without logging in to the Thayer servers via ssh). Later, either use scp to transfer your solutions to your Linux account, or push them to your GitHub repository and run git pull on the Thayer servers to get the latest version. Then test them there.
Assignment
First download the spreadsheet from
https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-County/8xkx-amqh/data
and save it as vaccine.csv. You can also use the following command to do both in one step:
wget -O vaccine.csv 'https://data.cdc.gov/api/views/8xkx-amqh/rows.csv?accessType=DOWNLOAD'
Here the wget command fetches the file at the given URL. The -O option (a capital letter O, not a zero) specifies the name of the file to save it as. The URL is in single quotes so that the shell does not treat the ? as a filename wildcard.
If you would like to access the file on the Thayer servers, we have downloaded a copy into our shared workspace. Once you have logged into the plank server, you can see it listed as one of the files:
$ ls /thayerfs/courses/22spring/cosc050/workspace/
backup.sh* passwd students.txt vaccine.csv
Please do not make copies of the vaccine.csv file in your own home directory on Thayer servers. It is a very large (281MB) file and can cause quota issues for your Linux account.
vaccine.csv is a comma-separated values (CSV) file released by the Centers for Disease Control and Prevention (CDC). It provides COVID-19 vaccine administration data at the county level and is updated daily. A fuller description of the dataset can be found at this link.
A. Write a single bash command or pipeline to print only the lines for the state of New Hampshire in February 2022. The output should not contain the first line of the file, which lists the names of the data fields.
B. Write a single bash command or pipeline to print only the county (Recip_County), state (Recip_State), and percentage of fully vaccinated people (Series_Complete_Pop_Pct) columns, separated by commas. The output should not contain the first line of the file, which lists the names of the data fields.
C. Write a single bash command or pipeline to print only the lines from Feb. 15, 2022 to Feb. 17, 2022 (including the data on Feb. 15).
D. Write a single bash command or pipeline to print the counties in the state of California in which at least 90% of the population is fully vaccinated so far.
E. Write a single bash command or pipeline to print, for each state, the number of counties in which at least 80% of the population is fully vaccinated so far, in decreasing order of the number of counties. Each line of output should contain the number of such counties and the state name.
F. Write a single bash command or pipeline to print the 10 counties with the highest percentage of fully vaccinated population based on the latest data, in decreasing order of fully vaccinated percentage. Each line of output should contain the county name, the state, and the fully vaccinated percentage, separated by commas.
G. Extend your command line from F to edit each output line: add a pipe (|) symbol at the beginning and at the end, and replace the comma(s) with a pipe symbol. If you copy and paste that output into a Markdown file and prepend the first two lines (which do not need to be generated by your command), it turns into a nice table like this one (based on the data set updated on Mar. 28, 2022):
| County | State | Fully-Vaccinated Percentage |
|---|---|---|
| Webb County | TX | 95 |
| Santa Cruz County | AZ | 95 |
| Presidio County | TX | 95 |
| Irion County | TX | 95 |
| Culebra Municipio | PR | 95 |
| Chattahoochee County | GA | 95 |
| Bristol Bay Borough | AK | 95 |
| Apache County | AZ | 95 |
| Starr County | TX | 94.4 |
| San Juan County | CO | 94.2 |
You do not have to edit the output of your command line. You only need to add the two header lines. Read about Markdown, and about Markdown tables.
H. Write a bash script called query.sh that takes a state (given by its two-letter abbreviation, as in the examples below) and outputs the number of fully vaccinated people (column ‘Series_Complete_Yes’) in that state, based on the latest cumulative data. It can also take a date as an additional parameter. In that case it outputs the number of fully vaccinated people in the specified state on that date. Here are some example outputs from running the script on Mar. 28, 2022:
$ ./query.sh
Incorrect number of arguments. Usage: ./query.sh state [date]
$ ./query.sh Hanover
Hanover state does not exist
$ ./query.sh CA 2052x-xew
Date 2052x-xew does not exist
$ ./query.sh NH
NH: 927557
$ ./query.sh CA
CA: 28081845
$ ./query.sh NH 03/25/2022
NH: 925430
Hint: as with questions D, E, and F, you need to think about how to get the latest date.
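The sketch below shows one way to find the most recent date. It assumes the Date column uses the MM/DD/YYYY form seen in the 03/25/2022 example above, so check the actual file before relying on it. Dates in that form do not sort correctly as plain text, because the month comes first, so the sketch sorts numerically on year, then month, then day. The latest_date function and the sample dates are hypothetical.

```bash
#!/bin/bash
# latest_date (hypothetical helper): read MM/DD/YYYY dates, one per line,
# on stdin and print the most recent one. A plain text sort would place
# 12/31/2021 after 03/28/2022, so we sort by year (field 3), then month
# (field 1), then day (field 2), all numerically, using / as the separator.
latest_date() {
  sort -t/ -k3,3n -k1,1n -k2,2n | tail -n 1
}

# Hypothetical dates, not taken from vaccine.csv.
printf '12/31/2021\n01/05/2022\n03/28/2022\n02/15/2022\n' | latest_date
```

On the real data you would first extract just the Date column (for example with cut) and feed that column to a helper like this. If the file turns out to be ordered by date already, head or tail alone may be enough.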
- Your script should print an error and exit non-zero if the number of arguments is less than 1 or greater than 2.
- Your script should print an error and exit non-zero if vaccine.csv is not an existing, readable file. (Your script should directly access the /thayerfs/courses/22spring/cosc050/workspace/vaccine.csv file if it runs on Thayer servers.)
- Your script should print an error and exit non-zero if it does not find the state specified by the first parameter.
- Your script should print an error and exit non-zero if it does not find the date specified by the second parameter.
- Your script should exit with zero status, otherwise.
- Your script should have a brief header comment giving the script name, your name, the date, and a short summary of how someone can/should use the script.
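Several of the checks above map naturally onto small shell functions that return non-zero on failure. The sketch below covers only the argument-count check and the input-file check. The function names (check_args, check_file, pick_datafile) are hypothetical, and the state and date lookups are left to you. A finished query.sh would also need the header comment described above.

```bash
#!/bin/bash
# Sketch of input checks for query.sh (hypothetical helper names).

# check_args: succeed only with one or two arguments (state, optional date);
# otherwise print the usage message to stderr and return non-zero.
check_args() {
  if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
    echo "Incorrect number of arguments. Usage: ./query.sh state [date]" >&2
    return 1
  fi
  return 0
}

# check_file: succeed only if $1 names an existing, readable regular file.
check_file() {
  if [ ! -f "$1" ] || [ ! -r "$1" ]; then
    echo "$1 is not an existing, readable file" >&2
    return 1
  fi
  return 0
}

# pick_datafile: prefer the shared copy on the Thayer servers if it is
# readable; otherwise fall back to vaccine.csv in the current directory.
pick_datafile() {
  local shared=/thayerfs/courses/22spring/cosc050/workspace/vaccine.csv
  if [ -r "$shared" ]; then
    echo "$shared"
  else
    echo "vaccine.csv"
  fi
}
```

At the top level of the script these would be called as check_args "$@" || exit 1 and check_file "$(pick_datafile)" || exit 1. The script then stops with a non-zero status as soon as a check fails. The messages go to stderr, which is the usual place for errors.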
What to hand in, and how
You should have three files in your lab1-XXXXX directory:
- edit README.md to remove the instructions and add your name and your username;
- create solution.md with the answers to items A-G. For each, include a subsection header and show the command line. (Do not include the command output.) This is a “Markdown” file and you should use Markdown formatting. Notably, use code blocks to format the commands, like those you see below. You can preview it with various Markdown-rendering tools (see: Markdown resources), but we will read it on GitHub.com, so make sure it looks good there;
- write query.sh with the script for item H.
You should add only these three files to your repo:
git add README.md solution.md query.sh
Please do not add vaccine.csv; it is large and, of course, we can download our own copy.
Commit your changes:
git commit -m "your commit message"
Push your changes to GitHub:
git push
If this is your first push, git may remind you to run
git push --set-upstream origin master
Make sure you left nothing unexpected behind:
git status
If you need to make updates, repeat the add, commit, push sequence.
You can verify that it seems safely uploaded by visiting GitHub.
If you need to submit after the deadline …
Your commit message should say “PLEASE GRADE THIS COMMIT.” Our graders will grade the last commit made before the deadline, unless they see that message on a late commit. In that case they will grade the latest such commit made less than 72 hours after the deadline. Late commits without that message will be ignored.
Hints
You will find some of the following commands useful. Use man cmd to read about any command. It’s best to run man inside Linux so that you are sure to get the manual for the Linux version of the command (the macOS versions can differ).
- less
- cut
- head
- tail
- grep (note -n)
- wget
- sort
- uniq
- tr
- sed
- wc (note -l)
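Several of these tools combine into a common idiom for counting how often each distinct line occurs: sort | uniq -c | sort -rn. The sketch below runs it on made-up two-letter codes rather than on vaccine.csv.

```bash
#!/bin/bash
# Hypothetical two-letter codes, one per line (not real data).
states='ZZ
YY
ZZ
XX
ZZ
YY'

# uniq -c merges only *adjacent* duplicate lines and prefixes each with
# its count, so the input must be sorted first. sort -rn then orders the
# "count value" lines numerically by count, largest first.
echo "$states" | sort | uniq -c | sort -rn
```

Here the output lists ZZ with a count of 3, then YY with 2, then XX with 1. The count column is padded with leading spaces.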
grep and sed depend on regular expressions. It is helpful to remember that ^ anchors a pattern to the start of a line and $ anchors it to the end of the line.
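Here is a small illustration of both anchors. It uses three hypothetical comma-separated rows whose fields and values are invented, not taken from vaccine.csv.

```bash
#!/bin/bash
# Hypothetical sample rows (not real data): date,county,state,pct
sample='03/01/2022,Alpha County,ZZ,70.1
03/01/2022,Beta County,YY,80.5
03/02/2022,Gamma County,ZZ,91.0'

# ^ anchors the pattern to the start of the line, so this matches only
# rows whose first field is exactly 03/01/2022 (the first two rows).
echo "$sample" | grep '^03/01/2022,'

# $ anchors the pattern to the end of the line, so this matches only rows
# whose last field is 91.0. The dot is escaped because an unescaped . in
# a regular expression matches any single character.
echo "$sample" | grep ',91\.0$'
```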
Most Unix tools work line-by-line. For some problem(s) I found it helpful to translate the csv header line into a sequence of lines, on which I could operate with other tools.
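Below is a minimal sketch of that idea. It assumes a shortened, hypothetical header line; the real vaccine.csv header has many more fields, and you should check their order yourself. tr turns each comma into a newline, and then cat -n or grep -n gives each field name's 1-based position. That position is the number cut -f expects.

```bash
#!/bin/bash
# Hypothetical, shortened stand-in for the first line of vaccine.csv.
header='Date,FIPS,Recip_County,Recip_State,Series_Complete_Pop_Pct,Series_Complete_Yes'

# Put each field name on its own line, numbered from 1.
echo "$header" | tr ',' '\n' | cat -n

# field_number NAME (hypothetical helper): print the 1-based position of
# NAME in the header. grep -x requires a whole-line match, and grep -n
# prefixes the line number as "N:", which cut then keeps.
field_number() {
  echo "$header" | tr ',' '\n' | grep -n -x "$1" | cut -d: -f1
}

field_number Recip_State
```

With this made-up header, field_number Recip_State prints 4. A command such as cut -d, -f3,4 would then select the county and state columns from each line shaped like this header.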