CS 33 (Spring 2009): Homework 1: Working with relational algebra
Deadline: 10:00pm sharp on April 13, 2009
Please read through this page entirely and carefully, before
beginning your work.
Part 1: Purely mathematical problems
- Solve problems 2.5 (a) and (b) from the textbook, using only
the fundamental relational algebra operations.
[10 points]
- Solve problems 2.5 (c), (d) and (e) from the textbook. In
addition to the fundamental relational algebra operations, you may
use the natural join and division operations if you like.
[15 points]
- Solve problem 2.6 (a), (b) and (c) from the textbook. You may
use any of the relational algebra operations described in Chapter 2.
[15 points]
Part 2: Programming problems
- The file ~cs33/data/baseball_salaries_2003.txt contains
salary information for certain professional baseball players from
the year 2003. It should be self-explanatory. Write code to read
this file, process it appropriately and determine, for each
position, the average salary of the players in that position. Your
code should write seven lines of text to standard output, with each
line formatted (in printf notation) as "%8d: %s", where the
first parameter (the %8d) is the average salary rounded to
the nearest integer and the second parameter (the %s) is
the name of the position, exactly as in the input file. The lines
should be sorted in descending order of average salary. Note that
the seven positions represented in the input file are "Catcher",
"First Baseman", "Outfielder", "Pitcher", "Second Baseman",
"Shortstop" and "Third Baseman".
[15 points]
- The file ~cs33/data/baseball_salaries_2005.txt is very
similar to the above and contains data for 2005, but for a different
set of teams. However, there are a number of players who appear in
both data sets. Write code to read the two input files and determine
the set of players that appear in both files, along with their 2003
and 2005 salaries. Your code should write one line of text to
standard output for each such player, formatted as "%-21s: %8d:
%8d", where the three parameters (in order) are the player's
name (exactly as in the input files, in quotes), the 2003 salary and
the 2005 salary.
[20 points]
- Examine the following three files, which contain data about a
set of imaginary actors and a set of imaginary films (movies) along
with information about which actors appear in which films:
~cs33/data/actor.txt,
~cs33/data/film.txt,
~cs33/data/actor_film.txt
The comment lines on top of each file explain the contents
adequately. Write code to process these three files and determine
the actor(s) that appear in the maximum number of films. There may
be more than one actor tied for the maximum. Your code should write
several lines to standard output as follows. For each actor tied for
the maximum, write the actor's first name and last name on one line,
separated by a single space. In the next few lines, write the name
of each film in which this actor appears, one film per line: the
film names should be exactly as in the input file, complete with
single quotes. Then go on to the next actor, write their name,
followed by the names of the films they appear in, and so on. The
actors should appear sorted in increasing order of actor IDs and the
list of films appearing beneath each actor's name should be in
increasing order of film IDs. Note that you should not be printing
any IDs in your output.
[25 points]
What to turn in
- Prepare a PDF file with your answers to all the problems in Part 1.
One way to prepare the PDF file is to use LaTeX. If you are unfamiliar
with LaTeX, you may copy the file ~cs33/data/hw1-example.tex
and use it as a starting point: just edit it suitably. That file
contains instructions to compile the ".tex" input file to
produce a ".pdf" file: see the comments near the top.
- Write a single program that solves all the problems in Part 2.
The program should read a single command line argument: when this
argument is n, it should behave as specified in Problem
n of Part 2.
- Your program may be written in any general-purpose
programming or scripting language (e.g., C, C++, Java, Awk, Perl,
Python, Ruby, Haskell, ML, Lisp, Scheme, Shell Script), but
not in a special-purpose language for database handling,
such as SQL.
- The first few lines of your source code should contain comments
indicating how to compile and run your program on the machine
"sunapee". If your program is in a scripting language, compilation
may not be necessary, but you should still indicate how to run the
program, and place an appropriate shebang instruction (such as
#!/usr/bin/perl) on the first line. Please check to
ensure that your program does compile and run correctly on
sunapee! If you need help with this, send email to
"sysadmin@cs.dartmouth.edu" with a CC to the professor and the TA.
- Submit exactly two files — your PDF file for Part
1, and the source code of your prorgam for Part 2 — using the
homework
submission form on the course website.