CS 33 (Spring 2009): Homework 1: Working with relational algebra

Deadline: 10:00pm sharp on April 13, 2009

Please read through this page entirely and carefully, before beginning your work.

Part 1: Purely mathematical problems

  1. Solve problems 2.5 (a) and (b) from the textbook, using only the fundamental relational algebra operations.
    [10 points]

  2. Solve problems 2.5 (c), (d) and (e) from the textbook. In addition to the fundamental relational algebra operations, you may use the natural join and division operations if you like.
    [15 points]

  3. Solve problem 2.6 (a), (b) and (c) from the textbook. You may use any of the relational algebra operations described in Chapter 2.
    [15 points]

Part 2: Programming problems

  1. The file ~cs33/data/baseball_salaries_2003.txt contains salary information for certain professional baseball players from the year 2003. It should be self-explanatory. Write code to read this file, process it appropriately and determine, for each position, the average salary of the players in that position. Your code should write seven lines of text to standard output, with each line formatted (in printf notation) as "%8d: %s", where the first parameter (the %8d) is the average salary rounded to the nearest integer and the second parameter (the %s) is the name of the position, exactly as in the input file. The lines should be sorted in descending order of average salary. Note that the seven positions represented in the input file are "Catcher", "First Baseman", "Outfielder", "Pitcher", "Second Baseman", "Shortstop" and "Third Baseman".
    [15 points]

  2. The file ~cs33/data/baseball_salaries_2005.txt is very similar to the above and contains data for 2005, but for a different set of teams. However, there are a number of players who appear in both data sets. Write code to read the two input files and determine the set of players that appear in both files, along with their 2003 and 2005 salaries. Your code should write one line of text to standard output for each such player, formatted as "%-21s: %8d: %8d", where the three parameters (in order) are the player's name (exactly as in the input files, in quotes), the 2003 salary and the 2005 salary.
    [20 points]

  3. Examine the following three files, which contain data about a set of imaginary actors and a set of imaginary films (movies) along with information about which actors appear in which films:
         ~cs33/data/actor.txt~cs33/data/film.txt~cs33/data/actor_film.txt
    The comment lines on top of each file explain the contents adequately. Write code to process these three files and determine the actor(s) that appear in the maximum number of films. There may be more than one actor tied for the maximum. Your code should write several lines to standard output as follows. For each actor tied for the maximum, write the actor's first name and last name on one line, separated by a single space. In the next few lines, write the name of each film in which this actor appears, one film per line: the film names should be exactly as in the input file, complete with single quotes. Then go on to the next actor, write their name, followed by the names of the films they appear in, and so on. The actors should appear sorted in increasing order of actor IDs and the list of films appearing beneath each actor's name should be in increasing order of film IDs. Note that you should not be printing any IDs in your output.
    [25 points]

What to turn in