CS 33 (Spring 2011): Homework 5: Normalization

Deadline: 10:00pm sharp on May 2, 2011

Please read through this page entirely and carefully before beginning your work.

Algorithms for BCNF testing and 3NF synthesis

For this homework, you are asked to implement two of the normalization algorithms outlined in the textbook.

  1. Implement the 3NF synthesis algorithm in Figure 8.12 (page 352) of the textbook, including the part where redundant relations are removed. Note that this requires an implementation of the canonical cover algorithm, which in turn requires an implementation of the attribute closure algorithm. Nowhere in your implementation should you be computing F+, the closure of a set F of functional dependencies.

    Your algorithm should read in a text file (the file name will be supplied on the Unix command line) describing a set of functional dependencies among attributes in a single table. It should compute a 3NF decomposition of this table and output a list of new tables, with their corresponding attribute sets, that make up the decomposed database.

  2. Implement the BCNF testing algorithm informally described at the bottom of page 349. As before, you should not be computing F+ for any set of functional dependencies.

    Combine your algorithm with the previous one, so that it tests the decomposition output by the 3NF algorithm and reports whether or not it is in BCNF. If it is not in BCNF, this algorithm should output the name of a table and a functional dependency that causes a BCNF violation in that table.

Input and output formats

Example runs

  1. Suppose the input file is called "class.txt" and its contents are as follows.
          // Functional dependencies for class scheduling database
          // Based on the example on page 351
    
          course_id         -> title dept_name credits
          building room_num -> capacity
    
          // Each term, course sections must be scheduled to avoid time/room conflicts
          course_id sec_id semester year -> building room_num time_slot 
    Suppose you write your code in Python and your script is called "hw5.py". We will probably run it as follows
          python hw5.py class.txt
    and here is one acceptable result for the run:
          Table 1: course_id title dept_name credits
          Table 2: building room_num capacity
          Table 3: course_id sec_id semester year building room_num time_slot
          The above decomposition is in BCNF.

  2. Take another input file "advising.txt" that has the following contents.
          ### This is our famous "BCNF-trouble" example
          prof_id -> dept_name
          student_id dept_name -> prof_id
    Suppose you write your code in Java and call the main file "Decomposition.java". We might run it like this
          javac Decomposition.java
          java Decomposition advising.txt
    and here is one valid output that we could expect:
          Table 1: prof_id student_id dept_name
          The above decomposition has a BCNF violation in Table 1:
          prof_id -> dept_name 

Grading rubric

Here is how we will grade the various parts of your implementation. The maximum score is 70 points.

  1. Correct implementation of attribute closure algorithm
    [10 points]

  2. Correct implementation of canonical cover algorithm
    [10 points]

  3. Correct implementation of basic 3NF synthesis
    [10 points]

  4. Correct removal of redundant relations before output
    [5 points]

  5. Proper handling of input and output formats, as described above
    [5 points]

  6. For BCNF checking, correct and complete checking of conditions
    [10 points]

  7. Correct identification of violating table and f.d., if one exists, and proper output formatting
    [5 points]

  8. Elegance of code, including meaningful and helpful comments
    [5 points]

  9. Quality of example runs, showing all facets of your implementation (see below)
    [10 points]

What to turn in

Using the homework submission form, turn in a single zip (or tar/gzip) archive consisting of the following files: