CS 33 (Spring 2009): Course Project

Deadline for Stage 1 (High-Level Design): Before class on May 4, 2009

Deadline for Stage 2 (High-Level Design): 10:00pm sharp on May 11, 2009

Deadline for Stage 3 (SQL-Mania): 10:00pm sharp on May 18, 2009

[This page will be updated from time to time]

Your course project will consist of conceiving, designing and implementing a moderate-sized database application for an enterprise of your choice (except for a bank, since a bank is used heavily in the book as a running example). You will not need to implement an actual DBMS: you should use MySQL instead. You will need to implement an elegant and intuitive user-interface, which will almost surely be Web-based. At this point, you have already used a connector to work with a MySQL database while coding in a general-purpose language. The textbook has some basic coverage of the other relevant technologies that you might need: CGI, PHP and XML. We shall also discuss these briefly in class, at the appropriate time.

The stages of this project will be as follows:

High-Level Design: Identify the entities, attributes and relationships that your database will represent. Prepare a detailed E-R diagram showing this information. Also, at this stage, identify the potential users of the system, and what kinds of authorizations they should have to access or modify the data in your database.

Schema Design: Translate your E-R diagram into a schema for your relational database. Pay heed to normalization and be prepared to reason cogently about (1) the level of normalization of your database, and (2) any purposeful deviations from normal forms and your justification for such deviations. Also, at this stage, make a list of views that you might wish to create, based on the varying needs or different types of users.

SQL-Mania: Prepare a large collection of SQL statements for a number of natural queries and database updates that each of your user types might want to perform. Populate your database with a large amount of data and test your SQL statements. At this stage, you should spend some time designing intelligent ways to generate test data. Random is usually good, but purely random data may hide problematic performance-busting corner cases, so you will have to be cleverer than that.

The Application Layer: Provide a suitable intuitive interface (ideally, web-based) for each category of user that will be using your database. When appropriate, you may combine interfaces for multiple user types (as in, e.g., Dartmouth's Banner system). You should also provide an interface for appropriately authorized users to modify data (depending on your application, a web interface might not be the best for this). At this stage, you need not bother about users lying to your system about their identity.

Authentication: Finally, implement a web-based authentication system in preparation for a mock deployment of your entire database application. In other words, ensure that your users are not lying about their identities.

The choice of what enterprise to build an application for is up to you. Here are some suggestions to get you started. There is no obligation to pick from one of these.

Academics at a university. Your system will manage information about students, professors and staff at the university, plus courses, grades, prerequisites, majors, etc. This is only a starter list. Flesh this out as appropriate.
An airline. Manage information about a fleet of aircraft, flights, fuel purchases, customers, passengers, frequent flier miles, airports, etc. Again, this is only a starter list and should be fleshed out.
A hospital. You might want to interface with a toy health-insurance system to make this interesting and complex. There are interesting authorization issues that arise here, and if you decide to take up this topic, and you want to provide some realism to your project, you ought to talk to people at DHMC to understand these issues.

What to turn in (Stage 1)

By 5:00pm on Thu Apr 30, notify the professor and the TA about your chosen project topic. A simple email will suffice. No two teams can pick the exact same topic, so if you know early what you want to do, then notify us early and "reserve" the topic!

Before class on Mon May 4, prepare (as a team) your E-R diagram and information about potential users of your system. During class that day, you will present this information to the class, and we will critique everyone's high-level design as a class. Make sure everyone on the team gets a chance to speak about some part of the design. Plan a 10-12 minute presentation.

You need a way to show your E-R diagram to the entire class. Good presentation software, such as Microsoft's PowerPoint, or Apple's Keynote, is recommended. But feel free to use whatever software you find useful.

What to turn in (Stage 2)

Describe the schema for your database in a suitably formatted document. You can use LaTeX to create a PDF (as in some of the homeworks), or you can use MS Word or a similar program to produce your document.

Write an SQL script to create the tables in your database, declaring appropriate types for each attribute, and including the appropriate basic integrity constraints, such as primary key, foreign key, not null, etc. Your script should also create views that particular users will have access to. (It's likely that you will have to add some more views at a later stage of the project, but you should think through the design deeply so you already have close to a complete set of views.)

Submit the two documents (the schema description, and the SQL script) as "Homework 4", using the homework submission form. Only one member of each team needs to submit.

What to turn in (Stage 3)

Please put your queries in a ".sql" file annotated with plenty of comments explaining what each query is doing is what type of user would be executing these queries.

Populate the database (on sunapee) assigned to your team. You don't have to submit the data in any other form; we can look at your tables directly. But if you made changes to your schema from Stage 2, please submit an updated schema. Also, you may want to add comments after your schema explaining how you generated/obtained your data.

Note that at least a couple of your tables should have something like 10,000 rows each. The data in your tables could be artificial, randomly generated, or scraped off the web.

Submit two documents (the SQL file with the queries, and an updated annotated schema) as "Homework 5", using the homework submission form. As before, only one member of each team needs to submit.