Lab 2 - MySQL Database, Parts a, b, & c

The purpose of this three-part assignment is for you to get hands-on experience with designing amd constructing a database for a real-world domain. You will see all aspects of database creation, including: understanding the domain and interpreting the business rules, ensuring your ER diagram and design accurately represents the domain, normalizing the design, implementing it using SQL, and testing the resulting database.

This assignment is based on the framework of the superb lab assignment crafted by Steven Van Acker and several of his colleagues at Chalmers University of Technology and University of Gothenburg. This derivative work was created with his kind permission.

Assignment submission

You must submit your solution to each part via Canvas as directed in each Part.

To earn full points for the second and third parts of the assignment, the graders must be able to run your submitted .sql files to setup the database on the mysql server, populate it with test data, and verify its proper operation.

It is essential that you name your submitted .sql files as directed. Otherwise, the autograder will be unable to grade your work, resulting in a reduced score.

Introduction

In this assignment you will design and construct a relational database to support an academic journal’s manuscript management system. The database will be driven by a front-end application that we provide.

You will do this assignment in three distinct Parts:

Part Description
Lab 2a Designing the database schema (ER diagram, FD’s, & normalization)
Lab 2b Constructing the database (SQL, data, & testing
Lab 2c Automating the database (Triggers, Views, and Procedures/Functions)

Be sure to read through the full description of the assignment before you begin.

Domain description1

The domain that you will model in this assignment is that of an academic journal’s manuscript management system. You are to create a system to support the business needs described below.

You need to to carefully read this section as the business rules are embedded within it.

The Journal Of Competitive Kinesiology is a prestigious research journal for collegiate sports team physicians. It uses a peer-review process to select manuscripts for publication. About 10 percent of the manuscripts submitted to the journal are accepted for publication. A new issue of the journal is published each quarter.

Unsolicited manuscripts are submitted by authors. When a manuscript is received, it is assigned a unique manuscript number, and basic information about it is recorded in the system database. The title of the manuscript, the date it was received, and an initial manuscript status of “received” are entered. Information about the author(s) is recorded. For each author, the author’s name and current affiliation (school or company for which the author works) is recorded. When a manuscript has multiple authors, it is important to record the order in which the authors are listed in the manuscript credits. The first author is designated as the primary author and their e-mail address is recorded.

Every manuscript must have at least one author and one InterestCode (ICode, which are defined here). The information about Authors that have submitted manuscripts is kept in the system. It is typical for a manuscript to have several authors. A single author may have co-authored many different manuscripts submitted to the journal.

Once the manuscript is received and entered into the database, the editor will briefly review the topic of the manuscript to ensure that the manuscript’s contents fall within the scope of the journal. This scope is defined by a set of ICodes defined for the Journal. If the content is not within the scope of the journal, the manuscript’s status is changed to “rejected” and the primary author is notified via e-mail. If the manuscript is among those appropriate for the Journal, the editor begins the selection of three or more reviewers to review the manuscript. The manuscript status will remain “submitted” until the three reviewers are assigned, whereupon the status will change to “review”.

Reviewers work for other companies or universities and read manuscripts to ensure the scientific validity of the manuscripts. For each reviewer, the system records a reviewer number, reviewer name, reviewer e-mail address, affiliation, and one to three ICodes which represent the reviewer’s areas of expertise. A reviewer may have up to three areas of expertise, and an area of expertise may be associated with many reviewers. All reviewers must specify at least one area of expertise.

If at least three reviewers have identified the manuscript’s ICode as one in which they have expertise, the editor will change the status of the manuscript to “under review”, email the manuscript to the reviewer with a cover letter, and record which reviewers the manuscript was sent to and the date on which it was sent to each reviewer. A reviewer may receive several manuscripts to review each year, while others may not have reviewed any manuscripts yet.

If there are an insufficient number (i.e., less than 3) of reviewers in the system with that ICode, the editor has to reject the manuscript. The manuscript’s status is changed to “rejected” and the primary author is notified via e-mail

The reviewers will read the manuscript at their earliest convenience and provide feedback to the editor. The feedback from each reviewer includes rating the manuscript on a 10-point ACME scale (10 is best) for Appropriateness, Clarity, Methodology, and Experimental results, for a total score of up to 40 points, a recommendation for publication (ACCEPT (10 points) or REJECT (0 points)). The editor will add all this information for each review received and the date that the feedback was received. Once all the reviewers have provided their evaluation of the manuscript, the editor will decide whether to publish the manuscript based on a minimum average of all the reviewer’s final scores. This minimum average is defined by the system at startup (e.g., 40 or the possible 50 points might be a good minimum for a selective journal). If the editor decides to publish the manuscript, the manuscript’s status is changed to “accepted” and the date of acceptance for the manuscript is recorded. If the manuscript is not acceptable, the status is changed to “rejected” and the primary author is notified by email.

✅ Tip

We do not expect you to send any email. Just print a message to the screen using the following SQL code snippet:

SELECT 'message2' as 'message1';

This will appear on the screen as

+----------+
| message1 |
+----------+
| message2 |
+----------+

The message1 text may be omitted by replacing message1 with '' .


System Specification

Design, implement, and test a MySQL database for the above domain.


Lab2a: Designing the database

First we design esign the database. The goal of this part and the next is to produce a complete database ER that could later be implemented in MySQL.

  1. Create a draft ER diagram that models the domain described above. You will likely need to use many of the ER constructs we discussed in class. Be sure to include all cardinalities (high and low) in your diagram where needed.

  2. Identify all the functional dependencies that you expect should hold for the domain. Do this using both the domain description and the set of attributes in your draft ER diagram. You should always look for the functional dependencies in the original domain description (in case you missed one). Remember to check for functional dependencies with multiple attributes on the left-hand side of the arrow.

  3. When you have found all functional dependencies, use them to correct or enhance your draft ER diagram. For any constraints not captured by your relation schema, either :

    • add the extra constraint to your schema, or

    • argue why your schema should not capture the constraint (e.g. the usage of the DB would be less efficient, storing a value would get out of date ).

  4. Ensure that your design identifies PRIMARY KEY and FOREIGN KEY attributes.
  5. Perform the same analysis for any constraints needed to handle any cyclic relationships in your diagram.

  6. Finally, you should verify that all your relations are in third normal form. This should be almost trivial because the relations are small and you identified all the functional dependencies.

There are tools for creating ER diagrams. You may use any tool you wish as long as it produces a .pdf (i.e., an app, not a pencil, pen, or crayon). I recommend MySQLWorkbench (free). It provides a complete MySQL environment that includes server access, running MySQL scripts, and ER modeling. It is freely downloadable here. Other tools are listed on our Resources page.

For Lab2a, you should submit via Canvas

Create a compressed .tar file named lab2a.tgz file that contains:

  • diagram.pdf
    your final ER diagram.
  • fds.pdf
    the full list of functional dependencies that you have identified.

Submit that .tgz file to Canvas.

Submissions of .doc, .pages, .txt, .odt, .tex files, or scanned handwritten solutions in the lab2a.tgz file will not be graded.


Lab2b: Constructing the database

Create the .sql files (named as specified) to implement the database in MySQL.

Include comments in SQL code where necessary. Use the C style opening /* and closing */ where possible otherwise precede comments with -- and finish them with a new line. Here are examples from Holywell’s SQL style guide:

/* Updating the file record after writing to the file */
UPDATE file_system
   SET file_modified_date = '1980-02-22 13:19:01.00000',
       file_size = 209732
 WHERE file_name = '.vimrc';
SELECT file_hash  -- stored jenkins hash
  FROM file_system
 WHERE file_name = '.vimrc';

⚠️ Warning

It is essential that your submitted .sql files are named exactly as specified. Otherwise, the autograder will be unable to grade your work, resulting in a reduced score.

Using your ERD from Part a , build the following .sql scripts:

  1. tables.sql to create all tables. Include DROP ... IF EXIST ... before CREATE statements, key and foreign key constraints, and you should include checks where possible to ensure that valid data can be inserted or updated in the database.

    📝 Note
    If you did not complete Part a , an ER diagram can be provided to you by the teaching team.

  2. inserts.sql to fill the tables with example data. This can be time-consuming, but it is an important part of the development of a database. Having a variety of data in the database that covers all cases, both common occurrences and rare or edge cases, is essential to properly verify that the database behaves the way that you expect it to. You should fill the tables with enough data so that it is possible to test that the database can handle the various operations specified above. Inserting lots of data is of no use if the data doesn’t test all parts of the database. Here is a sample of the kinds of data and tests you should include:

    • Several authors (>4)
    • Several (>4) reviewers with a variety of ICodes.
    • A set of manuscripts, with at least two in each possible status

    ✅ Tip

    Don’t forget that you can generate data at GenerateData.com rather than manually. It may take a little effort, but you can generate lots of data in the form you want it.

    ✅ Tip

    It can be tricky to insert data into tables with FOREIGN KEYs. You may have to create some entries with NULL values for the FOREIGN KEY and then UPDATE them later once the target key exists.

  3. test2b.sql contains the SQL commands to test the database

For Lab2b, you should submit via Canvas

Create a compressed .tar file named lab2b.tgz that contains:

  • tables.sql
  • inserts.sql
  • test2b.sql

Submit that .tgz file to Canvas.

Submissions of any other files in the lab2b.tgz file will not be graded.


Lab2c: Automating the database

Create the .sql files (named as specified) to add automation to the MySQL database.

⚠️ Warning

It is essential that your submitted .sql files are named exactly as directed. Otherwise, the autograder will be unable to grade your work, resulting in a reduced score.

Views

Since you know what information users of your application will need from the database, and in what forms, it is a good idea to include SQL VIEWS that provide that information in a simple form. In a live deployment, we would use DB privileges to ensure that the users can work only with views when querying the DB.

Create these views (with the names given) in a SQL file named views.sql:

  • LeadAuthorManuscripts For all authors, return tuples with their last name, author id, and the manuscript id for which they are the primary author, along with the current status of the manuscript, and the timestamp of the most recent status change. If an author currently has two manuscripts in the system, the view would return two tuples (rows). Results ordered by author last name, author id number, and then by increasing timestamp of the most recent status change.

  • AnyAuthorManuscripts For all authors, their name, id, and the manuscript(s) for which they are among the authors (if any), along with the status of the manuscript(s). Results ordered by author last name and then by increasing timestamp of the most recent status change.

  • PublishedIssues For all completed (published) issues, the issue year, issue number (1, 2, 3, or 4), the title of each manuscript included in that issue, with page numbers, ordered by issue year, issue number, and page numbers.

  • ReviewQueue For all manuscripts in UnderReview state. For each such manuscript the view should include a row containing:

    • primary author,

    • author id,

    • manuscript id, and

    • assigned reviewer id(s)

    with the rows ordered by increasing submission timestamp. This View is also used by ReviewStatus.

  • ReviewStatus For all manuscripts assigned to a specified Reviewer. For each such manuscript, the view should include a row containing:

    • the timestamp when it was assigned to this review

    • the manuscript id
    • the manuscript title

    • the review results (integer values 1-10)

      • appropriatenes score
      • clarity score
      • methodology score
      • experimental resuts score

      • recommendation (either “submitted”, “review”, “accepted” or “rejected”)

      • ordered by increasing submission timestamp.

✅ Tip

Since you cannot modify a view “on the fly”, you will need to follow a process like the following to get the desired Reviewer_Id into the view:

  1. create a SQL function ViewRevId that returns the value of a SQL variable (e.g., @rev_id)
  2. use that SQL function in the view to identify the desired reviewer id
  3. set the variable (e.g., SET @rev_id = reviewer_id_you_want)
  4. SELECT whatever FROM ViewRevId @rev_id

Here’s an example of a MySQL function returning a SQL variable.

Triggers

When your tables and views are implemented correctly, the next step is to create triggers to handle some key issues in the day-to-day use of the database.

You can find detailed information on triggers in our class notes and in the MySQL Triggers Documentation.

You may find that creating one or more additional views might simplify your triggers.

Create these views (with the names given) in a SQL file named triggers.sql:

  • triggerNoReviewer

    When a new manuscript is submitted the submitting author must specify an ICode. It is possible that there may no reviewer in the system that indicated that ICode as one he/she could handle. Detecting this situation is a perfect job for a trigger.

    When a new manuscript is added to the system with an ICode for which there are not at least 3 reviewers who handle that ICode this trigger should raise an exception that informs user the paper can not be considered at this time.

  • triggerReviewerResign

    Reviewers sometimes unexpectedly withdraw their services. If there are no reviewers with ICodes matching the manuscript’s that are not already assigned to the manuscript, the manuscript must be rejected.

    When a reviewer resigns, any manuscript in “review” state for which that reviewer was a reviewer AND there is another reviewer in the system with the matching ICode that is not already assigned to review it, that manuscript must be reset to “submitted” and an appropriate exception message displayed. Otherwise, the manuscript must be rejected by setting its state to “rejected” and an appropriate exception message displayed.

  • triggerNoReviewer

    If no reviewer with the required ICode is available, the manuscript should be rejected. Note that “available” means a reviewer exists with the right iCode and who is not already assigned to review that manuscript.

  • triggerPublish When a manuscript’s status changes to “accepted”, that update triggers an immediate status change to “publish”.

✅ Tip

Triggers can raise exceptions (which issue messages) like this:

  ...
  set msg = concat('LAB2: No reviewers for manuscript id ', cast(manuscript_id as char));
  signal sqlstate '45000' set message_text = msg;
  ...

Here’s an example of defining your own exception conditions and messages: triggerException.sql

Stored procedure

procedures.sql Provide a stored procedure or function that can be invoked to make the accept/reject decision based upon the average score from the reviewers for a document. All the logic for the decision is localized in this stored procedure/function for ease of update. It should return an indication of its result.

The minimum score for a manuscript to be accepted should be stored in a global SQL variable and set in a .sql that is run before this one.

Testing

✅ Tip

If you run the MySQL Command line interface (CLI) with the --force option it will continue to process SQL commands even when they encounter errors (which testing scripts usually do!).

Provide the following test files to demonstrate that everything is working:

  • trigger_tests.sql Test cases for testing your triggers.
  • procedure_tests.sql Test cases for your stored procedure.

Feel free to create additional procedures/functions as you need them.

For Lab2c, you should submit via Canvas

Create a compressed .tar file named lab2c.tgz file that contains:

  1. tables.sql
    your SQL code for creating the tables in the MySQL database .
  2. inserts.sql
    your SQL code containing the INSERT statements for the data.
  3. views.sql
    your SQL code for creating the views defined above (and your own, if any).
  4. triggers.sql your SQL code for creating the triggers.
  5. trigger_tests.sql your SQL code set up for and test the triggers
  6. procedures.sql your SQL code for setting up the stored procedure
  7. procedure_tests.sql your SQL code to setup for and test the stored procedure
  8. invalid_data_tests.sql your SQL code which includes attempts to add invalid or inconsistent data to the database. All these attempts should fail.

The grading team will run each of the SQL scripts in the above order to verify the database is properly setup and tested.

Make sure that MySQL can execute your files before you submit them. Test this by clearing out your database and then running your SQL files and tests in the above order.

Submissions of any other files in the lab2c.tgz file will not be graded.


Provided code

ICodes.sql

Since ICodes should be updateable, they should be stored in a database table. Use this ICodes.sql file to create that table.


  1. Based on a question from Coronel 12th ed., Chapter 5.