Data Models
Admin
Lab 0 reminder
Data model basics
- simple representations of complex real-world data
- many facets to data - ask students:
- data itself - types, …
- data relationships
- data semantics
- data constraints
Terminology
-
entity
- a unique object that we want to model - it has data
-
attribute
- a characteristic of an entity - the data
- relationship
- describes an association among entities of specific kinds
- one to one (1:1)
- one to many (1:m)
- many to many (m:n)
- do we need any others?
- one to one (1:1)
- Hierarchies are relationships too
- describes an association among entities of specific kinds
-
constraints
- the set of rules defined to ensure data integrity and consistency, implement business rules, adhere to laws and regulations, etc.
-
schemas
- describe how data is structured
- examples on board:
- a set of student identity data:
name
,student_id
,address
, andbirthdate
- list of books with
title
,isbn
,author(s)
, andpublication date
- ask students for more examples
- a set of student identity data:
- schemas also describe (or imply) the relationships between sets of data
- a
student
enrolls in aclass
, aplayer
is on ateam
- db designers spend the majority of their time getting the schemas just right, using design methodologies, tools, and lots of reviews with their clients.
- a
- examples on board:
- describe how data is structured
-
data definition language (DDL)
- schemas are defined using a DDL
- they look sort of like data structures in programming languages
- example (in mysql) Draw on board
create table student ( id char(8), fname char(40), mi char(1), lname char(40), dept char (4) references dept(dname) );
- Why do we need the datatype?
- What if we want to change it later?
- This is one of the reasons that DB designers spend time up front to get it right.
- the DDL builds a table template (called a schema) stored in a data
dictionary which contains
- definition of the contents of the schema
- metadata (data about data)
- identification of a primary key for uniquely idenficying the instance
- constraints, such as references
-
instances
-
a specific set of data values mapping to the schema, representing an instance.
- a specific student
- a specific book
- sometimes a data values in an instance can be empty (NULL), if the schema allows it
- while schemas are analagous to structured types, instances are analagous to variables in programming language
-
- data manipulation language (DML)
- instances are created, manipulated, and accessed using a DML
- a.k.a., a query language
- e.g.,
insert
,delete
,modify
- most popular DDL/DML is SQL (structured query language) (pronounced “see-kwul”)
- instances are created, manipulated, and accessed using a DML
The relational data model
- used by most commercial Database Management Systems (DBMS)
- simple model, with simple declarative query language
We define a relational database as a collection of distinct relations (or tables) with zero or more instances of each. They are typically represented as a table:
SLIDE 1
- tuple == rows
- attribute == columns
Let’s look at this example more closely. It’s got some problems.
- Why are there blanks?
- How would we produce a list of employees sorted by last name?
- How can we count how many employees have a given certification?
- What about mispelled certifications?
- Can an employee have four certifications?
The designer of this table didn’t do a good job.
In a relational DB, each tuple should, ideally, have an entry for every attribute.
There is a NULL
value you can use, but it can really complicate some database
queries.
NULL
doesn’t mean “zero” or “false”, it means “there is no data here”. For
example: an employee
table entry with NULL
for the raise_amount
attribute
will make it tough to respond to a query like which employees raises of 1% or
less?
Attributes have a single type, and nearly always an atomic type (e.g., char
,
float
, integer
).
Naming conflicts
Each table
is independent from the others, and the tuples (or rows) may be
related (thus, relational) across tables via common values in specified
attributes.
You need to be careful about naming conflicts:
-- example from widom
student( id, name, gpa, age)
campus(name, enrollment) -- reuse of "name" ok
apply(id, campus) -- reuse of "id" ok
-- reuse of "campus" is not
Keys
A key for a relation is a set of one or more attributes such that no two tuples can have the same values for all of their key attributes. – Widom
- Key values are used to identify specific tuples of a table.
- Systems will often build indices using keys since they are most often used to look up tuples.
- Other tuples may use key values as logical “pointers”.
- In documents (like books and assignments), the standard practice is to identify keys by underlining.
Database models & design
-
Logical Design - Deciding on the database schema
- Database design requires that we find a “good” collection of relation schemas.
- Business rules
- What attributes should we record in the database?
- Which rules can be accomplished in the design, and which need application-layer assistance?
- Computer Science guidance
- What relation schemas should we have and how should the attributes be distributed among the various relation schemas?
- Which attribute(s) should be used as indicies?
-
Physical design - the physical attributes of the database
- distributed, sharded, etc. We will learn about these later in the term.
Business Rules
Businesses have rules, explicit and implicit, that they follow to run things.
From Coronel:
A business rule is a brief, precise, and unambigous description of a policy, procedure, or principle within a specific organization’s environment. In a sense, business rules are misnamed: they apply to any organization – a business, a government unit, a religious group, or a research laboratory; large or small – that stores and uses data to generate information.
Sometimes they are formally codified, as in Corporate Instruction Letter’s. Other times they are handed down through the generations like folklore:
SLIDE 3
” why do we do it that way? Well, uhm, we’ve always done it that way … I guess.”
The Database designer has to discover these rules, somehow. Where to look depends on the use(s) of the new database: Ask students where these might be found
The rules come from all over:
- C-Level execs (not always a reliable source, but they’re paying you)
- Department and front-line managers typically know the most
- Corporate Instructions
- Regulatory documents
- Policy and legal teams
- business plans
- strategies
- employee handbook
- managers and/or employee interviews
- …
Usually results in helping Company better understand itself.
This helps the designer to
- Understand the nature, role, scope of data, and business processes
- Develop appropriate relationship participation rules and constraints
- Create an accurate data model
Of course, the resulting data model is reviewed with the customer (at many of the same levels) to make sure what you heard is what they thought they said.
The designer has to select the business rules relevant to the desired database application, and formulate them as “brief, precise, and unambiguous descriptions of a policy, procedure, or principle”.
Ask students for example business rules Sample business rule scenarios
- Library
- Multi-screen cinema
- Vet clinic
- Restaurant (chef, waitstaff, maître d’hôtel, tables, customers, menus, etc.)
Two main types of business rules: database oriented and process oriented.
Database oriented business rules may be implemented via the design of the
database itself. Any constraints, high or low limits, NON NULL
, or other
data-oriented rules should bbe considered for incorporation into the database
design. For example, in a database for the registrar you might have a table for
COMMUNITY and the following business rule for that attribute:
SLIDE 4
All undergraduate students will be assigned to one of the four communities upon initial registration: NORTH, EAST, SOUTH, and WEST.
This business rule defines the acceptable values for the COMMUNITY attribute of any student’s row.
The other kind of business rule, the process oriented rule, cannot be defined solely by the database design. These rules must be enforced using code that references one or more attributes of an entity.
Application oriented business rules impose constraints that you cannot establish within the logical design of the database. You must instead establish them within the physical design of the database or within the design of a database application, where they will be more applicable and meaningful. (I use the term database application here to refer to a program written in some RDBMS software that allows people in the organization to use the database easily and to perform tasks related to their daily work activities.)
Here is an example of a typical application oriented business rule:
SLIDE 5
All students are required to maintain a valid email address with the registrar after graduation.
While the database could be designed to require a NON NULL entry for an emailAddress attribute, the database itself cannot determine whether or not the value stored there is indeed a valid, active email address for a given student.
Another example of an application oriented business rule is this:
SLIDE 6
Personal Health Information (PHI) of citizens of a member country of the EU may only reside on storage media that is physically within the boundaries of the EU as long as that citizen’s country is part of the EU.
While the database entry will likely include country of citizenship it’s design cannot determine whether that country is currently a member of the EU. Furthermore, since distributed databases are used to manage a large number of records that are to be accessed from a broad geographical area, it’s entirely possible to have some of the database systems (or shards) residing in the Cloud, which, by definition, may complicate the determination of the geographical position of the storage device holding a given EU citizen’s data.
Begin the translation
- Nouns translate into entities
- Verbs translate into relationships among entities
- Relationships are bidirectional
- Questions to identify the relationship type - How many instances of B are related to one instance of A? - How many instances of A are related to one instance of B?
[!TIP] Entity and relationship names should be singular, as in “a STUDENT enrolls in a CLASS”. Using plural forms is confusing.
As with most development efforts, good naming conventions must be agreed to.
- Entity (table) names - Be descriptive of the objects in the business environment - Use terminology that is familiar to the users
- Attribute name - Required to be descriptive of the data represented by the attribute
- Proper naming: - Facilitates communication between parties - Promotes self-documentation
E/R models
-
ERD or E/R Diagram
Entity Relationship Diagram - Not surprisingly, this technique models the app as a set of entities and relationships - graphic representation to model database - rows in the relational table represent the instances (ocurrences) of the modeled entity - connections represent the relationships between them
We will look at these in more detail later. Here’s a handy ERD Quick Reference (
.pdf
) to help you understand the example on the next slide.SLIDE 6
Here is a worked example:
SLIDE 7
Let’s do a quick design together
A database for a College Registrar.
SLIDE 8
Here are the business rules
- A Student must register for the courses they wish to take in a term. They can take up to six courses per term.
- A class has a professor, who might teach one or more courses each term. The same, or a different, professor will teach each class, but no professor will be required to teach more than three courses in a term.
- For courses with large enrollments, multiple classes may be created and registered students will be placed in one of them up to a maximum of 35 in each class.
- The classes are always indoors, in an assigned room. It is not required that all classes for a course be taught in the same room. Not all courses are taught each term.
- What are the entities (nouns)?
- What are the attributes of those entities?
- What are the relationships?
- Are there any cardinality constraints?
[^1]: Lecture notes based on texts by Coronel, Widom, Ullman, and Silberschatz.