Why databases?
Slides from class
Day 1 spreadsheet
What's this course about?
It's hard to imagine modern life without databases. You use them every day, even if you don't realize a database is involved. For example, every time you use a credit card, a database stores your available credit, records your purchases, and a billing application accesses the database to generate a bill once a month. When you signed up for this class, Dartmouth used a database to record which courses are offered each term, who will teach them, and which room they are assigned. Dartmouth also used the database check that you've taken all the prerequisites for the course.
Almost all non-trivial systems use a database in one form or another. In this course we will see how databases are structured, how to query them for information, and what makes them fast.
Administrative stuff
We'll be using Canvas as the main organizational point for the course. Because it's a bit easier for me, and more importantly because they will then persist publicly beyond the term, the actual course notes will be maintained on my own personal web site (as you can see from this page). There will be some cross-referencing within my web site and between Canvas and my web site; hope that goes smoothly enough.
Stuff to look at now:
- Who, when, where
- Assessment: labs and exams
- Collaboration policies
- Honor code
- Schedule
- Software
Note that Lab-0 is out, just to take care of the preliminaries. In particular, we need to get your databases created
on the computer science servers.
Lecture notes are here to help you, to keep you from having to write down and type in a lot of stuff that we go over in class. However, they aren't necessarily complete, and they're also simple, sometimes terse, not necessarily grammatical, etc. They're working notes, and are not intended to take the place of the lectures or the readings. That being said, additions, corrections, and suggestions are welcome.
I will provide source code for the programs we go through in class, but in many cases much of the class will be spent interactively building them up (or tearing them apart). The best way to learn the material is by doing it (that's why we have homeworks!), and I will try to simulate that process as best I can within the time constraints and so forth.
Database notes
data
Raw facts from which information is derived. Data have little meaning unless they are grouped in a logical manner.
information
Information is produced by processing data to reveal the data's meaning.
knowledge
The body of information and facts about a specific subject. Knowledge implies a familarity, awareness, and understanding of information as it applies to an environment. Includes judgment and experience.
database
A computer structure for storing data in a shared, integrated fashion so that the data can be transformed into information as needed. Often used interchangeably with DBMS.
centralized database
A database stored in a single location.
distributed database
A logically related database in one or more locations.
database instance
A snapshot of the database at a specific point in time.
DBMS
Database Management System. A collection of programs that manage the database structure and that control shared access to the data in the database.
RDBMS
Relational Database Management System. A DBMS that uses the relational model.
database administrator (DBA)
A person responsible for planning, organizing, controlling, and monitoring a database.
data model
A representation, usually graphic, of complex "real world" data. Used in the design and maintenance of a database.
relation (or table)
A logical construct perceived to be a two-dimensional structure composed of intersecting rows and columns that represents an entity set in the relational model.
relational model
Developed by E.F. Codd at IBM in 1970, the relational model is based on mathematical set theory and represents data as independent tables.
schema (logical schema)
A logical grouping of database objects. Like a blueprint of how data is arranged. Logical schema is independent of how the data is actually stored on disk; abstracts details of how data is stored from user.
physical schema
Description of ow data is physically arranged on disk.
entity
A person, place, thing, concept, or event for which data can be stored.
entity set
A collection of entities.
attribute (or field)
A characteristic of an entity. An attribute has a name and a data type.
attribute instance (or row or tuple or record)
A logically connected set of one or more attributes that describes an entity. For example, a Customer instance may be composed of the fields ID, LastName, FirstName, Address, City, State, Zip, AreaCode, and PhoneNumber about a particular customer (e.g., Sally Jones).
data dictionary
A DBMS component that store metadata — data about data. Contains the data definition (e.g., name and data type) for attributes in a relation, as well as their relationship with other relations.
view
A virtual table based on one or more relations that may not contain all attributes of the underlying relations. Sometimes used to restrict attributes visible to a user (e.g., some users do not need to know employee salary information) or to simplify queries.
data definition language (DDL)
Language that allows a DBA to define database structure and schema. SQL in relational databases.
data manipulation language (DML)
Set of commands that allow a user to manipulate the data in a database. SQL in relational databases.
Structured Query Language (SQL or 'sequel')
Powerful database language that allows users to create databases and manipulate data.
CRUD
Operations done on a database, short for Create, Read, Update and Delete.
Application Program Interface (API)
Software through which programmers can interact with database. Commonly implemented as a set of web services using HyperText Transfer Protocol (HTTP) and Representational State Transfer (REST).
Online Transaction Processing (OLTP)
A database designed for production use. Keeps the most current data and allows CRUD manipulation of data via queries.
Online Analytical Processing (OLAP)
Tools for turning data in the database into information. Often creates a second database (data warehouse) specialized for running analytical queries on the data so the transaction (production) database is not slowed down by analysis (and vice versa).
Analytical database/Data warehouse
Specialized database optimized for decision support. Often holds a historical copy of the data in the OLTP database and is refreshed with the most current data on a regular basis.
structured data
Data that conforms to a predefined data model.
unstructured data
Data that exists in its original, raw, state; that is the format in which it was collected and does not conform to a predefined data model.
semi-structured data
Data that has been processed to some extent. Example: email.
transaction
A sequence of one or more database requests that form a logical unit of work. A transaction must either complete successfully or abort (make no changes to the database).
Note: many definitions from Coronel and Morris.