Data analytics/warehousing


Database schema before class: database_day16_end_of_class.sql
presentations.mwb
day17.xlsx

Slides from class
Code from class

Database notes

Business intelligence
A comprehensive, cohesive, and integrated set of tools and processes used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information to support business decision making.
Data warehouse
An integrated, subject-oriented, time-variant, nonvolatile collection of data that provides support for decision making. Data is normally drawn from many different sources.
Data mart
A small, single-subject data warehouse subset that provides decision support to a small group of people.
Extract, transform, load (ETL)
Getting data from original sources into a data warehouse. ETL involves retrieving the data from the original source (extract), manipulating the data into an appropriate form (transform), and storing the data in the data warehouse (load).
Star schema
A data modeling technique used to map multidimensional decision support data into a relational database. The star schema represents data using a central table known as the fact table in a 1:M relationship with one or more dimension tables.
Facts
In a data warehouse, the measurements (values) that measure a specific business aspect or activity. For example, sales figure are numeric measurements that represent product or services sales. Facts commonly used in business data analysis commonly include: units sold, costs, prices, and revenues.
Fact table
In a star schema, the table that contains facts linked and classified through their common dimensions. A fact table is in a 1:M relationship with each associated dimension table.
Dimensions
Characteristics that provide additional perspectives to a given fact.
Dimension table
A table used to search, filter, or classify facts within a star schema.
Snowflake schema
A type of star schema in which dimension table can have their own dimension tables. The snowflake schema is usually the result of normalizing dimension tables.
Partitioning
The process of splitting a table into subsets of rows or columns.
Replication
The process of creating and managing duplicate versions of a database. REplication is used to place copies in different locations to improve access time and fault tolerance.
Periodicity
The timespan of data stored in a table. Usually expressed as current year only, previous years, or all years.
Online Analytical Processing (OLAP)
Creates a data analysis environment that supports decision making (as opposed to transactions), business modeling, and operations.
Note: many definitions from Coronel and Morris.