admin
Graduate students, I suggest you get started on your papers.
SQL Code: demo.sql soccer.sql soccer2.sql
SQL History
- The Structured English QUEry Language, or SEQUEL, was developed by Donald Chamberlain and Raymond Boyce as part of the System R project at the IBM San Jose Research Laboratory, first appearing in 1971.
- Due to prior trademarks for that name, it was later renamed Structured Query Language (SQL), but still pronounced “SEE-kwul”
-
ANSI and ISO standards emerged and evolved:
- SQL-86, SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2008, SQL:2011, SQL:2016, SQL:2019
- Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets from later standards and special proprietary features.
-
Note: Some examples we discuss or in the text may work on your particular system.
-
SQL is used for
- Creating databases
- Adding, changing, or deleting database structures
- Adding, modifying, or deleting database records (tuples)
- Querying databases (information retrieval)
- Four parts to the SQL definition
- Data Definition language (DDL) -
CREATE
,ALTER
,DROP
, … - Data Manipulation language (DML) -
SELECT
,INSERT
,UPDATE
,DELETE
, … - Data Control language (DCL) -
GRANT
,REVOKE
, … - Transaction Control Language (TCL) -
COMMIT
,ROLLBACK
, …
- Data Definition language (DDL) -
SQL Overview
-
We will follow the SQL Style Guide by Simon Holywell (13 pages) - Keywords are ALWAYS in CAPS
-
Table and attribute names are not capitalized and avoid oddities such as camelCase. - Tables should have a collective name, like
staff
instead ofemployees
. - Attribute names should always be singular. - Use underscores (‘_’) where you would normally have a space, such as inlast_name
. - Use letters, numbers and underscores in names. - One line per attribute definition -
Every statement ends in a ‘;’
SELECT ... FROM ... WHERE ... ;
-
-
We will use MySQL Version 8.0 .
SQL DDL
- SQL Data types
- See MySQL Datatypes
Instructors setup your database
CREATE DATABASE someones_db;
GRANT ALL ON someones_db.* TO someones IDENTIFIED BY 'qwerty123';
GRANT ALL ON someones_db.* TO ccpalmer;
GRANT ALL ON someones_db.* TO kizito;
That way only you and the instructors can USE
your database. For example:
SHOW GRANTS FOR Peter;
Once you have a database, you connect to it.
- via ssh and terminal mode - typical use
$ ssh babylon7.thayer.dartmouth.edu
~ $ mysql --user=d29265d --password -h cosc061-ad12-db.c.dartmouth.edu cosc_61
mysql> SHOW TABLES;
+-----------------+
| Tables_in class |
+-----------------+
| college |
| player |
| tryout |
| members |
| . . . |
+-----------------+
4 rows in set (0.00 sec)
mysql> ^D
$
-
via MySQLWorkbench
-
setup the server, userid, password, and default schema (database) …
-
SHOW TABLES;
-
Open SQL Script file
soccer.sql
CREATE TABLE
NOT NULL
UNIQUE
REFERENCES
INSERT INTO ... VALUES ...
-
-
Gotta be careful though - suppose we had this:
CREATE TABLE college ( cName varchar(20) UNIQUE, state varchar(2), enr numeric(5,0) );
That wouldn’t work for two OSU’s “Oklahoma State Univ” and “Ohio State Univ”.
-
SELECT
SELECT A1, A2, ..., An FROM R1, R2, ..., Rm WHERE condition ;
-
back to the example … do several
SELECT
s, includingSELECT *
SELECT * FROM tryout; /* */ SELECT * FROM player WHERE pName="David";
-
DISTINCT
eliminates duplicates. By default,SELECT
assumes theALL
modified which will return all of the tuples that match. TheSELECT DISTINCT
version eliminates duplicates. Here, the query returns a subset of the matching tuples and some of them are duplicates.INSERT INTO player (pID, pName, yCard, HS) VALUES (70007, 'Rosario', 'yes', 400); INSERT INTO player (pID, pName, yCard, HS) VALUES (80008, 'Rosario', 'yes', 800); SELECT * FROM player; /* defaults to ALL and returns duplicates since Rosario's tuple matches both conditions */ SELECT * FROM player WHERE pName = "Rosario" OR yCard = "yes"; /* DISTINCT says no duplicates! */ SELECT DISTINCT * FROM player WHERE pName = "Rosario" OR yCard = "yes";
-
sort the resulrs with
ORDER BY
.- Note that
NULL
values are handled together, sometimes rising to top or sinking to bottom of list.
SELECT * FROM tryout WHERE decision = "yes" ORDER BY pPos; SELECT * FROM player ORDER BY pName ASC, HS DESC;
- Note that
-
What is the schema for that table?
SHOW CREATE TABLE college;
-
Can I clone a table? Yes!
CREATE TABLE college2 LIKE college;
and check that it worked with
SHOW CREATE TABLE college2;
Special Operators
LIKE
does simplistic string matching:%
matches zero or more characters, and_
matches exactly one character.-
REGEX
does simple regular expression matching:^
matches the beginning of the value being tested$
matches the end of the value being tested.
matches a single character[listofcharacters]
matches any single character in the bracketed list[char1-charN]
matches any single character in the range specified-
|
provides an “OR” condition between two patterns, matching either oneSELECT * FROM college; /* */ SELECT * FROM college WHERE cName LIKE "%SU"; -- gives LSU ASU /* */ SELECT * FROM college WHERE cName REGEXP 'U$'; -- gives LSU ASU OU /* */ SELECT * FROM college WHERE cName REGEXP '^L'; -- gives LSU /* */ SELECT * FROM college WHERE cName REGEXP '^[LO]'; -- gives LSU OU
- more keywords to narrow selections are
BETWEEN
,IS NULL
,IS NOT NULL
,IN
:
SELECT pName, HS
WHERE HS
BETWEEN 299 AND 999;
/* */
SELECT DISTINCT pName
FROM player
WHERE pName BETWEEN 'AAA' AND 'MMM';
/* */
SELECT DISTINCT pName
FROM player
WHERE pName BETWEEN 'MMN' AND 'ZZZ';
/* */
SELECT cName
FROM college
WHERE state in ('TX','LA','OK','AR');
AND
,OR
,NOT
SELECT *
FROM player
WHERE HS < 1000;
/* */
SELECT *
FROM player
WHERE HS < 1000
AND yCard = "yes";
/* */
SELECT *
FROM player
WHERE HS < 1000
AND pName NOT LIKE ('G%');
/* */
SELECT DISTINCT pName
FROM player
WHERE pName NOT BETWEEN 'AAA' and 'MMM';
- Renaming with
AS
- Just make sure it’s a unique new name!
SELECT DISTINCT pName AS city_player
FROM player
WHERE HS > 1000;
- Arithmetic and Aggregate functions
- The usual arithmetic operators
- Aggregates include
MIN
,MAX
,SUM
, andAVG
SELECT COUNT(yCard) AS NumYellowCards
FROM player
WHERE HS > 999 AND yCard="yes";
/* */
SELECT AVG(HS) AS AverageHSSize,
MAX(HS) AS MaxHS,
MIN(HS) AS MinHS
FROM player ;
-
Sub Queries
- Since the result of any query is itself a relation, then it can be the subject of another query
SELECT DISTINCT pName FROM player WHERE pID NOT IN (SELECT pID FROM tryout WHERE decision="yes");
-
GROUP BY
andHAVING
-- How many goalie tryouts did each college have
SELECT cName, COUNT(cName)
FROM tryout
WHERE pPos = "goalie" GROUP BY cName;
-- How many goalie tryouts did each college have
SELECT cName, COUNT(cName)
FROM tryout
WHERE pPos = "goalie"
GROUP BY cName
HAVING COUNT(cName) < 2;
GROUP BY
shows frequency distributions created byGROUP BY
clause withinSELECT
statementHAVING
is extension ofGROUP BY
feature- Applied to output of
GROUP BY
operation - Used in conjunction with
GROUP BY
clause in second SQL command set - Similar to
WHERE
clause inSELECT
statement
- Applied to output of
Window Functions
Aggregate (e.g., SUM
, AVG
) and analytical (e.g., RANK
) functions can also be limited to just some of the rows of a table. A Window function in SQL allows us to perform these *calculations* across a specific set of rows related to the current row. They’re called “window” functions because they can limit calculations to be made within the set of data defined by the window function.
Suppose you have a table of real estate sales in a county: county_sales
+---------+-------+------------+
| town | month | sale_price |
+---------+-------+------------+
| Kenner | 3 | 150000 |
| LaPlace | 4 | 198000 |
| Kenner | 4 | 208000 |
| Gretna | 3 | 195000 |
| Kenner | 3 | 185000 |
| Spring | 5 | 145000 |
| LaPlace | 5 | 130000 |
| LaPlace | 4 | 188000 |
| Kenner | 4 | 188000 |
| Gretna | 3 | 218500 |
| Kenner | 3 | 180000 |
+---------+-------+------------+
You can use a window function to calculate the cumulative sum of the monthly home sales and places it in a new column Total Sales by Town
:
SELECT town, month, sale_price,
AVG(sale_price) OVER( PARTITION BY town )
AS average_town_sales
FROM county_sales
ORDER BY TOWN;
Resulting in:
+---------+-------+------------+--------------------+
| town | month | sale_price | average_town_sales |
+---------+-------+------------+--------------------+
| Gretna | 3 | 195000 | 206750.0000 |
| Gretna | 3 | 218500 | 206750.0000 |
| Kenner | 3 | 150000 | 182200.0000 |
| Kenner | 4 | 208000 | 182200.0000 |
| Kenner | 3 | 185000 | 182200.0000 |
| Kenner | 4 | 188000 | 182200.0000 |
| Kenner | 3 | 180000 | 182200.0000 |
| LaPlace | 4 | 198000 | 172000.0000 |
| LaPlace | 5 | 130000 | 172000.0000 |
| LaPlace | 4 | 188000 | 172000.0000 |
| Spring | 5 | 145000 | 145000.0000 |
+---------+-------+------------+--------------------+
Those zeros after the decimal place are annoying, so let’s fix them:
SELECT town, month, sale_price,
ROUND((AVG(sale_price) OVER( PARTITION BY town )),0) AS total_town_sales
FROM county_sales
ORDER BY TOWN;
resulting in
+---------+-------+------------+------------------+
| town | month | sale_price | total_town_sales |
+---------+-------+------------+------------------+
| Gretna | 3 | 195000 | 206750 |
| Gretna | 3 | 218500 | 206750 |
| Kenner | 3 | 150000 | 182200 |
| Kenner | 4 | 208000 | 182200 |
| Kenner | 3 | 185000 | 182200 |
| Kenner | 4 | 188000 | 182200 |
| Kenner | 3 | 180000 | 182200 |
| LaPlace | 4 | 198000 | 172000 |
| LaPlace | 5 | 130000 | 172000 |
| LaPlace | 4 | 188000 | 172000 |
| Spring | 5 | 145000 | 145000 |
+---------+-------+------------+------------------+
Here’s another example of Window fuctions from Julia Evans of WizardZines.com. The LAG( ) function is a window function that allows you to access data from a previous row in a result set from the current row without using a self-join.
mysql> select * from baby_log;
produces
+---------+------+
| event | hour |
+---------+------+
| feeding | 1 |
| cough | 1 |
| diaper | 3 |
| feeding | 4 |
| diaper | 5 |
| diaper | 5 |
| feeding | 7 |
| cough | 7 |
+---------+------+
8 rows in set (0.05 sec)
Then
SELECT event, hour,
hour - LAG(hour)
OVER(
PARTITION BY event
ORDER BY hour ASC)
AS time_since_last
FROM baby_log
WHERE event IN ('feeding', 'diaper')
ORDER BY hour ASC;
produces
+---------+------+-----------------+
| event | hour | time_since_last |
+---------+------+-----------------+
| feeding | 1 | NULL |
| diaper | 3 | NULL |
| feeding | 4 | 3 |
| diaper | 5 | 2 |
| diaper | 5 | 0 |
| feeding | 7 | 3 |
+---------+------+-----------------+
6 rows in set (0.08 sec)
EXISTS operator
This operator is used to test for the existence of one or more records found by a subquery.
If we use the Northwind
example database, the following query returns TRUE and lists the suppliers with a product price equal to 22:
mysql> SELECT CompanyName
FROM Supplier
WHERE EXISTS (SELECT ProductName
FROM Product
WHERE Product.SupplierID = Supplier.supplierID
AND UnitPrice = 22);
+----------------+
| CompanyName |
+----------------+
| Supplier VHQZD |
+----------------+
1 row in set (0.04 sec)
mysql> SELECT CompanyName, supplierID FROM Supplier WHERE CompanyName = "Supplier VHQZD";
+----------------+------------+
| CompanyName | supplierID |
+----------------+------------+
| Supplier VHQZD | 2 |
+----------------+------------+
1 row in set (0.04 sec)
mysql> SELECT ProductName, supplierId, unitPrice FROM Product WHERE UnitPrice = 22;
+---------------+------------+-----------+
| ProductName | supplierId | unitPrice |
+---------------+------------+-----------+
| Product KSBRM | 2 | 22.00 |
+---------------+------------+-----------+
1 row in set (0.04 sec)
mysql> EXPLAIN SELECT CompanyName FROM Supplier WHERE EXISTS (SELECT ProductName FROM Product WHERE Product.SupplierID = Supplier.supplierID AND UnitPrice = 22);
+----+--------------+-------------+------------+--------+---------------+---------+---------+------------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+-------------+------------+--------+---------------+---------+---------+------------------------+------+----------+-------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using where |
| 1 | SIMPLE | Supplier | NULL | eq_ref | PRIMARY | PRIMARY | 4 | <subquery2>.SupplierID | 1 | 100.00 | NULL |
| 2 | MATERIALIZED | Product | NULL | ALL | supplierId | NULL | NULL | NULL | 77 | 10.00 | Using where |
+----+--------------+-------------+------------+--------+---------------+---------+---------+------------------------+------+----------+-------------+
3 rows in set, 2 warnings (0.04 sec)
mysql>
SQL Order of operations
Understanding the order in which the parts of a SQL query are executed by a database is key to writing correct, efficient SQL code. SQL executes the parts of a SQL query in a sequence defined by the SQL standard. There may be some minor variations due to the SQL query optimization that takes place in most SQL databases, but those optimizations are guaranteed to produce the same result as the unoptimized version of the SQL query.
The order of operations for SQL is :
- FROM or JOIN
- WHERE
- GROUP BY
- HAVING
- SELECT
- DISTINCT
- ORDER BY
- LIMIT or OFFSET
Here’s an example based on code at the geeksforgeeks website with the order of operations indicated as comments::
SELECT customer_ID, SUM(total_amount) AS "Total" -- 5th
FROM orders -- 1st
WHERE order_date BETWEEN '2022-01-01' AND '2022-03-31' -- 2nd
AND customer_city = 'New York' -- 2nd
GROUP BY customer_id -- 3rd
ORDER BY Total DESC -- 6th
LIMIT 100; -- 7th
Commit
SLIDE 1 Revisit ACID first, then explain.
Most of the time you can assume that MySQL will automatically COMMIT
changes
due to INSERT
, DELETE
, UPDATE
, CREATE TABLE
, etc. commands. Generally
don’t worry about it
Later when we’re talking about stored procedures in MySQL, we will revisit
COMMIT
and its counterpart ROLLBACK
.
Changing a table’s data with UPDATE
We need a bigger table to illustrate UPDATE
. Suppose we have this relation:
/* update one tuple */
UPDATE instructor
SET salary = 78000
WHERE ID = 45565;
/* update 2 attributes in one tuple */
UPDATE instructor
SET salary = 90000,
dept_name = 'Finance'
WHERE ID = 45565;
/* update multiple tuples */
UPDATE instructor
SET salary = salary * 1.05
WHERE dept_name = 'Comp. Sci.';
Changing a table’s DESIGN
ALTER TABLE tablename
{ ADD | MODIFY } ( columnname datatype [ { ADD | MODIFY } columnname datatype] ) ;
ALTER TABLE tablename
ADD constraint [ ADD constraint ] ;
Primary & Foreign Keys
We know keys have to be non-NULL and UNIQUE. Just add PRIMARY KEY (attr)
or
FOREIGN KEY (attr) REFERENCES relation(attr)
at end of CREATE
.
CREATE TABLE Tryout2
( pID INT REFERENCES Player(pid) ON DELETE SET NULL,
cName VARCHAR(20) REFERENCES College(cName) ON UPDATE CASCADE,
pPos VARCHAR(8),
decision VARCHAR(3)
);
- See the SQL source in
Resources/Examples/SQL/sql_files/soccer3.sql
- Then try adding pID as a PRIMARY KEY to player and recreate and initialize the tables… should get duplicate key error on pID=40004 .
- Delete the Referencing key first, and then it works.