Introduction to SQL

admin

Graduate students, I suggest you get started on your papers.

SQL Code: demo.sql soccer.sql soccer2.sql

SQL History

The Structured English QUEry Language, or SEQUEL, was developed by Donald Chamberlain and Raymond Boyce as part of the System R project at the IBM San Jose Research Laboratory, first appearing in 1971.
Due to prior trademarks for that name, it was later renamed Structured Query Language (SQL), but still pronounced “SEE-kwul”
ANSI and ISO standards emerged and evolved:
- SQL-86, SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2008, SQL:2011, SQL:2016, SQL:2019
Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets from later standards and special proprietary features.
Note: Some examples we discuss or in the text may work on your particular system.
SQL is used for
- Creating databases
- Adding, changing, or deleting database structures
- Adding, modifying, or deleting database records (tuples)
- Querying databases (information retrieval)
Four parts to the SQL definition
- Data Definition language (DDL) - CREATE, ALTER, DROP, …
- Data Manipulation language (DML) - SELECT, INSERT, UPDATE, DELETE, …
- Data Control language (DCL) - GRANT, REVOKE, …
- Transaction Control Language (TCL) - COMMIT, ROLLBACK, …

SQL Overview

We will follow the SQL Style Guide by Simon Holywell (13 pages) - Keywords are ALWAYS in CAPS
- Table and attribute names are not capitalized and avoid oddities such as camelCase. - Tables should have a collective name, like staff instead of employees. - Attribute names should always be singular. - Use underscores (‘_’) where you would normally have a space, such as in last_name. - Use letters, numbers and underscores in names. - One line per attribute definition
- Every statement ends in a ‘;’
```
SELECT ...
  FROM ...
 WHERE ... ;
```
We will use MySQL Version 8.0 .
- MySQL documentation
- MySQL Workbench Documentation

SQL Command types

SQL DDL

SQL Data types
- See MySQL Datatypes

Instructors setup your database

CREATE DATABASE someones_db;
 GRANT ALL ON someones_db.* TO someones IDENTIFIED BY 'qwerty123';
 GRANT ALL ON someones_db.* TO ccpalmer;
 GRANT ALL ON someones_db.* TO kizito;

That way only you and the instructors can USE your database. For example:

SHOW GRANTS FOR Peter;

Once you have a database, you connect to it.

via ssh and terminal mode - typical use

$ ssh babylon7.thayer.dartmouth.edu
~ $ mysql --user=d29265d --password -h cosc061-ad12-db.c.dartmouth.edu cosc_61
mysql> SHOW TABLES;
+-----------------+
| Tables_in class |
+-----------------+
| college         |
| player          |
| tryout          |
| members         |
| . . .           |
+-----------------+
4 rows in set (0.00 sec)
mysql> ^D
$

via MySQLWorkbench
- setup the server, userid, password, and default schema (database) …
- SHOW TABLES;
- Open SQL Script file soccer.sql
  - CREATE TABLE
  - NOT NULL
  - UNIQUE
  - REFERENCES
  - INSERT INTO ... VALUES ...

Gotta be careful though - suppose we had this:

CREATE TABLE college (
     cName   varchar(20) UNIQUE,
     state   varchar(2),
     enr     numeric(5,0)
);

That wouldn’t work for two OSU’s “Oklahoma State Univ” and “Ohio State Univ”.

SELECT

 SELECT A1, A2, ..., An
   FROM R1, R2, ..., Rm
  WHERE condition ;

back to the example … do several SELECTs, including SELECT *

 SELECT *
   FROM tryout;
/* */
 SELECT *
   FROM player
  WHERE pName="David";

DISTINCT eliminates duplicates. By default, SELECT assumes the ALL modified which will return all of the tuples that match. The SELECT DISTINCT version eliminates duplicates. Here, the query returns a subset of the matching tuples and some of them are duplicates.

INSERT INTO player (pID, pName, yCard, HS)
VALUES (70007, 'Rosario', 'yes', 400);
  
INSERT INTO player (pID, pName, yCard, HS)
VALUES (80008, 'Rosario', 'yes', 800);
  
SELECT *
   FROM player;
  
/* defaults to ALL and returns duplicates since
Rosario's tuple matches both conditions */
SELECT *
  FROM player
WHERE pName = "Rosario"
   OR yCard = "yes";
/* DISTINCT says no duplicates! */
SELECT DISTINCT *
  FROM player
WHERE pName = "Rosario"
   OR yCard = "yes";

sort the resulrs with ORDER BY .

Note that NULL values are handled together, sometimes rising to top or sinking to bottom of list.

 SELECT *
   FROM tryout
  WHERE decision = "yes"
  ORDER BY pPos;
  
 SELECT *
   FROM player
  ORDER BY pName ASC, HS DESC;

What is the schema for that table?
```
SHOW CREATE TABLE college;
```
Can I clone a table? Yes!
```
CREATE TABLE college2 LIKE college;
```
and check that it worked with

SHOW CREATE TABLE college2;

Special Operators

LIKE does simplistic string matching: % matches zero or more characters, and _ matches exactly one character.
REGEX does simple regular expression matching:
- ^ matches the beginning of the value being tested
- $ matches the end of the value being tested
- . matches a single character
- [listofcharacters] matches any single character in the bracketed list
- [char1-charN] matches any single character in the range specified
- | provides an “OR” condition between two patterns, matching either one
```
SELECT *
  FROM college;
/* */
SELECT *
  FROM college
 WHERE cName
  LIKE "%SU";     -- gives LSU ASU
/* */
SELECT *
  FROM college
 WHERE cName REGEXP 'U$';    -- gives LSU ASU OU
/* */
SELECT *
  FROM college
 WHERE cName REGEXP '^L';    -- gives LSU
/* */
SELECT *
  FROM college
 WHERE cName REGEXP '^[LO]'; -- gives LSU OU
```
more keywords to narrow selections are BETWEEN, IS NULL, IS NOT NULL, IN:

 SELECT pName, HS
  WHERE HS
BETWEEN 299 AND 999;
/* */
 SELECT DISTINCT pName
   FROM player
  WHERE pName BETWEEN 'AAA' AND 'MMM';
/* */
 SELECT DISTINCT pName
  FROM player
 WHERE pName BETWEEN 'MMN' AND 'ZZZ';
/* */
SELECT cName
  FROM college
 WHERE state in ('TX','LA','OK','AR');

AND, OR, NOT

SELECT *
  FROM player
 WHERE HS < 1000;
/* */
SELECT *
  FROM player
 WHERE HS < 1000
   AND yCard = "yes";
	 /* */
SELECT *
  FROM player
 WHERE HS < 1000
   AND pName NOT LIKE ('G%');
  /* */
SELECT DISTINCT pName
  FROM player
 WHERE pName NOT BETWEEN 'AAA' and 'MMM';

Renaming with AS
- Just make sure it’s a unique new name!

SELECT DISTINCT pName AS city_player
  FROM player
 WHERE HS > 1000;

Arithmetic and Aggregate functions
- The usual arithmetic operators
- Aggregates include MIN, MAX, SUM, and AVG

 SELECT COUNT(yCard) AS NumYellowCards
   FROM player
  WHERE HS > 999 AND yCard="yes";
 /* */
 SELECT AVG(HS) AS AverageHSSize,
        MAX(HS) AS MaxHS,
        MIN(HS) AS MinHS
   FROM player ;

Sub Queries

Since the result of any query is itself a relation, then it can be the subject of another query

 SELECT DISTINCT pName
   FROM player
  WHERE pID NOT IN
 		(SELECT pID
 		   FROM tryout
 		  WHERE decision="yes");

GROUP BY and HAVING

-- How many goalie tryouts did each college have
 SELECT cName, COUNT(cName)
   FROM tryout
  WHERE pPos = "goalie" GROUP BY cName;
  
-- How many goalie tryouts did each college have 
 SELECT cName, COUNT(cName)
   FROM tryout
  WHERE pPos = "goalie"
  GROUP BY cName
 HAVING COUNT(cName) < 2;

GROUP BY shows frequency distributions created by GROUP BY clause within SELECT statement
HAVING is extension of GROUP BY feature
- Applied to output of GROUP BY operation
- Used in conjunction with GROUP BY clause in second SQL command set
- Similar to WHERE clause in SELECT statement

Window Functions

Aggregate (e.g., SUM, AVG) and analytical (e.g., RANK) functions can also be limited to just some of the rows of a table. A Window function in SQL allows us to perform these *calculations* across a specific set of rows related to the current row. They’re called “window” functions because they can limit calculations to be made within the set of data defined by the window function.

Suppose you have a table of real estate sales in a county: county_sales

+---------+-------+------------+
| town    | month | sale_price |
+---------+-------+------------+
| Kenner  |     3 |     150000 |
| LaPlace |     4 |     198000 |
| Kenner  |     4 |     208000 |
| Gretna  |     3 |     195000 |
| Kenner  |     3 |     185000 |
| Spring  |     5 |     145000 |
| LaPlace |     5 |     130000 |
| LaPlace |     4 |     188000 |
| Kenner  |     4 |     188000 |
| Gretna  |     3 |     218500 |
| Kenner  |     3 |     180000 |
+---------+-------+------------+

You can use a window function to calculate the cumulative sum of the monthly home sales and places it in a new column Total Sales by Town :

  SELECT town, month, sale_price, 
         AVG(sale_price) OVER( PARTITION BY town ) 
            AS average_town_sales 
    FROM county_sales
ORDER BY TOWN;

Resulting in:

+---------+-------+------------+--------------------+
| town    | month | sale_price | average_town_sales |
+---------+-------+------------+--------------------+
| Gretna  |     3 |     195000 |        206750.0000 |
| Gretna  |     3 |     218500 |        206750.0000 |
| Kenner  |     3 |     150000 |        182200.0000 |
| Kenner  |     4 |     208000 |        182200.0000 |
| Kenner  |     3 |     185000 |        182200.0000 |
| Kenner  |     4 |     188000 |        182200.0000 |
| Kenner  |     3 |     180000 |        182200.0000 |
| LaPlace |     4 |     198000 |        172000.0000 |
| LaPlace |     5 |     130000 |        172000.0000 |
| LaPlace |     4 |     188000 |        172000.0000 |
| Spring  |     5 |     145000 |        145000.0000 |
+---------+-------+------------+--------------------+

Those zeros after the decimal place are annoying, so let’s fix them:

  SELECT town, month, sale_price, 
         ROUND((AVG(sale_price) OVER( PARTITION BY town )),0) AS total_town_sales 
    FROM county_sales
ORDER BY TOWN;

resulting in

+---------+-------+------------+------------------+
| town    | month | sale_price | total_town_sales |
+---------+-------+------------+------------------+
| Gretna  |     3 |     195000 |           206750 |
| Gretna  |     3 |     218500 |           206750 |
| Kenner  |     3 |     150000 |           182200 |
| Kenner  |     4 |     208000 |           182200 |
| Kenner  |     3 |     185000 |           182200 |
| Kenner  |     4 |     188000 |           182200 |
| Kenner  |     3 |     180000 |           182200 |
| LaPlace |     4 |     198000 |           172000 |
| LaPlace |     5 |     130000 |           172000 |
| LaPlace |     4 |     188000 |           172000 |
| Spring  |     5 |     145000 |           145000 |
+---------+-------+------------+------------------+

Here’s another example of Window fuctions from Julia Evans of WizardZines.com. The LAG( ) function is a window function that allows you to access data from a previous row in a result set from the current row without using a self-join.

mysql> select * from baby_log;

produces

+---------+------+
| event   | hour |
+---------+------+
| feeding |    1 |
| cough   |    1 |
| diaper  |    3 |
| feeding |    4 |
| diaper  |    5 |
| diaper  |    5 |
| feeding |    7 |
| cough   |    7 |
+---------+------+
8 rows in set (0.05 sec)

Then

SELECT event, hour,
       hour - LAG(hour) 
       OVER(
         PARTITION BY event
         ORDER BY hour ASC)
         AS time_since_last
  FROM baby_log
 WHERE event IN ('feeding', 'diaper')
 ORDER BY hour ASC;

produces

+---------+------+-----------------+
| event   | hour | time_since_last |
+---------+------+-----------------+
| feeding |    1 |            NULL |
| diaper  |    3 |            NULL |
| feeding |    4 |               3 |
| diaper  |    5 |               2 |
| diaper  |    5 |               0 |
| feeding |    7 |               3 |
+---------+------+-----------------+
6 rows in set (0.08 sec)

EXISTS operator

This operator is used to test for the existence of one or more records found by a subquery.

If we use the Northwind example database, the following query returns TRUE and lists the suppliers with a product price equal to 22:

mysql> SELECT CompanyName 
         FROM Supplier 
        WHERE EXISTS (SELECT ProductName 
                        FROM Product 
                       WHERE Product.SupplierID = Supplier.supplierID 
                         AND UnitPrice = 22);

+----------------+
| CompanyName    |
+----------------+
| Supplier VHQZD |
+----------------+
1 row in set (0.04 sec)

mysql> SELECT CompanyName, supplierID  FROM Supplier WHERE CompanyName = "Supplier VHQZD";
+----------------+------------+
| CompanyName    | supplierID |
+----------------+------------+
| Supplier VHQZD |          2 |
+----------------+------------+
1 row in set (0.04 sec)

mysql> SELECT ProductName, supplierId, unitPrice  FROM Product WHERE UnitPrice = 22;
+---------------+------------+-----------+
| ProductName   | supplierId | unitPrice |
+---------------+------------+-----------+
| Product KSBRM |          2 |     22.00 |
+---------------+------------+-----------+
1 row in set (0.04 sec)

mysql> EXPLAIN SELECT CompanyName FROM Supplier WHERE EXISTS (SELECT ProductName FROM Product WHERE Product.SupplierID = Supplier.supplierID AND UnitPrice = 22);
+----+--------------+-------------+------------+--------+---------------+---------+---------+------------------------+------+----------+-------------+
| id | select_type  | table       | partitions | type   | possible_keys | key     | key_len | ref                    | rows | filtered | Extra       |
+----+--------------+-------------+------------+--------+---------------+---------+---------+------------------------+------+----------+-------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                   | NULL |   100.00 | Using where |
|  1 | SIMPLE       | Supplier    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | <subquery2>.SupplierID |    1 |   100.00 | NULL        |
|  2 | MATERIALIZED | Product     | NULL       | ALL    | supplierId    | NULL    | NULL    | NULL                   |   77 |    10.00 | Using where |
+----+--------------+-------------+------------+--------+---------------+---------+---------+------------------------+------+----------+-------------+
3 rows in set, 2 warnings (0.04 sec)

mysql>

SQL Order of operations

Understanding the order in which the parts of a SQL query are executed by a database is key to writing correct, efficient SQL code. SQL executes the parts of a SQL query in a sequence defined by the SQL standard. There may be some minor variations due to the SQL query optimization that takes place in most SQL databases, but those optimizations are guaranteed to produce the same result as the unoptimized version of the SQL query.

The order of operations for SQL is :

FROM or JOIN
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
ORDER BY
LIMIT or OFFSET

Here’s an example based on code at the geeksforgeeks website with the order of operations indicated as comments::

SELECT customer_ID, SUM(total_amount) AS "Total"        -- 5th
FROM orders																							-- 1st
WHERE order_date BETWEEN '2022-01-01' AND '2022-03-31'  -- 2nd
AND customer_city = 'New York'                          -- 2nd
GROUP BY customer_id                                    -- 3rd
ORDER BY Total DESC                                     -- 6th
LIMIT 100;                                              -- 7th

Commit

SLIDE 1 Revisit ACID first, then explain.

Most of the time you can assume that MySQL will automatically COMMIT changes due to INSERT, DELETE, UPDATE, CREATE TABLE, etc. commands. Generally don’t worry about it

Later when we’re talking about stored procedures in MySQL, we will revisit COMMIT and its counterpart ROLLBACK.

Changing a table’s data with `UPDATE`

We need a bigger table to illustrate UPDATE. Suppose we have this relation:

instructorRelation

 /* update one tuple */
 UPDATE instructor
    SET salary = 78000
  WHERE ID = 45565;
 /* update 2 attributes in one tuple  */
 UPDATE instructor
    SET salary = 90000,
    	  dept_name = 'Finance'
  WHERE ID = 45565;
 /* update multiple tuples */
 UPDATE instructor
    SET salary = salary * 1.05
  WHERE dept_name = 'Comp. Sci.';

Changing a table’s DESIGN

ALTER TABLE tablename
{ ADD | MODIFY } ( columnname datatype [ { ADD | MODIFY } columnname datatype] ) ;
ALTER TABLE tablename
  ADD constraint [ ADD constraint ] ;

Primary & Foreign Keys

We know keys have to be non-NULL and UNIQUE. Just add PRIMARY KEY (attr) or FOREIGN KEY (attr) REFERENCES relation(attr) at end of CREATE.

CREATE TABLE Tryout2
  ( pID INT REFERENCES Player(pid) ON DELETE SET NULL, 
	  cName VARCHAR(20) REFERENCES College(cName) ON UPDATE CASCADE, 
	  pPos VARCHAR(8),
    decision VARCHAR(3)
  );

See the SQL source in Resources/Examples/SQL/sql_files/soccer3.sql
Then try adding pID as a PRIMARY KEY to player and recreate and initialize the tables… should get duplicate key error on pID=40004 .
Delete the Referencing key first, and then it works.