SQL

For the exercises in this lecture, download data-examples.zip and upload to c9.io. (You do not need to unzip it.)

SQL (Structured Query Language) is one of the most popular relational database languages and provides the ability to store, retrieve, update, delete, and search the contents of a database. There are several different implementations of this database including SQL, MS SQL, MySQL, and SQLite. Although there are differences between these implementations, they share a common syntax and functionality. Because it affords a particularly simple setup, we will use SQLite.

A relational database such as SQL is a collection of data organized as a set of well defined “tables” from which data can be accessed in many different ways. Although we will only describe the use of an SQL relational database, you should be aware of the growing popularity of non-relational or non-SQL databases. These databases do not use tables and are particularly popular with companies like Facebook, Google, and Amazon that have to manage massive databases.

Tables

Tables are the building blocks of all relational databases. A database consists of one or more tables, each of which stores data. Each table consists of rows and columns. Each table column (also called a field) has a unique name and has a pre-defined data type (integer, float, string, etc.). For example, in this table, there are four columns: firstName, lastName, email, and phone, each of type string.

firstName lastName email phone
Devin Balkcom devin.balkcom@dartmouth.edu 603-646-0272
Hany Farid hany.farid@dartmouth.edu 603-646-2761

The table columns describe the data types and the table rows contain the actual data for the columns. There are two rows in the above table. Once a table has been populated with rows, it can be searched based on any of the columns. For example, we can search the above table for lastName=Balkcom. Such a search will return the table row(s) whose corresponding column is Balkcom. A search can be further refined by searching on two or more columns, (e.g., lastName=Balkcom and firstName=Devin).

Getting Started

Let’s start by creating a database and a single table within that database. To create a database named db, enter the following at the c9 prompt:

sqlite3 db

Your prompt should now be sqlite>, indicating that you are running SQLite (version 3). Let’s create our first table by entering the following line of code at the SQLite prompt:

CREATE TABLE professors (firstName text, lastName text, email text, phone text);

This command creates a table called professors containing four columns with names firstName, lastName, email, and phone. Each of these fields is of type text (more on types below). Notice that like Javascript and PHP, commands in SQLite end with a semicolon. Notice that unlike Javascript, the syntax is different for making a call to a built-in SQLite function. Here, the function name is effectively “create table” and the parameters are the name of the table “professors” and the column names and types enumerated between the parentheses. Despite the syntactic differences, you can think of this, and most SQLite commands, as function calls.

At this point, we have defined a table, but we have not populated it with any data. Let’s add a single row to the table:

INSERT INTO professors VALUES ('Devin', 'Balkcom', 'devin.balkcom@dartmouth.edu', '603-646-0272');

We can verify that this data was insert properly by selecting all of the rows from the table:

SELECT * FROM professors;

You should see the following printed on your screen (the row of data that we inserted into the table):

Devin|Balkcom|devin.balkcom@dartmouth.edu|603-646-0272

The command SELECT is used to search the database and * is a “wildcard” character that asks for all rows in the table professors. We’ll discuss this command below in more detail.

SQLite is not case sensitive, so the command CREATE and create will each work. By convention, the SQLite commands are written out in all capital letters because it helps to distinguish between the command and the user-supplied data.

If you are like me, you will occasionally forget to add the semi-colon at the end of the SQLite command. When this happens, you will see something like this:

sqlite> SELECT * FROM professors
   ...>

the ...> is a sign that your command is incomplete. Just type a “;” on this line, and your command will execute.

Exercise:
Add a second row to the table professors that contains Hany’s name and contact information. Display the rows of the table to make sure that the row was added correctly. Your output should look like (the solution is provided at the end of this page):

Devin|Balkcom|devin.balkcom@dartmouth.edu|603-646-0272

Hany|Farid|hany.farid@dartmouth.edu|603-646-2761

.commands

At the SQLite prompt, type .help to see a list of basic commands that can be evaluated. For example .table will return a list of all tables in the specified database. The command .open will open a database, and .quit will exit SQLite. There is no .save command because all changes to a table are automatically saved at the time the change is made (this also means that there is no undo or “quit without saving”).

Data Types

Each value stored in an SQLite database has one of the following storage classes:

You may recall that Javascript has several different data types including integer, floats, arrays, etc. A storage class is more general than this concept of a datatype. The INTEGER storage class, for example, includes 6 different integer datatypes of different lengths. This makes a difference to how information is stored on the computer disk, but not to how data is handled by you the programmer.

What is important, is that you should specify the storage class of each column of your table so that it can be stored properly and efficiently. In some cases SQLite will automatically convert data types if you try, for example, to insert an integer into a column defined as a real. In other cases, however, the conversion won’t be possible and the insertion will fail.

Primary Key

In addition to defining the data type of each column, we often define one of the columns as being the primary key. A primary key is a column in a table which uniquely identifies each row the table. This primary key must be unique (no two rows can have the same key) and a table can have only one primary key. We don’t have to define a primary key (I didn’t in the professors table defined above), but if there is no primary key, then you cannot have two rows with the same values. This may not be a problem in our professors table since two of the columns consist of unique email and phone, but in other examples, this could be a problem (e.g., a table consisting of weight, height, gender of the US population will certainly have duplicate rows). As a general rule, therefore, you should define a primary key.

When creating a primary key, you can either manually take on the responsibility of ensuring that the primary key is unique, or you can tell SQLite to automatically assign a unique number. For example, this command creates a table with a primary key that is a positive integer and that is automatically set at the time that you create a row.

CREATE TABLE professors2 (id integer PRIMARY KEY AUTOINCREMENT, firstName text, lastName text, email text, phone text);

To insert a row, we have to slightly modify our previous call to insert by adding NULL into the first position (NULL corresponds to no value which is replaced by SQLite’s auto-increment when the row is added to the table).

INSERT INTO professors2 VALUES (NULL, 'Devin', 'Balkcom', 'devin.balkcom@dartmouth.edu', '603-646-0272');

After adding a second row and the selecting all rows, we will see the following (note the addition of the unique integer key 1 and 2):

1|Devin|Balkcom|devin.balkcom@dartmouth.edu|603-646-0272

2|Hany|Farid|hany.farid@dartmouth.edu|603-646-2761

Select

We have built for you a table called world that contains four columns: country name, 3-letter country abbreviation, country GDP, and country population. The names of these columns are country, abbrv, gdp, and population. This table consists of data collected in 2014 from 149 countires. This table is in the database also named world so to get started using this table enter sqlite3 world at the c9 prompt.

The command SELECT is used in many different ways to access the contents of a table. Recall that to see the contents of the entire table, we execute this command:

SELECT * FROM world;

We can narrow this a bit by replacing the wild card character (*) with the name of one or more columns. For example to see all of the country names, we would execute this command:

SELECT country FROM world;

And to see all country names and country abbreviations:

SELECT country, abbrv FROM world;

In all of these examples, we have used the SELECT command to select one or more of the columns, but we haven’t shown how to refine the search depending on the data in the rows. Let’s do that now.

Where

Let’s look at different ways that we can narrow and refine our searches. Perhaps one of the simplest searches is to retreive one column of data from a specified row in the table. This is done as follows:

SELECT <column> FROM <table> WHERE <condition>;

Notice that this is similar to the above SELECT commands with only the addition of a WHERE <condition>. If we want to retrieve the population of France, for example, the column is population, the table is world, and the condition is country = "France" yielding the following command:

SELECT population FROM world WHERE country = "France";

Run this command at the sqlite> prompt – the result will be 66206930, the population of France in 2014. We can also select more than one column to be returned to us. For example, this command will return France’s population and gdp:

SELECT population, gdp FROM world WHERE country = "France";

For numeric columns such as the population and gdp, we can search on an equality, or more useful, on a range of values. For example, this command will return the country name and gdp for all countries with a population under 1/2 million:

SELECT country, gdp FROM world WHERE population < 500000;

We can further narrow the search on a column by, for example, selecting all countries with population between two values:

SELECT country, gdp FROM world WHERE population BETWEEN 500000 AND 1000000;

In this example, we’ve replaced population < 500000 with the BETWEEN 500000 AND 1000000. In Javascript, this command would have looked like (population >= 50000 && population <= 1000000). Unlike Javascript, these SQLite commands have a more natural language syntax. Conceptually, however, we are performing the same type of logical operations.

Muliple search criteria can be combined using logical operators AND/OR. This search, for example, returns all countries with a population less than 1/2 million and with a gdp greater than 10 million.

SELECT country FROM world WHERE population < 500000 AND gdp > 10000000;

We can also use the value of columns as part of the selection criteria. This search, for example,

SELECT country FROM world WHERE population > gdp/1000;

will find all countries with a gdp to population ratio that is less than 1000.

We can even further refine searches by nesting one select within another. For example, let’s say that we wanted to extract the names of all of the countries whose population is greater than that of France’s. We could do this by first extracting the population (66206930) of France as we did above and then evaluating the following search

SELECT country FROM world WHERE population > 66206930;

Or, we can combine these two searches into one as follows.

SELECT country FROM world WHERE population > (SELECT population FROM world WHERE country='France');

where the expression on the right hand side of the > sign is itself a query which returns the population of France. This syntax should be familiar from Javascript code that you have seen in which the return value of a function is used within an expression.

Exercise:
Write an SQLite command that searches the table world for all countries with a gdp between 10 and 100 million. Your search should return the matching countries’ 3-letter abbreviation. The solution is provided at the end of this page.

Although we won’t go into any detail, you should know that it is possible to not only search within a single table, as we have been doing, but also combine searches across multiple tables searching on and combining columns from multiple tables (if you’d like to learn more, you can read up on the JOIN feature).

Aggregate

In addition to searching a table for rows that satisfy specific criteria, you can also perform calculations on the table columns. The basic syntax for this is:

SELECT <command> FROM <table>;

The “command” usually takes the form of FUNCTION(COLUMN). For example, the function SUM adds up all of the values in the specified column. This command, for example, returns the sum of the populations for each row in the world table.

SELECT SUM(population) FROM world;

Similarly, we can determine the maximum (MAX) or minimum (MIN) value in a column:

SELECT MAX(gdp) FROM world;

SELECT MIN(gdp) FROM world;

The command COUNT simple counts the number of rows so this command will return the number of rows in a table.

SELECT COUNT(*) FROM world;

Like

In the previous searches, we were able to specify absolute values of a string (country = "France") or ranges of numeric values (population < 500000). We are afforded even more flexibility when searching strings with the use of the LIKE modifier and the wild card character %. For example, this command will search for all countries that start with the letter “F”:

SELECT country FROM world WHERE country LIKE "F%";

The string “F%” tells SQLite to find all strings that begin with the letter “F” and have any number of characters (including zero) following the letter “F”. The only difference between this command and our previous command using the WHERE modifier is that we replaced the equal-sign in WHERE country = "France" with the LIKE modifier.

The wild card character % can appear anywhere and any number of times within a search string. This command, for example, will search for all country names that have a double-E anywere in its name, including the first two letters or the last two letters.

SELECT country FROM world WHERE country LIKE "%EE%";

Exercise: Find all country names where the 3-letter abbreviation is contained within the country name (e.g., Hungary/HUN, but not Portugal/PRT). Hint, you will need to use string concatenation: in SQLite the string concatenation function is || (e.g., “abc” || “123” yields the string “abc123”). The solution is provided at the end of this page.

Exercise solutions

Add a second row to the table professors that contains Hany’s name and contact information.

INSERT INTO professors VALUES ('Hany', 'Farid', 'hany.farid@dartmouth.edu', '603-646-2761');

SELECT * FROM professors;

Specify a search that searches the table world for all countries with a gdp between 10 and 100 million.

SELECT abbrv FROM world WHERE gdp BETWEEN 10000000 AND 100000000;

Find all country names where the 3-letter abbreviation is contained within the country name.

SELECT country, abbrv FROM world WHERE country LIKE "%" || abbrv || "%";