Security
Database schema before class: database_day8_end_of_class.sql
Slides from class
Code from class:
Database notes
User privileges
Privileges can be granted on a database at a granular level. For example, individual users can be granted Data Manipulation Language (DML) privileges to SELECT, INSERT, UPDATE, and/or DELETE rows in specified database relations. Privileges can also be granted for Data Definition Language (DDL) operations such as creating tables and views as well as other database management functions such as backups/restores. Database administrators can also specify which users can GRANT rights to other users.
SQL injection attacks
Maliciously including unexpected SQL commands into user input. For example an adversary may enter SQL commands into a textbox meant for a user name. These SQL commands may circumvent the intended database logic or give the adversary additional knowledge about the database. Prevent these attacks using prepared statements. Never trust raw user input! The old saw applies: you can never make anything foolproof because fools are so ingenious. Hackers are even more ingenious!
Plain text
Text that is readable by a human (not hashed or encrypted).
Hash function
A deterministic one-way mathematical function that takes some input (plain text or otherwise) and produces a fixed-length digest of the input. The same input always results in the same digest. Given only the digest, it is computationally expensive (we would like to say impossible) for an adversary to determine the original input.
The one-way nature of hash functions is useful for storing passwords. When a user initially creates an account on a system they enter a password in plain text which is then hashed and the hash digest is stored in the database. Later, when the user logs in, they re-enter their password in plain text. The database hashes the plain-text password input and compares the result with the hashed password stored in the database. If there is a match, the user entered the correct password and is granted access, otherwise the user is not granted access. Because the password is stored in hashed form, in the event the database is compromised, the adversary cannot read user's passwords in plain text. Instead the adversary sees only the digest of the hash function, from which they cannot easily compute the plain-text passwords that lead to those digests.
There are many different hash functions available. SHA256, SHA512, and bcrypt are considered appropriate for hashing passwords (in early 2020), but fast hash functions such as SHA1 and MD5 are not considered appropriate for hashing passwords.
Fast hash function
Many hashing algorithms such as MD5 and SHA1 compute a hash quickly and are not suitable for hashing passwords. Because they can be quickly computed, an adversary can perform a dictionary or rainbow table attack (see below) in a "reasonable" amount of time.
Fast hash function are useful in some use cases. For example, they can be used to quickly determine if a document has been altered. When the document is first created, store the hash of the document. Later, if a document is presented as an unaltered copy of the original, take a hash of the new document and compare it's hash with the stored hash. If the hashes match, the document has not been altered (or more precisely, it is *extremely* unlikely to have been altered, there is a remotely possibility there was a hash collision where different document happen to produce the same digest, but this is extremely unlikely with a good hash function).
Slow hash function
Slow hash functions such as bcrypt purposefully take a long time to compute compared with fast hash functions. While normally in computer science we want our algorithms to execute quickly, slow hash functions are desirable for storing passwords. This is because a slow hash function still returns a result in a short amount of time, say after 100 ms, but it is fast enough that a user is unlikely to notice a delay. An adversary attempting a brute force, dictionary, or rainbow attack, however, can only compute a small number of hashes per second (10 hash per second if each hash takes 100 ms vs. possibly millions of hashes per second for fast hash functions), making these attacks impractical.
Hash digest
The result of computing a hash function on plain text input. Often referred to simply as a hash.
Brute force attack
Running all possible combinations of characters through a hash function and looking for matching hashed password entries in a database. The process works as follows: assume passwords are stored in hashed form in a database, first select some combination of characters (say 'aaa') to test as a possible user password, hash those characters, compare the resulting digest with all hashed passwords stored in a database. If any entries match the digest, the characters input into the hash function (e.g., 'aaa') must be the password and the password is considered 'cracked.' This brute-force guessing generally takes a long time to compute because the adversary must try all possible character combinations that could be a user password (with a slow hash function and long passwords, checking all possible character combinations would take thousands of years, even with modern computers using GPUs), but most character combinations will not be ones chosen by a user as their password (e.g., most users do not use a password such as: Lj~a;fDn.m.345-4zA). The advantage of a brute force attack is that it will eventually crack all passwords.
Dictionary attack
Running all words in a dictionary (say of common passwords) through a hash function and looking for matching digest entries in a database. Generally much faster than a brute force attack, although it will only discover passwords that are in the dictionary.
rockyou.txt
A dictionary containing over 14 million passwords chosen by actual users. This file is often used as a dictionary for dictionary attacks.
Rainbow table
A precomputed table storing all possible character combinations (up to a certain length, say 6 characters) and their resulting hash digests. While the computation takes a long time and requires a large amount of storage space, once completed, the text that lead to a hash can be looked up in the table using the hash digest.
Salt
A random string of characters appended (or prepended or both) to a password. A different salt is chosen for each user. Adding the additional random characters defeats dictionary attacks (if the salt is unknown to the adversary, more below) because users will not normally choose passwords with the additional random characters and that entry is unlikely to be in the dictionary. It also generally defeats rainbow table attacks because if the salt is n characters long, the rainbow table must precompute longer password lengths by n, soon becoming computationally intractable. For example if n is 64 characters, and assuming a slow hash function, it would be impractical to compute a rainbow table for passwords that are at least 65 characters long.
The salt is generally stored in the database in plain text. When a user submits a password in plain text, the salt is appended and the password plus salt is hashed -> h(password+salt). The resulting digest is then compared with the salted and hashed password digest stored in the database. If the two values match, the user submitted the proper password. If the salt is also stolen, however, dictionary attacks may succeed, although will be slower to compute (must try adding each salt value in the database to each word in the dictionary). If salt is used, cracking one password will not also crack the passwords of other users who chose the same password.
Pepper
A secret value that is appended to a password plus salt. This value is not stored in the database. In some instances, the paper is a randomly chosen single character not recorded or known by the database. When a user attempts to log in, the database tries appending one character at a time to the submitted password plus salt, then hashes. If the resulting digest matches the stored hashed, salted, and peppered password, the user is granted access. If there are finite number of characters (say 26 lower case letter, 26 upper case letters, plus 10 digits, plus some punctuation for a total of 95 characters) the database tries each of the characters, one at a time, and will eventually find the correct pepper. This test can be done relatively quickly if only one password need be checked (e.g., a legitimate user may have to wait a short amount of time while the database tries up to 95 possible peppers), but slows down adversary testing many passwords (either brute force or dictionary) by a factor of the number of pepper characters (here by a factor of 95).
Hashcat
Software that attempts to discover the plain text that resulted in a hash digest. It can perform brute force or dictionary attacks with sophisticated rules (e.g., trying adding digits to the end of words, or replace e with 3, etc).