SQL

Table of Contents

1. Databases

Database management systems (DBMS) manage databases, which consists of tables. Tables are a collection of records, which have rows that have a value for each column. A column is defined by its name and the type of the value that it holds.

2. SQL

The Structured Query Language (SQL) is used to query databases, making it perhaps the most widely used programming language.

2.1. Declarative Programming

SQL is a declarative language, which makes it different from languages such as Python or Scheme. In declarative languages such as SQL and Prolog, the program is a description of the desired result, rather than in imperative languages where the program is a description of the computational processes to get a result. Here, the interpreter's job is to figure out how to run processes in order to generate the desired result.

2.2. SELECT Statements

A SELECT statement transforms one table to another by selecting data. It can select columns from a table:

SELECT * FROM [table];

This query selects all of the columns from the table, and displays all of its rows. The special * column description refers to all columns in the table.

2.2.1. Projection

A projection is defined as taking a vertical subset of rows in a table. In other words, this type of query returns some of the columns in a table:

SELECT [expression] AS [name], [expression] AS [name], ... FROM [table] LIMIT [num];

It always includes a comma-separated list of column descriptions. A column description is an expression, optionally followed by AS which renames the column name to something else. The FROM clause describes a table from which to select rows and columns. The optional LIMIT clause describes the maximum number of rows to select. The special * column description refers to all columns in the table.

2.2.2. Selection

A selection is defined as taking a horizontal subset of rows in a table that satisfy a particular condition. In other words, this type of query returns some of the rows and all of the columns in a table:

SELECT * FROM [table] WHERE [condition] ORDER BY [order]

A subset of the rows is selected using the WHERE clause, which specifies a condition. An ordering over the remaining rows can be declared using an ORDER BY clause. By default this would be in increasing order, but we can also use descending order by using DESC after [order].

Example: Selection and projection

We can combine selection and projection together to perform queries on tables. The following example uses the IMDB dataset:

SELECT time, runtime, genres FROM titles WHERE year = 2025 ORDER BY runtime LIMIT 3;

This selects the columns time, runtime and genres from the table called titles such that the rows satsify the condition year = 2025. The remaining rows are then ordered by runtime in increasing order and then the first three results are shown.

2.2.3. Joining Tables

Oftentimes, you want to create a new table from data in existing tables. To do this tables can be joined to form combinations of data from rows from each of the existing tables:

[select statement] FROM [table] JOIN [table] ON [condition]

This will select data FROM the first table and join it with data from the second table for the rows that satisfy [condition]. For example:

SELECT title, averageRating FROM titles JOIN ratings ON titles.tconst=ratings.tconst;

The syntax also allows us to implicitly use a comma as JOIN and to put all of our conditions in the WHERE clause instead of having a separate ON clause:

SELECT title, averageRating FROM titles, ratings WHERE titles.tconst=ratings.tconst;

We can also join more than one table together by just chaining JOIN clauses together. The AND keyword is also useful to add conditions.

  1. Cross Joins

    Oftentimes, we need a condition to limit the ways we can join two tables. Leaving out this condition results in a cross join, which returns all possible pairs of rows between the two tables.

  2. Self Joins

    A table can be joined to itself to compare one row to another row. To differentiate between them, aliases must be used. For example, to create a table of remakes, we want to see two movies that have the same title in the same table:

    SELECT old.title, old.year AS first, new.year AS second
        FROM titles AS old JOIN titles AS new
        ON old.title=new.title AND old.year < new.year
    

2.2.4. Table Aliases

If two tables share a column name, we have to use dot expressions to disambiguate which column is from which table. This may result in long conditions and dot expressions in a query. To shorten the name of a table, table aliases may be used by introducing an AS clause: [table] AS [alias]. For example:

SELECT title, averageRating FROM titles AS t JOIN ratings AS r ON t.tconst=r.tconst;

2.3. CREATE TABLE Statements

The result of a SELECT statement is displayed to the user, but not stored. A CREATE TABLE statement gives the result a permanent name:

CREATE TABLE [name] AS [select statement];

We can also create tables by specifying the column names and their types:

CREATE TABLE IF NOT EXISTS [table name] (
    [column name] [type] UNIQUE,
    [column name] [type] DEFAULT [value],
    ...
);

Notice that we can add IF NOT EXISTS to only run the query if there is no table with this name already. Adding UNIQUE after the column name and type specifies that each row must have a unique value for this column, and we can also specify the DEFAULT value it takes on if no value is specified.

2.4. DROP TABLE Statements

If you want to delete a table if it exists, use the DROP TABLE statement:

DROP TABLE IF EXISTS [table name];

2.5. Modifying Tables

Tables can be modified by inserting, deleting, or changing rows.

2.5.1. INSERT Statements

An INSERT statement inserts a row into a table:

INSERT INTO [table name] ([column name], ...) VALUES ([value], ...);

We can also insert rows that are returned by a select statement:

INSERT INTO [table name] ([column name], ...) [select statement];

We can specify the specific columns we want to change by writing their names in parantheses.

2.5.2. UPDATE Statements

An UPDATE statement updates entries in certain columns to new values, just for some subset of rows:

UPDATE [table name] SET [column name] = [expr], ... WHERE [condition];

2.5.3. DELETE Statements

We can also delete some or all rows from a table with a DELETE statement:

DELETE FROM [table name] WHERE [condition];

Without a WHERE clause, the query will delete all rows from a table. Note that this is not equivalent to DROP TABLE, as the table will still exist.

2.6. Numerical Expressions

In a SELECT expression, column names evaluate to row values. Therefore, we can perform arithmetic operations on the column names in a SELECT statement. Imagine we had a table for a ski lift, with a column specifying the identifier for the chair, a number for the number of single people, and a number for the number of couple. Then, the following SQL query shows the total people on the ski lift:

SELECT chair, single + 2 * couple AS total FROM lift;

We can also use the functions ABS, ROUND, and NOT to transform values as well.

2.7. String Expressions

String values can be combined to form longer strings using the concatenation operator ||:

SELECT "Hello," || " world";

SQL has basic string manipulation built-in, with the functions SUBSTR and INSTR. The function SUBSTR([str], [start], [len]) takes a string and returns the substring starting from the index [start] and with length [len]. Note that SQL indexes start with one, not zero.

The INSTR([str1], [str2]) function returns the position of the first match of [str2] inside the search string [str1].

Strings can also be searched for a partial match:

SELECT title FROM titles WHERE title LIKE "%dog%"

2.8. Aggregation

An aggregate function in the [columns] clause of a SELECT statement computes a value from a group of rows. For example, the MAX function takes in a series of rows represented by one column, but returns only one data value that is the max value of all the rows. The SUM function sums all of the values. COUNT(*) counts the total amount of rows. Critically, these aggregate functions only operate on one column of a series of rows. It is possible, however, to use multiple aggregate functions to combine data from multiple columns:

SELECT SUM(averageRating * numVotes) / SUM(numVotes) FROM ratings;

This gives the average rating over all the votes in the dataset.

2.8.1. Grouping Rows

Rows in a table can be grouped, and aggregation can be performed on each group:

SELECT [columns] FROM [table] GROUP BY [expression] HAVING [expression]

The number of groups is the number of unique values of an expression. In other words, it groups rows together that have the same value in the GROUP BY clause. The HAVING clause filters the set of groups that are aggregated. Then, we can run our aggregate functions on these groups separately, and then output a new table where each group corresponds to one row.

Example: Movies per decade

We want to create a table with columns decade and movies that contain one row for each decade in which a movie was released and the number of movies that were released in that decade. To do this, we can run the following query:

SELECT year / 10 * 10 AS decade, COUNT(*) AS movies FROM titles GROUP BY year / 10;

Note that SQL performs floor division when dividing an integer by another integer.

3. Connecting from Python

There exists a sqlite3 module for Python that allows us to connect to a sqlite database and run queries. To do this, we first set up a Connection, and then we can use the execute() method to run queries on that connection. The fetchall() method allows us to retrieve the results of the query, and the commit() method commits our changes to the database:

import sqlite3

db = sqlite3.Connection("n.db")
db.execute("CREATE TABLE nums AS SELECT 2 UNION SELECT 3;")
db.execute("INSERT INTO nums VALUES (?), (?), (?);", range(4, 7))
print(db.execute("SELECT * FROM nums;").fetchall())
db.commit()

Notice how execute() allows us to specify holders in the query with ?, which will then get filled in with a sequence computed in the second argument to the method.

Last modified: 2025-11-23 14:30