Contents

SQL

Role of SQL

SQL as a query language

  • SQL is the standard query language for retrieving data from relational databases
  • SQL is a standard and is supported by virtually all relational databases in the same manner.
  • Queries built for one database generally run without modification against another database.
  • Many vendors provide extensions to SQL for analytics and special data types.

Terminology

Term Definition Alternate Terms
Entity A concept, event, role, place, or physical object that must be tracked,e.g., Customer. Class, Concept, Table
Attribute A property of an entity,e.g., email. Property, Variable, Feature, Column, Dimension
Table A database structure of columns and rows in which an entities instances are stored and held in either memory or on a storage device. Record, Relation, Entity
Column An attribute of an entity stored in a table. Attribute, Feature, Variable
Row An instance of an entity stored in a table. Observation, Record, Instance
Primary Key One or more columns whose values uniquely identify a row in a table. PK, Unique Identifier
Foreign Key A reference to a row in another table to which a row is related or linked. FK, Link, Reference, Pointer
Result Set A temporary table that holds the result of a SQL query. RS
Query A SQL SELECT statement specifying which rows should be retrieved from the database.
Entity-Relationship Diagram A visualization of a relational data model. ERD, E-R Diagram, Logical Model
Crow’s Foot Notation The colloquial term for E-R Diagrams in the Information Engineering (IE) IE
IE The name of a common notation for E-R Diagrams using specific symbols for entities, keys, and relationships. Crow’s Feet, ERD
DDL The commands in SQL for defining a relational database,i.e., the tables, keys, and attributes along with their constraints. Data Definition Language
DML The SQL commands for manipulating data in the database, including SELECT, UPDATE, DELETE, and INSERT. Data Manipulation Language
SQL DDL and DML statements for generating, manipulating, and using relational databases.
Embedded SQL SQL that is called from within a host program written in a programming language such as R, Python, Java, C++, C#, etc.
Database Server A program running on a computer that manages the physical storage devices that contain the tables. DBMS, Database Engine, Database
Schema The table definitions for a relational database. Relational Schema, Physical Model
Ad Hoc Query A SQL query that is interactively issued from a console and that displays the result immediately in table form. Interactive Query, Manual Query
Database A collection of structured data in records and organized in a way that makes the data retrievable

Retrieving Data from a Single Table

Query process

  • Client connects to a specific database on the database server by supplying connection parameters (IP address, username, password)
  • Client stores connection information in a connection object
  • Client issues a query by a sending SQL statement to the database server over the network connection using the connection object
  • Server checks to ensure that connected user has permission to access data and that the SQL statement is syntactically correct
  • Server optimizes query and formulates a query plan based on least cost for time and memory
  • Server carries out the query by retrieving the data from the storage devices and packaging the result into a temporary table, the result set
  • Server sends result set back to the client
  • Client stores the result set in local memory and processes the data
  • Client disconnects from the database by closing the connection to the server and releasing all resources

SELECT

SELECT [columns] FROM [tables]
 WHERE [column conditions]
 GROUP BY [column]
   HAVING [group criteria]
 ORDER BY [columns];

AS - Defining Column Aliases

SELECT name AS FullName, email AS EMail FROM Author;

LIMIT - Limit the Rows in Result Set

SELECT name AS FullName, email AS EMail FROM Author LIMIT 2;

ORDER BY - Specify Order of Rows

SELECT name, email FROM Author
 ORDER BY name;

Literals, Expressions and Calculations

  • Literals are fixed values.
SELECT name, 'En' lang, email FROM Author;
  • Queries can contain literals (a fixed value), expressions (including calculations), and various functions.
SELECT title AS `Film Title`, length / 60 AS HRS FROM Film;

Built-in Functions

ABS(m) MOD(m, n) POWER(m, n)
ROUND(m, [n]) TRUNC(m [, n]) FLOOR(n
CEIL(n) SIGN(n) SQRT(n)
LN(n) EXP(n) SIN(n)
INITCAP(s) LOWER(s) UPPER(s)
CONCAT(s1, s2) LENGTH(s) SUBSTR(s, m [, n])
LTRIM(s [, set])
ADD_MONTHS(d, n) LAST_DAY(d) MONTHS_BETWEEN(d, e)
NEXT_DAY(d, day) SYSDATE ****

DISTINCT - Removing Duplicates

SELECT DISTINCT district FROM Address;

Conditioanl Expression

WHERE Clause

  • The WHERE clause allows filtering of unwanted rows and only return select rows.
  • When data is missing, it has the value NULL – which is no value at all.
  • To ensure the correct evaluation of a logical expression, use parenthesis around the logical operators AND, OR, and NOT.
SELECT * FROM Address
 WHERE district = 'Rio Grande do Sul'
   AND phone IS NULL;

LIKE - Text Matching

  • The LIKE operator is used for partial string matching.
  • % Matches zero or more characters
  • _ Matches exactly one character
SELECT DISTINCT district FROM Address
 WHERE phone LIKE '_55%1212';

BETWEEN - Range

SELECT * FROM Payment
 WHERE amount BETWEEN 5 AND 10;

IN - Match Set

SELECT DISTINCT district FROM Address
 WHERE postal_code NOT IN (45403,29444,26284);
  • There are two existential qualifiers on sets
    • ANY
    • ALL
  • These are often combined with the set membership test operators IN and NOT IN.
SELECT * FROM Courses
 WHERE credits > ANY (4, 2, 3);
SELECT * FROM Courses
 WHERE credits > ALL (1, 2, 3);

UNION - Combining Query Results

  • Unions are combinations of two or more results sets into a single result set.
  • The results sets must have the same number of columns with the same data types.
SELECT * FROM Courses 
 WHERE credits > 3
UNION
SELECT * FROM Courses 
 WHERE dept = 'IE'

Example

SELECT DISTINCT district 
  FROM Address
 WHERE postal_code IN (45403,29444,26284)
   AND (postal_code <> 66789 
        OR postal_code <> 11887)
   AND ((city_id > 10000)
        OR (city_id BETWEEN 500 AND 800))
   AND address LIKE '%000'
   AND city_id IS NOT NULL;

Retrieving Data from Multiple Tables

Join - Match Primary and Foreign Keys

INNER JOIN

  • An INNER JOIN is an equi-join that selects rows that have matching values in both tables.
  • The keyword INNER is optional.
SELECT * 
  FROM Asset 
       INNER JOIN Author
       ON Asset.auid =
          Author.auid;
  • SQL provides alternative syntax JOIN USING if the two tables are joined via the same column names
SELECT * FROM Asset JOIN Author USING (auid);
  • A NATURAL JOIN automatically joins on columns that are named the same.
SELECT * FROM Asset NATURAL JOIN Author;

Left and Right Outer Joins

  • The Left Join or Left Outer Join operation is a special type of join that takes two relations, A and B, and returns the inner join of A and B along with the unmatched rows of A.
  • A is the first relation defined in the FROM clause and is therefore the left relation.
  • The left outer join includes the unmatched rows of the left relation along with the matched columns in the result.
  • A Right Join is analogous except that it takes the unmatched rows of the right relational B.

Full Outer Joins

  • A full outer join takes two relations, A and B, and returns the inner join of A and B along with the unmatched rows of A and B.
  • Essentially, a full outer join is the distinct union of a left and right outer join.

Aggregation

Common Aggregation Functions

Function Description
MIN smallest value in a column
MAX largest value in a column
SUM sum of all values in a column
AVG average (mean) value
MEDIAN median value
COUNT(*) number of rows in a table
COUNT(col) number of non-NULL values in a column
STDEV standard deviation
VARIANCE variance (square of standard deviation)

GROUP BY

  • When aggregation functions are used in a query that includes a GROUP BY clause then the functions are applied to each group rather than a column.
SELECT a.type AS Type,
       COUNT(*) AS n,
       AVG(time) AS Average, 
       SUM(time) AS Total
  FROM Asset a
 GROUP BY a.type;

Grouping

  • Generating Single-Column Groups
SELECT actor_id, COUNT(*)
  FROM film_actor
 GROUP BY actor_id;
  • Grouping with Aggregate Functions
  • Groups can be restricted (filtered) through the HAVING clause
SELECT actor_id, COUNT(*) AS n
  FROM film_actor
 GROUP BY actor_id
HAVING n > 20;

Views

CREATE VIEW

CREATE VIEW AuthorNM 
 (fullname, email)
 AS 
 SELECT name, 
        substr(email,1,2) || '****' || substr(email,-5) 
   FROM Author;

Common Uses Cases for Views

  • Define a view that contains select rows and columns and provides authorization for some users to access the view but not the underlying tables that define the view.
  • Aggregate data from a complex query in a view to simplify common data access.
  • Create views as alternatives to commonly used subqueries to make queries simpler and to provide a degree of reusability.