Role of SQL
SQL as a query language
- SQL is the standard query language for retrieving data from relational databases
- SQL is a standard and is supported by virtually all relational databases in the same manner.
- Queries built for one database generally run without modification against another database.
- Many vendors provide extensions to SQL for analytics and special data types.
Terminology
Term |
Definition |
Alternate Terms |
Entity |
A concept, event, role, place, or physical object that must be tracked,e.g., Customer. |
Class, Concept, Table |
Attribute |
A property of an entity,e.g., email. |
Property, Variable, Feature, Column, Dimension |
Table |
A database structure of columns and rows in which an entities instances are stored and held in either memory or on a storage device. |
Record, Relation, Entity |
Column |
An attribute of an entity stored in a table. |
Attribute, Feature, Variable |
Row |
An instance of an entity stored in a table. |
Observation, Record, Instance |
Primary Key |
One or more columns whose values uniquely identify a row in a table. |
PK, Unique Identifier |
Foreign Key |
A reference to a row in another table to which a row is related or linked. |
FK, Link, Reference, Pointer |
Result Set |
A temporary table that holds the result of a SQL query. |
RS |
Query |
A SQL SELECT statement specifying which rows should be retrieved from the database. |
|
Entity-Relationship Diagram |
A visualization of a relational data model. |
ERD, E-R Diagram, Logical Model |
Crow’s Foot Notation |
The colloquial term for E-R Diagrams in the Information Engineering (IE) |
IE |
IE |
The name of a common notation for E-R Diagrams using specific symbols for entities, keys, and relationships. |
Crow’s Feet, ERD |
DDL |
The commands in SQL for defining a relational database,i.e., the tables, keys, and attributes along with their constraints. |
Data Definition Language |
DML |
The SQL commands for manipulating data in the database, including SELECT, UPDATE, DELETE, and INSERT. |
Data Manipulation Language |
SQL |
DDL and DML statements for generating, manipulating, and using relational databases. |
|
Embedded SQL |
SQL that is called from within a host program written in a programming language such as R, Python, Java, C++, C#, etc. |
|
Database Server |
A program running on a computer that manages the physical storage devices that contain the tables. |
DBMS, Database Engine, Database |
Schema |
The table definitions for a relational database. |
Relational Schema, Physical Model |
Ad Hoc Query |
A SQL query that is interactively issued from a console and that displays the result immediately in table form. |
Interactive Query, Manual Query |
Database |
A collection of structured data in records and organized in a way that makes the data retrievable |
|
Retrieving Data from a Single Table
Query process
- Client connects to a specific database on the database server by supplying connection parameters (IP address, username, password)
- Client stores connection information in a connection object
- Client issues a query by a sending SQL statement to the database server over the network connection using the connection object
- Server checks to ensure that connected user has permission to access data and that the SQL statement is syntactically correct
- Server optimizes query and formulates a query plan based on least cost for time and memory
- Server carries out the query by retrieving the data from the storage devices and packaging the result into a temporary table, the result set
- Server sends result set back to the client
- Client stores the result set in local memory and processes the data
- Client disconnects from the database by closing the connection to the server and releasing all resources
SELECT
SELECT [columns] FROM [tables]
WHERE [column conditions]
GROUP BY [column]
HAVING [group criteria]
ORDER BY [columns];
AS - Defining Column Aliases
SELECT name AS FullName, email AS EMail FROM Author;
LIMIT - Limit the Rows in Result Set
SELECT name AS FullName, email AS EMail FROM Author LIMIT 2;
ORDER BY - Specify Order of Rows
SELECT name, email FROM Author
ORDER BY name;
Literals, Expressions and Calculations
- Literals are fixed values.
SELECT name, 'En' lang, email FROM Author;
- Queries can contain literals (a fixed value), expressions (including calculations), and various functions.
SELECT title AS `Film Title`, length / 60 AS HRS FROM Film;
Built-in Functions
ABS(m) |
MOD(m, n) |
POWER(m, n) |
ROUND(m, [n]) |
TRUNC(m [, n]) |
FLOOR(n |
CEIL(n) |
SIGN(n) |
SQRT(n) |
LN(n) |
EXP(n) |
SIN(n) |
INITCAP(s) |
LOWER(s) |
UPPER(s) |
CONCAT(s1, s2) |
LENGTH(s) |
SUBSTR(s, m [, n]) |
LTRIM(s [, set]) |
|
|
ADD_MONTHS(d, n) |
LAST_DAY(d) |
MONTHS_BETWEEN(d, e) |
NEXT_DAY(d, day) |
SYSDATE |
**** |
DISTINCT - Removing Duplicates
SELECT DISTINCT district FROM Address;
Conditioanl Expression
WHERE Clause
- The
WHERE
clause allows filtering of unwanted rows and only return select rows.
- When data is missing, it has the value
NULL
– which is no value at all.
- To ensure the correct evaluation of a logical expression, use parenthesis around the logical operators AND, OR, and NOT.
SELECT * FROM Address
WHERE district = 'Rio Grande do Sul'
AND phone IS NULL;
LIKE - Text Matching
- The
LIKE
operator is used for partial string matching.
%
Matches zero or more characters
_
Matches exactly one character
SELECT DISTINCT district FROM Address
WHERE phone LIKE '_55%1212';
BETWEEN - Range
SELECT * FROM Payment
WHERE amount BETWEEN 5 AND 10;
IN - Match Set
SELECT DISTINCT district FROM Address
WHERE postal_code NOT IN (45403,29444,26284);
- There are two existential qualifiers on sets
- These are often combined with the set membership test operators
IN
and NOT IN
.
SELECT * FROM Courses
WHERE credits > ANY (4, 2, 3);
SELECT * FROM Courses
WHERE credits > ALL (1, 2, 3);
UNION - Combining Query Results
- Unions are combinations of two or more results sets into a single result set.
- The results sets must have the same number of columns with the same data types.
SELECT * FROM Courses
WHERE credits > 3
UNION
SELECT * FROM Courses
WHERE dept = 'IE'
Example
SELECT DISTINCT district
FROM Address
WHERE postal_code IN (45403,29444,26284)
AND (postal_code <> 66789
OR postal_code <> 11887)
AND ((city_id > 10000)
OR (city_id BETWEEN 500 AND 800))
AND address LIKE '%000'
AND city_id IS NOT NULL;
Retrieving Data from Multiple Tables
Join - Match Primary and Foreign Keys
INNER JOIN
- An
INNER JOIN
is an equi-join that selects rows that have matching values in both tables.
- The keyword
INNER
is optional.
SELECT *
FROM Asset
INNER JOIN Author
ON Asset.auid =
Author.auid;
- SQL provides alternative syntax
JOIN USING
if the two tables are joined via the same column names
SELECT * FROM Asset JOIN Author USING (auid);
- A
NATURAL JOIN
automatically joins on columns that are named the same.
SELECT * FROM Asset NATURAL JOIN Author;
Left and Right Outer Joins
- The Left Join or Left Outer Join operation is a special type of join that takes two relations, A and B, and returns the inner join of A and B along with the unmatched rows of A.
- A is the first relation defined in the FROM clause and is therefore the left relation.
- The left outer join includes the unmatched rows of the left relation along with the matched columns in the result.
- A Right Join is analogous except that it takes the unmatched rows of the right relational B.
Full Outer Joins
- A full outer join takes two relations, A and B, and returns the inner join of A and B along with the unmatched rows of A and B.
- Essentially, a full outer join is the distinct union of a left and right outer join.
Aggregation
Common Aggregation Functions
Function |
Description |
MIN |
smallest value in a column |
MAX |
largest value in a column |
SUM |
sum of all values in a column |
AVG |
average (mean) value |
MEDIAN |
median value |
COUNT(*) |
number of rows in a table |
COUNT(col) |
number of non-NULL values in a column |
STDEV |
standard deviation |
VARIANCE |
variance (square of standard deviation) |
GROUP BY
- When aggregation functions are used in a query that includes a
GROUP BY
clause then the functions are applied to each group rather than a column.
SELECT a.type AS Type,
COUNT(*) AS n,
AVG(time) AS Average,
SUM(time) AS Total
FROM Asset a
GROUP BY a.type;
Grouping
- Generating Single-Column Groups
SELECT actor_id, COUNT(*)
FROM film_actor
GROUP BY actor_id;
- Grouping with Aggregate Functions
- Groups can be restricted (filtered) through the
HAVING
clause
SELECT actor_id, COUNT(*) AS n
FROM film_actor
GROUP BY actor_id
HAVING n > 20;
Views
CREATE VIEW
CREATE VIEW AuthorNM
(fullname, email)
AS
SELECT name,
substr(email,1,2) || '****' || substr(email,-5)
FROM Author;
Common Uses Cases for Views
- Define a view that contains select rows and columns and provides authorization for some users to access the view but not the underlying tables that define the view.
- Aggregate data from a complex query in a view to simplify common data access.
- Create views as alternatives to commonly used subqueries to make queries simpler and to provide a degree of reusability.