Portal | Level: L0: Entry | Topics: SQL, Database Operations | Domain: DevOps & Tooling

SQL Fundamentals - Primer

Why This Matters

SQL is the language you use to interrogate databases, and as a DevOps or SRE engineer you will write SQL constantly — querying metrics databases, digging through application logs stored in PostgreSQL or ClickHouse, investigating incidents via audit tables, and pulling capacity data from CMDB systems. You do not need to be a DBA, but you need to be fluent enough to answer questions quickly under pressure.

SQL is declarative: you describe the result you want, not the steps to get it. This is powerful but initially disorienting because the order you write clauses is not the order the database processes them. Understanding the logical evaluation order eliminates most beginner confusion and makes complex queries much easier to reason about.

If you can write a confident SELECT with joins, filtering, grouping, and aggregation, you can answer almost any ad-hoc question thrown at you during an incident or capacity review.

Core Concepts

1. Logical Evaluation Order

The most important mental model for SQL. The database processes clauses in this order, regardless of how you write them:

1. FROM         choose source tables and joins
2. WHERE        filter individual rows
3. GROUP BY     collapse rows into groups
4. HAVING       filter groups
5. SELECT       choose columns and expressions
6. ORDER BY     sort results
7. LIMIT        truncate output

This explains why you cannot use a column alias from SELECT in your WHERE clause — WHERE runs before SELECT.

2. A Complete Example

SELECT
    owner,
    COUNT(*) AS cat_count
FROM cats
WHERE owner != 3
GROUP BY owner
HAVING COUNT(*) = 2
ORDER BY owner DESC
LIMIT 10;

Step-by-step evaluation:

Step  Clause                   What happens
----  -----------------------  ------------------------------------
1     FROM cats                Start with all rows in cats
2     WHERE owner != 3         Remove rows where owner is 3
3     GROUP BY owner           Group remaining rows by owner
4     HAVING COUNT(*) = 2      Keep only groups with exactly 2 cats
5     SELECT owner, COUNT(*)   Output the owner and count columns
6     ORDER BY owner DESC      Sort by owner descending
7     LIMIT 10                 Return at most 10 rows
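The full pipeline can be traced end-to-end with an in-memory SQLite database. The `cats` rows below are made-up sample data chosen so each clause visibly does something: one owner is filtered by WHERE, one group is dropped by HAVING.

```python
import sqlite3

# In-memory database with made-up sample data for the cats example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cats (name TEXT, owner INTEGER)")
conn.executemany(
    "INSERT INTO cats (name, owner) VALUES (?, ?)",
    [("Mia", 1), ("Tom", 1), ("Rex", 2), ("Ada", 2), ("Sam", 2), ("Kit", 3)],
)

rows = conn.execute("""
    SELECT owner, COUNT(*) AS cat_count
    FROM cats
    WHERE owner != 3        -- Kit's row is removed before grouping
    GROUP BY owner          -- remaining rows collapse into groups 1 and 2
    HAVING COUNT(*) = 2     -- owner 2 has 3 cats, so that group is dropped
    ORDER BY owner DESC
    LIMIT 10
""").fetchall()

print(rows)  # [(1, 2)] -- only owner 1 has exactly 2 cats
```

Reading the result against the step table above: owner 3 never reaches GROUP BY, and owner 2's group fails the HAVING check, leaving a single output row.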

3. Aggregation Functions

Aggregation functions collapse multiple rows into a single value per group.

Function    Purpose                  NULL behavior
----------  -----------------------  ----------------------
COUNT(*)    Count all rows in group  Counts rows with NULLs
COUNT(col)  Count non-NULL values    Ignores NULLs
SUM(col)    Sum values               Ignores NULLs
AVG(col)    Average values           Ignores NULLs
MIN(col)    Smallest value           Ignores NULLs
MAX(col)    Largest value            Ignores NULLs

The difference between COUNT(*) and COUNT(col) trips people up regularly. COUNT(*) counts rows. COUNT(col) counts non-NULL values in that column. If a column has NULLs, these return different numbers.
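The divergence is easy to see against a small made-up `users` table where some `middle_name` values are NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, middle_name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "Marie"), (2, None), (3, None)],
)

# COUNT(*) counts rows; COUNT(middle_name) skips the NULLs.
total, with_middle = conn.execute(
    "SELECT COUNT(*), COUNT(middle_name) FROM users"
).fetchone()

print(total, with_middle)  # 3 1
```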

4. WHERE vs HAVING

-- WHERE filters rows BEFORE grouping
SELECT status, COUNT(*)
FROM requests
WHERE method = 'POST'
GROUP BY status;

-- HAVING filters groups AFTER aggregation
SELECT status, COUNT(*) AS cnt
FROM requests
GROUP BY status
HAVING COUNT(*) > 100;

Rule of thumb: if the condition involves an aggregate function, use HAVING. If it filters individual row values, use WHERE.
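Both queries above can be run against a small made-up `requests` table to show that WHERE shrinks the input to the grouping while HAVING prunes the finished groups:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (method TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [("POST", 200), ("POST", 200), ("GET", 500),
     ("GET", 404), ("GET", 404), ("GET", 404)],
)

# WHERE: only POST rows survive into the grouping step.
posts = conn.execute(
    "SELECT status, COUNT(*) FROM requests WHERE method = 'POST' GROUP BY status"
).fetchall()

# HAVING: every row is grouped first, then small groups are discarded.
busy = conn.execute(
    "SELECT status, COUNT(*) AS cnt FROM requests GROUP BY status HAVING COUNT(*) >= 3"
).fetchall()

print(posts)  # [(200, 2)]
print(busy)   # [(404, 3)]
```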

5. NULL Gotchas

NULL is not a value — it represents the absence of a value. This creates unintuitive behavior:

-- These all evaluate to UNKNOWN (i.e. NULL), never TRUE:
NULL = NULL
NULL != NULL
NULL > 5
NULL = ''

-- Correct NULL checks:
x IS NULL
x IS NOT NULL

-- COALESCE provides a fallback:
SELECT COALESCE(middle_name, '') AS middle_name FROM users;

-- NULL in aggregations:
-- COUNT(*) counts the row. COUNT(col) skips it.
-- SUM of an all-NULL group returns NULL, not 0.
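All of these gotchas are directly observable in SQLite, which surfaces SQL NULL (and the UNKNOWN truth value) as Python `None`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparisons with NULL yield UNKNOWN, which comes back as None.
print(conn.execute("SELECT NULL = NULL").fetchone()[0])  # None
print(conn.execute("SELECT NULL > 5").fetchone()[0])     # None

conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(None,), (None,)])

# IS NULL is the correct check; SUM over all-NULL values is NULL, not 0.
nulls = conn.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]
total = conn.execute("SELECT SUM(x) FROM t").fetchone()[0]
fallback = conn.execute("SELECT COALESCE(SUM(x), 0) FROM t").fetchone()[0]

print(nulls, total, fallback)  # 2 None 0
```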

6. JOIN Basics

INNER JOIN  — only rows that match in both tables
LEFT JOIN   — all rows from left table, NULLs where right has no match
RIGHT JOIN  — all rows from right table, NULLs where left has no match
FULL JOIN   — all rows from both, NULLs where no match on either side
CROSS JOIN  — every combination (cartesian product)

-- Find requests with their server name
SELECT r.path, r.status, s.hostname
FROM requests r
INNER JOIN servers s ON r.server_id = s.id
WHERE r.status >= 500;

-- Find servers with no recent requests (LEFT JOIN + NULL check)
SELECT s.hostname
FROM servers s
LEFT JOIN requests r ON s.id = r.server_id
    AND r.created_at > NOW() - INTERVAL '1 hour'
WHERE r.id IS NULL;
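A minimal sketch of both join patterns, using made-up `servers` and `requests` tables (the time filter from the example above is dropped so the anti-join condition is simply "no requests at all"):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE servers (id INTEGER PRIMARY KEY, hostname TEXT)")
conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, server_id INTEGER, status INTEGER)")
conn.executemany("INSERT INTO servers VALUES (?, ?)",
                 [(1, "web-1"), (2, "web-2"), (3, "web-3")])
conn.executemany("INSERT INTO requests VALUES (?, ?, ?)",
                 [(1, 1, 200), (2, 1, 503), (3, 2, 200)])

# INNER JOIN: only request rows whose server_id matches a server row.
errors = conn.execute("""
    SELECT s.hostname, r.status
    FROM requests r
    INNER JOIN servers s ON r.server_id = s.id
    WHERE r.status >= 500
""").fetchall()

# LEFT JOIN + IS NULL (anti-join): servers with no matching request.
idle = conn.execute("""
    SELECT s.hostname
    FROM servers s
    LEFT JOIN requests r ON s.id = r.server_id
    WHERE r.id IS NULL
""").fetchall()

print(errors)  # [('web-1', 503)]
print(idle)    # [('web-3',)]
```

The anti-join works because a LEFT JOIN fills the right side with NULLs when there is no match, and `r.id IS NULL` keeps exactly those rows.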

7. Practical Patterns for Ops

-- Top 10 error paths in the last hour
SELECT path, COUNT(*) AS errors
FROM access_log
WHERE status >= 500
    AND timestamp > NOW() - INTERVAL '1 hour'
GROUP BY path
ORDER BY errors DESC
LIMIT 10;

-- Request rate per minute
SELECT
    DATE_TRUNC('minute', timestamp) AS minute,
    COUNT(*) AS requests
FROM access_log
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY DATE_TRUNC('minute', timestamp)
ORDER BY minute;
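The per-minute bucketing pattern ports to SQLite with strftime in place of PostgreSQL's DATE_TRUNC (a substitution, not the same function). The timestamps below are made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (timestamp TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?)",
    [
        ("2024-01-01 10:00:05", "/a"),
        ("2024-01-01 10:00:42", "/b"),
        ("2024-01-01 10:01:10", "/a"),
    ],
)

# strftime truncates each timestamp to its minute, standing in for DATE_TRUNC.
rows = conn.execute("""
    SELECT strftime('%Y-%m-%d %H:%M', timestamp) AS minute,
           COUNT(*) AS requests
    FROM access_log
    GROUP BY minute
    ORDER BY minute
""").fetchall()

print(rows)  # [('2024-01-01 10:00', 2), ('2024-01-01 10:01', 1)]
```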

-- Percentile response times (PostgreSQL)
SELECT
    path,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY duration_ms) AS p50,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) AS p95,
    PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99
FROM access_log
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY path;

What Experienced People Know

  • Write queries in logical evaluation order mentally, then translate to SQL syntax order. This prevents most "column not found" errors.
  • Always inspect raw data (SELECT * ... LIMIT 10) before writing aggregations. You need to understand what a row represents.
  • COUNT(DISTINCT col) is useful but expensive on large tables. Know when approximate counts (HyperLogLog) are acceptable.
  • Implicit type coercion in WHERE clauses can silently prevent index usage. If the column is integer, do not compare it to a string.
  • EXPLAIN ANALYZE (or your database's equivalent) is the single most valuable debugging tool for slow queries. Read the output bottom-up.
  • In most databases, SELECT * in production queries is wasteful — it fetches columns you do not need and prevents covering index scans.
  • The difference between NOT IN (subquery) and NOT EXISTS matters when the subquery can return NULLs. Prefer NOT EXISTS.
  • Temporary tables and CTEs (WITH clauses) make complex queries readable. Do not try to write everything as a single statement.
  • For time-series queries, always filter on the timestamp column first. This is usually where your indexes live and where performance wins are.
  • Window functions (ROW_NUMBER, LAG, LEAD) are extremely powerful for ops queries but often overlooked by people who learned SQL casually.
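As one illustration of the window-function point above, LAG computes a per-row delta against the previous row without any self-join. A sketch over a made-up `samples` table of request counts (window functions require SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (minute INTEGER, requests INTEGER)")
conn.executemany("INSERT INTO samples VALUES (?, ?)",
                 [(1, 100), (2, 130), (3, 90)])

# LAG pulls the previous row's value into the current row; the first
# row has no predecessor, so its delta is NULL.
rows = conn.execute("""
    SELECT minute,
           requests,
           requests - LAG(requests) OVER (ORDER BY minute) AS delta
    FROM samples
    ORDER BY minute
""").fetchall()

print(rows)  # [(1, 100, None), (2, 130, 30), (3, 90, -40)]
```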
