Link: SQL
Often combine with SQL aggregate functions.
SELECT category, SUM(spend) FROM product_spend
GROUP BY category
Use numbers for column names
SELECT
ticker,
EXTRACT(YEAR FROM date) AS year,
ROUND(AVG(open),2) AS avg_open
FROM stock_prices
GROUP BY 1, 2
HAVING
HAVING condition1
: filter data based on values from SQL aggregate functions
SELECT policy_holder_id, COUNT(*) FROM callers
GROUP BY policy_holder_id
HAVING COUNT(*) >=3;
HAVING vs. WHERE
- Because we can’t use aggregate functions with
WHERE
clause - WHERE filter values on individual rows
- HAVING filter values on aggregated groups
- However, we can use
WHERE
as long as it’s not been aggregatedSELECT candidate_id FROM candidates WHERE skill IN ('python', 'tableau', 'postgresql') GROUP BY candidate_id HAVING count(skill) = 3 ORDER BY candidate_id
Problems with HAVING
I noticed I’m getting error if I refer the aggregated columns as customized names in HAVING
. Thus I have to type the aggregate functions again when using them.
-- Get error "column min_open does not exist"
SELECT ticker, MIN(open) as min_open FROM stock_prices
GROUP BY ticker
HAVING min_open > 100
-- Below runs without error:
SELECT ticker, MIN(open) as min_open FROM stock_prices
GROUP BY ticker
HAVING MIN(open) > 100
Multiple HAVING clauses
Multiple HAVING work with AND, OR
SELECT ticker, AVG(open), MIN(open) FROM stock_prices
GROUP BY ticker
HAVING AVG(open) > 200 AND MIN(open) > 100;
SQL execution order
The sequence matters. The order for below commands are:
- SELECT
- FROM
- WHERE
- GROUP BY
- HAVING
- ORDER BY