Link: SQL

Often combine with SQL aggregate functions.

SELECT category, SUM(spend) FROM product_spend
GROUP BY category

Use numbers for column names

SELECT
    ticker,
    EXTRACT(YEAR FROM date) AS year,
    ROUND(AVG(open),2) AS avg_open
FROM stock_prices
GROUP BY 1, 2

HAVING

HAVING condition1: filter data based on values from SQL aggregate functions

SELECT policy_holder_id, COUNT(*) FROM callers
GROUP BY policy_holder_id
HAVING COUNT(*) >=3;

HAVING vs. WHERE

  • Because we can’t use aggregate functions with WHERE clause
  • WHERE filter values on individual rows
  • HAVING filter values on aggregated groups
  • However, we can use WHERE as long as it’s not been aggregated
    SELECT candidate_id FROM candidates
    WHERE skill IN ('python', 'tableau', 'postgresql')
    GROUP BY candidate_id
    HAVING count(skill) = 3
    ORDER BY candidate_id

Problems with HAVING

I noticed I’m getting error if I refer the aggregated columns as customized names in HAVING. Thus I have to type the aggregate functions again when using them.

-- Get error "column min_open does not exist"
SELECT ticker, MIN(open) as min_open FROM stock_prices
GROUP BY ticker
HAVING min_open > 100
 
-- Below runs without error:
SELECT ticker, MIN(open) as min_open FROM stock_prices
GROUP BY ticker
HAVING MIN(open) > 100

Multiple HAVING clauses

Multiple HAVING work with AND, OR

SELECT ticker, AVG(open), MIN(open) FROM stock_prices
GROUP BY ticker
HAVING AVG(open) > 200 AND MIN(open) > 100;

SQL execution order

The sequence matters. The order for below commands are:

  1. SELECT
  2. FROM
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. ORDER BY