Link: SQL Relational database

Simple SQL JOIN

  • Syntax format: below clause has to be in this order and cannot be interrupted
SELECT *
FROM artists
JOIN songs
  ON artists.artist_id = songs.artist_id;
  • Better always indicating which table it comes from, like this:
    SELECT
      users.city,
      COUNT(trades.order_id) AS total_orders
    FROM trades
    INNER JOIN users
      ON trades.user_id = users.user_id
    WHERE trades.status = 'Completed'
    GROUP BY users.city;
  • Similarily, always indicate the type of JOINs

Types of SQL JOIN

  • INNER JOIN
  • LEFT JOIN
  • RIGHT JOIN (rarely used)
  • FULL OUTER JOIN

Conditional SQL JOIN

  • Use AND, OR or other complex logical expressions for conditional joins
  • The condition is part of the join
SELECT
  g.book_title,
  o.quantity
FROM goodreads AS g
INNER JOIN orders AS o
  ON g.book_id = o.book_id
    AND o.quantity > 2;    -- Use AND to limit orders with quantity > 2

Example: find page ID without likes

My method: use left join then filter

SELECT
  p.page_id
FROM pages AS p
LEFT JOIN page_likes l
  ON p.page_id = l.page_id
WHERE liked_date IS NULL
ORDER BY p.page_id ASC;

Method #2: use EXCEPT

  • EXCEPT subtract page_id from the 2nd table from the 1st
-- Subtract the second table from the first table
SELECT page_id
FROM pages
EXCEPT
SELECT page_id
FROM page_likes;

Method #3: NOT..IN

SELECT page_id
FROM pages
WHERE page_id NOT IN (
  SELECT page_id
  FROM page_likes
  WHERE page_id IS NOT NULL
);

Method #4: NOT EXISTS clause

  • NOT EXISTS clause to check for the non-existence of matching records
SELECT page_id
FROM pages
WHERE NOT EXISTS (
  SELECT page_id
  FROM page_likes AS likes
  WHERE likes.page_id = pages.page_id
;)

JOIN three tables

  • JOINs can be added after another:
SELECT
  g.book_title,
  g.book_rating,
  o.order_date,
  d.delivery_status
FROM goodreads g
JOIN orders o     -- Join the orders table
  ON g.book_id = o.book_id
    AND g.book_rating > 4.0
INNER JOIN deliveries d  -- Join the deliveries table
  ON o.order_id = d.order_id
    AND d.delivery_status = 'Delivered';

SQL JOIN with CASE WHEN

Problem

https://datalemur.com/questions/updated-status

Solution

SELECT
  COALESCE(advertiser.user_id, p.user_id) AS user_id,     --- Use COALESCE to retrieve the id when one table is NULL
  CASE
    WHEN p.paid IS NULL THEN 'CHURN'
    WHEN p.paid IS NOT NULL AND advertiser.status = 'CHURN' THEN 'RESURRECT'
    WHEN p.paid IS NOT NULL AND advertiser.status IS NULL THEN 'NEW'
    ELSE 'EXISTING'
  END AS new_status
FROM advertiser
FULL OUTER JOIN daily_pay p
  ON advertiser.user_id = p.user_id
ORDER BY user_id

SQL JOIN vs. dplyr in R