SQL Advanced Queries Guide
This guide delves into advanced SQL querying techniques, moving beyond basic SELECT statements to tackle complex data retrieval and manipulation scenarios. Mastering these concepts is crucial for efficient database management and data analysis.
1. Introduction to Advanced SQL Queries
While basic SELECT, WHERE, and ORDER BY clauses are fundamental, real-world data analysis and application development often require more sophisticated SQL. Advanced queries enable:
- Aggregated Reporting: Summarizing data using
GROUP BYand aggregate functions. - Data Integration: Combining data from multiple tables using various
JOINtypes. - Complex Filtering: Using subqueries and
HAVINGclauses for nuanced conditions. - Analytical Operations: Performing calculations across sets of table rows using window functions.
- Readability and Modularity: Breaking down complex queries with Common Table Expressions (CTEs).
2. Aggregation with GROUP BY and HAVING
GROUP BY is used with aggregate functions to group rows that have the same values into summary rows. HAVING is then used to filter these grouped results.
Aggregate Functions
| Function | Description | Example |
|---|---|---|
COUNT() |
Number of rows / non-NULL values | COUNT(*), COUNT(column) |
SUM() |
Total sum of a numeric column | SUM(amount) |
AVG() |
Average value of a numeric column | AVG(price) |
MIN() |
Minimum value in a column | MIN(salary) |
MAX() |
Maximum value in a column | MAX(age) |
GROUP BY Clause
-- Count the number of books in each category
SELECT category, COUNT(book_id) AS total_books
FROM Books
GROUP BY category;
-- Calculate the average salary for each department
SELECT department, AVG(salary) AS average_salary
FROM Employees
GROUP BY department;
HAVING Clause
HAVING is used to filter the results of a GROUP BY clause, similar to how WHERE filters individual rows.
-- Find categories with more than 5 books
SELECT category, COUNT(book_id) AS total_books
FROM Books
GROUP BY category
HAVING COUNT(book_id) > 5;
-- Find departments where the average salary is above $60,000
SELECT department, AVG(salary) AS average_salary
FROM Employees
GROUP BY department
HAVING AVG(salary) > 60000;
3. Joins
Joins are used to combine rows from two or more tables based on a related column between them.
INNER JOIN
Returns only the rows that have matching values in both tables.
-- Get book titles and their authors
SELECT B.title, A.author_name
FROM Books AS B
INNER JOIN Authors AS A
ON B.author_id = A.author_id;
LEFT JOIN (or LEFT OUTER JOIN)
Returns all rows from the left table, and the matching rows from the right table. If there is no match, the right side will have NULL values.
-- Get all authors and their books (even if they have no books)
SELECT A.author_name, B.title
FROM Authors AS A
LEFT JOIN Books AS B
ON A.author_id = B.author_id;
RIGHT JOIN (or RIGHT OUTER JOIN)
Returns all rows from the right table, and the matching rows from the left table. If there is no match, the left side will have NULL values.
-- Get all books and their authors (even if some books have no author assigned,
-- less common use case in practice as author_id is usually NOT NULL)
SELECT A.author_name, B.title
FROM Authors AS A
RIGHT JOIN Books AS B
ON A.author_id = B.author_id;
FULL JOIN (or FULL OUTER JOIN)
Returns rows when there is a match in one of the tables. If no match, it returns NULL for the columns of the table without a match.
-- Get all authors and all books, matching where possible (NULLs otherwise)
SELECT A.author_name, B.title
FROM Authors AS A
FULL JOIN Books AS B
ON A.author_id = B.author_id;
CROSS JOIN
Returns the Cartesian product of the rows of the joined tables (each row from the first table is combined with each row from the second table).
-- Combine every author with every book
SELECT A.author_name, B.title
FROM Authors AS A
CROSS JOIN Books AS B;
4. Subqueries
A subquery (or inner query) is a query nested inside another SQL query. It can be used in the SELECT, FROM, WHERE, or HAVING clauses.
IN / NOT IN Subqueries
-- Find employees who work in the 'Sales' department
SELECT employee_name
FROM Employees
WHERE department_id IN (SELECT department_id FROM Departments WHERE department_name = 'Sales');
-- Find products that have never been ordered
SELECT product_name
FROM Products
WHERE product_id NOT IN (SELECT product_id FROM Order_Items);
Scalar Subqueries (Returns a single value)
-- Find employees whose salary is greater than the average salary
SELECT employee_name, salary
FROM Employees
WHERE salary > (SELECT AVG(salary) FROM Employees);
EXISTS / NOT EXISTS Subqueries
Used to test for the existence of rows in a subquery. More efficient than IN for large subquery results.
-- Find departments that have at least one employee
SELECT department_name
FROM Departments
WHERE EXISTS (SELECT 1 FROM Employees WHERE Employees.department_id = Departments.department_id);
Subqueries in FROM Clause (Derived Tables)
-- Find the average quantity ordered per product
SELECT product_name, avg_qty_ordered
FROM Products
JOIN (
SELECT product_id, AVG(quantity) AS avg_qty_ordered
FROM Order_Items
GROUP BY product_id
) AS AvgOrders
ON Products.product_id = AvgOrders.product_id;
5. Window Functions
Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, window functions do not group rows; instead, they return a value for each row.
Basic Syntax
FUNCTION_NAME(column) OVER ([PARTITION BY column] [ORDER BY column [ASC|DESC]])
Ranking Functions
| Function | Description |
|---|---|
ROW_NUMBER() |
Assigns a unique, sequential integer to each row within its partition. |
RANK() |
Assigns a rank within its partition, with gaps for ties. |
DENSE_RANK() |
Assigns a rank within its partition, without gaps for ties. |
NTILE(n) |
Divides rows into n groups. |
-- Rank employees by salary within each department
SELECT
employee_name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM Employees;
-- Get the top 3 employees by salary in the entire company
SELECT * FROM (
SELECT
employee_name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) as rn
FROM Employees
) AS RankedEmployees
WHERE rn <= 3;
Value Functions
| Function | Description |
|---|---|
LAG(col, offset, default) |
Access data from a previous row. |
LEAD(col, offset, default) |
Access data from a subsequent row. |
-- Calculate the difference in sales from the previous month
SELECT
sale_month,
monthly_sales,
LAG(monthly_sales, 1, 0) OVER (ORDER BY sale_month) AS previous_month_sales,
monthly_sales - LAG(monthly_sales, 1, 0) OVER (ORDER BY sale_month) AS sales_difference
FROM Monthly_Sales;
6. Common Table Expressions (CTEs)
A Common Table Expression (CTE) is a temporary, named result set that you can reference within a single SQL statement (SELECT, INSERT, UPDATE, or DELETE). It improves readability and reusability of complex queries.
Basic Syntax
WITH CTE_Name (column1, column2, ...) AS (
SELECT column1, column2, ...
FROM another_table
WHERE condition
)
SELECT *
FROM CTE_Name
WHERE another_condition;
CTE Example
-- Find the top 3 highest-paid employees in each department using CTE
WITH DepartmentAvgSalary AS (
SELECT
department_id,
AVG(salary) AS avg_dept_salary
FROM Employees
GROUP BY department_id
),
RankedEmployees AS (
SELECT
employee_name,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) as rn
FROM Employees
)
SELECT
re.employee_name,
re.salary,
das.avg_dept_salary
FROM RankedEmployees AS re
JOIN DepartmentAvgSalary AS das
ON re.department_id = das.department_id
WHERE re.rn <= 3;
7. Set Operators
Set operators combine the results of two or more SELECT statements into a single result set.
UNION / UNION ALL
Combines the result sets of two or more SELECT statements. UNION removes duplicates, while UNION ALL retains them.
-- Get all names from two different tables
SELECT first_name FROM Customers
UNION
SELECT employee_name FROM Employees;
-- Get all first names, including duplicates
SELECT first_name FROM Customers
UNION ALL
SELECT employee_name FROM Employees;
INTERSECT
Returns only the rows that are common to both SELECT statements. (Not supported in MySQL directly, often simulated with INNER JOIN).
-- Find customers who are also employees
SELECT email FROM Customers
INTERSECT
SELECT email FROM Employees;
EXCEPT (or MINUS in Oracle)
Returns rows from the first SELECT statement that are not present in the second SELECT statement. (Not supported in MySQL directly, often simulated with LEFT JOIN and WHERE IS NULL).
-- Find employees who are not also customers
SELECT email FROM Employees
EXCEPT
SELECT email FROM Customers;
8. Practical Assignments (from new data)
Apply your knowledge of advanced SQL queries to solve the following problems.
Assignment 1: Student Grades Analysis
Assume the following tables:
Students:(student_id PK, name, major)Courses:(course_id PK, course_name, department)Enrollments:(student_id FK, course_id FK, grade)
Queries:
- List all students who have enrolled in ‘Computer Science’ courses.
- Hint: Use a subquery or JOIN with
Courses.
-- Solution: SELECT DISTINCT S.name FROM Students S JOIN Enrollments E ON S.student_id = E.student_id JOIN Courses C ON E.course_id = C.course_id WHERE C.department = 'Computer Science'; - Hint: Use a subquery or JOIN with
- Find the average grade for each course.
- Hint: Use
GROUP BYandAVG().
-- Solution: SELECT C.course_name, AVG(E.grade) AS average_grade FROM Courses C JOIN Enrollments E ON C.course_id = E.course_id GROUP BY C.course_name; - Hint: Use
- Identify students who have a grade of ‘A’ in at least one course.
- Hint: Use
EXISTSorIN.
-- Solution: SELECT S.name FROM Students S WHERE EXISTS ( SELECT 1 FROM Enrollments E WHERE E.student_id = S.student_id AND E.grade = 'A' ); - Hint: Use
- List students who have taken all courses offered by the ‘Math’ department.
- Hint: Count distinct math courses and compare.
-- Solution: WITH MathCoursesCount AS ( SELECT COUNT(course_id) AS total_math_courses FROM Courses WHERE department = 'Math' ) SELECT S.name FROM Students S JOIN Enrollments E ON S.student_id = E.student_id JOIN Courses C ON E.course_id = C.course_id WHERE C.department = 'Math' GROUP BY S.name HAVING COUNT(DISTINCT C.course_id) = (SELECT total_math_courses FROM MathCoursesCount);
Assignment 2: Employee Department Analysis
Assume the following tables:
Employees:(employee_id PK, name, salary, department_id FK)Departments:(department_id PK, department_name, location)
Queries:
- Find the names of all employees who earn more than the average salary of their respective departments.
- Hint: Use a correlated subquery or a window function.
-- Solution using Correlated Subquery: SELECT E1.name, E1.salary, D.department_name FROM Employees E1 JOIN Departments D ON E1.department_id = D.department_id WHERE E1.salary > ( SELECT AVG(E2.salary) FROM Employees E2 WHERE E2.department_id = E1.department_id ); -- Solution using Window Function: SELECT name, salary, department_name FROM ( SELECT E.name, E.salary, D.department_name, AVG(E.salary) OVER (PARTITION BY D.department_name) AS avg_dept_salary FROM Employees E JOIN Departments D ON E.department_id = D.department_id ) AS Subquery WHERE salary > avg_dept_salary; - List the top 2 highest-paid employees in each department.
- Hint: Use a ranking window function.
-- Solution: WITH RankedEmployees AS ( SELECT E.name, D.department_name, E.salary, ROW_NUMBER() OVER (PARTITION BY D.department_name ORDER BY E.salary DESC) as rn FROM Employees E JOIN Departments D ON E.department_id = D.department_id ) SELECT name, department_name, salary FROM RankedEmployees WHERE rn <= 2; - Calculate the running total of salaries for each department, ordered by employee ID.
- Hint: Use a window function with
SUM()andORDER BY.
-- Solution: SELECT E.name, D.department_name, E.salary, SUM(E.salary) OVER (PARTITION BY D.department_name ORDER BY E.employee_id) AS running_total_salary FROM Employees E JOIN Departments D ON E.department_id = D.department_id; - Hint: Use a window function with
9. Practice Scenario: Social Network Analysis
Schema:
Users:(id, username, created_at)Follows:(follower_id, followee_id, created_at)Posts:(id, user_id, content, created_at)Comments:(id, post_id, user_id, content)
Task 1: Follower Count Leaderboard Find the top 5 users with the most followers, including their username and account age.
SELECT
U.username,
U.created_at,
COUNT(F.follower_id) as follower_count
FROM Users U
LEFT JOIN Follows F ON U.id = F.followee_id
GROUP BY U.id
ORDER BY follower_count DESC
LIMIT 5;
Task 2: High Engagement Posts Find posts that have more than 10 comments, showing the post content, author name, and comment count.
SELECT
P.content,
U.username AS author,
COUNT(C.id) AS comment_count
FROM Posts P
JOIN Users U ON P.user_id = U.id
JOIN Comments C ON P.id = C.post_id
GROUP BY P.id
HAVING COUNT(C.id) > 10
ORDER BY comment_count DESC;
Task 3: Mutual Follows (Self-Join) Find pairs of users who follow each other.
SELECT
U1.username AS User_A,
U2.username AS User_B
FROM Follows F1
JOIN Follows F2
ON F1.follower_id = F2.followee_id
AND F1.followee_id = F2.follower_id
JOIN Users U1 ON F1.follower_id = U1.id
JOIN Users U2 ON F1.followee_id = U2.id
WHERE F1.follower_id < F1.followee_id; -- Avoid duplicates (A-B and B-A)
See Also
- SQL and SQLAlchemy Cheat Sheet - Basic SQL syntax and ORM concepts.
- SQL DDL Guide - For understanding how database structures are created and modified
- Big O Notation Cheat Sheet - Understanding the performance implications of complex queries.