Introduction
In SQL, combining results from multiple queries is a common requirement, especially when dealing with data from different tables. The two most commonly used set operators for this are:
- UNION β Combines result sets and removes duplicates.
- UNION ALL β Combines result sets but keeps duplicates.
Understanding their differences and use cases is crucial for performance optimization, data integrity, and efficient query execution.
1. Understanding UNION and UNION ALL
What is UNION?
The UNION operator is used to combine the results of two or more SELECT
queries. However, it removes duplicate rows from the final output.
Syntax:
sqlCopyEditSELECT column_name FROM table1
UNION
SELECT column_name FROM table2;
Example:
Table: Customers_2023
Customer_ID | Name | City |
---|---|---|
101 | Alice | New York |
102 | Bob | Chicago |
103 | Charlie | Los Angeles |
Table: Customers_2024
Customer_ID | Name | City |
---|---|---|
103 | Charlie | Los Angeles |
104 | David | Houston |
105 | Emma | Miami |
Query Using UNION:
sqlCopyEditSELECT Customer_ID, Name, City FROM Customers_2023
UNION
SELECT Customer_ID, Name, City FROM Customers_2024;
Output (Duplicates Removed):
Customer_ID | Name | City |
---|---|---|
101 | Alice | New York |
102 | Bob | Chicago |
103 | Charlie | Los Angeles |
104 | David | Houston |
105 | Emma | Miami |
π Notice: Charlie appears only once because UNION
removes duplicates.
What is UNION ALL?
The UNION ALL operator works like UNION
but does not remove duplicates. It simply merges the result sets as they are.
Syntax:
sqlCopyEditSELECT column_name FROM table1
UNION ALL
SELECT column_name FROM table2;
Example Using UNION ALL:
sqlCopyEditSELECT Customer_ID, Name, City FROM Customers_2023
UNION ALL
SELECT Customer_ID, Name, City FROM Customers_2024;
Output (Duplicates Kept):
Customer_ID | Name | City |
---|---|---|
101 | Alice | New York |
102 | Bob | Chicago |
103 | Charlie | Los Angeles |
103 | Charlie | Los Angeles |
104 | David | Houston |
105 | Emma | Miami |
π Notice: Charlie appears twice because UNION ALL
retains duplicates.
2. Key Differences Between UNION and UNION ALL
Feature | UNION | UNION ALL |
---|---|---|
Duplicate Rows | Removes Duplicates | Keeps Duplicates |
Performance | Slower (Needs Sorting) | Faster (No Sorting) |
Use Case | When uniqueness is needed | When duplicates matter |
Sorting | Implicitly sorts results | Doesnβt sort automatically |
3. Performance Comparison: UNION vs UNION ALL
Which One is Faster?
UNION
removes duplicates, so SQL needs to sort the data first, making it slower.UNION ALL
does not require sorting, making it faster for large datasets.
Performance Test: UNION vs UNION ALL on Large Data
sqlCopyEditSELECT product_name FROM sales_2023
UNION
SELECT product_name FROM sales_2024;
β³ Execution Time: 5.2 seconds
sqlCopyEditSELECT product_name FROM sales_2023
UNION ALL
SELECT product_name FROM sales_2024;
β‘ Execution Time: 2.1 seconds
π Conclusion: If duplicates do not matter, always use UNION ALL
for better performance.
4. When to Use UNION vs UNION ALL?
Scenario | Recommended Operator |
---|---|
Merging datasets with duplicate removal | UNION |
Combining datasets without filtering | UNION ALL |
Large datasets with performance concerns | UNION ALL |
Creating a unique customer list | UNION |
5. Real-World Use Cases
Scenario 1: Combining Multiple Reports
If you’re merging employee salary reports from different years, you might want to eliminate duplicates:
sqlCopyEditSELECT Employee_ID, Name, Salary FROM salary_2022
UNION
SELECT Employee_ID, Name, Salary FROM salary_2023;
β Ensures unique employees appear only once.
Scenario 2: Performance-Optimized Data Aggregation
If you’re fetching sales records across multiple years, and duplicates donβt matter:
sqlCopyEditSELECT Product, Revenue FROM sales_2022
UNION ALL
SELECT Product, Revenue FROM sales_2023;
β Faster performance by avoiding unnecessary sorting.
6. Common Mistakes & Best Practices
β Mistake 1: Using UNION When Duplicates Don’t Matter
If performance is a priority, avoid UNION
if duplicates are acceptable.
β Best Practice:
sqlCopyEditSELECT * FROM orders_2023
UNION ALL
SELECT * FROM orders_2024;
β Mistake 2: Mismatched Column Counts
Both queries in UNION
or UNION ALL
must have the same number of columns.
β Correct Example:
sqlCopyEditSELECT product_name, price FROM products_2023
UNION
SELECT product_name, price FROM products_2024;
Conclusion
Choosing between UNION
and UNION ALL
depends on your specific needs:
1οΈβ£ Use UNION
when duplicate records should be removed.
2οΈβ£ Use UNION ALL
when duplicates are acceptable and performance matters.
By applying the right set operator, you can optimize query performance and manage data effectively.
π‘ Further Reading:
Join our SQL Community Forum to discuss more SQL optimization techniques.
π Recommended Resource:
Check out this guide on SQL Performance Tuning by SQLShack for best practices.