๐ง SQL Query to Find Duplicate Records with Different Values (Real-World Example)
Data duplication is one of the most common challenges in database management. Often, youโll encounter duplicate records with mismatched details โ such as the same employee ID appearing with different email addresses or contact numbers. These inconsistencies can cause serious reporting errors in analytics tools like Power BI or Excel dashboards.
In this post, youโll learn how to identify and clean duplicate records using simple yet powerful SQL queries that work across MySQL, SQL Server, and PostgreSQL. Youโll also see how to safely remove duplicates while preserving the most accurate and latest data.
By the end, youโll have a ready-to-use SQL pattern you can apply to your HR, sales, or finance datasets โ ensuring your reports stay consistent and reliable.
๐งฉ Sample Data: Employees Table
Below is an example of an employees table that contains duplicate records with mismatched details.
| employee_id | employee_name | department | |
|---|---|---|---|
| 101 | John Smith | [email protected] | HR |
| 101 | John Smith | [email protected] | HR |
| 203 | Alice Johnson | [email protected] | Finance |
| 203 | Alice Johnson | [email protected] | Finance |
| 301 | Mark Adams | [email protected] | IT |
As you can see, employee IDs 101 and 203 appear multiple times with different email addresses โ a clear indicator of duplicate records with inconsistent data.
๐ How It Works
GROUP BY employee_idโ Groups all records belonging to each employee.COUNT(DISTINCT email)โ Counts unique email IDs for that employee.HAVING COUNT(DISTINCT email) > 1โ Filters employees with more than one unique email, i.e., duplicates with inconsistent data.
๐ Output
| employee_id | unique_emails |
|---|---|
| 101 | 2 |
| 203 | 2 |
โ๏ธ Bonus: Remove Duplicates Safely (Keep Latest Record)
You can remove duplicates while keeping the most recent record by using CTE and ROW_NUMBER() (available in SQL Server, PostgreSQL, MySQL 8.0+).
WITH ranked_data AS (
SELECT employee_id, email,
ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY email DESC) AS rn
FROM employees
)
DELETE FROM ranked_data WHERE rn > 1;
๐งฉ This ensures that only one (latest) record per employee ID is retained.
๐ง Real-Life Use Case
- HR teams use this to detect duplicate employee records across systems.
- Data analysts use it to improve Power BI reports by cleaning inconsistent tables.
- Database admins use it before migration or integration tasks.
This type of query is extremely useful for ETL processes, BI reporting, and analytics pipelines.
๐ Recommended Books to Master SQL & Analytics
(Amazon Affiliate Links โ handpicked for you)
- ๐ Learning SQL: Generate, Manipulate, and Retrieve Data, Third Edition
- ๐ค AI Engineering: Building Applications with Foundation Models
- ๐ Storytelling With Data: A Data Visualization Guide for Business Professionals
- ๐ก Microsoft Power BI for Dummies
- ๐งฉ The Definitive Guide to DAX (2nd Edition)
๐ก These books are top-rated for SQL, AI integration, and Power BI learning โ perfect if you want to grow from analyst to data professional.
๐ Related SQL Tutorials
- SQL Query to Find Employees Who Didnโt Attend Any Training โ
- SQL Query to Calculate Running Total Without Window Functions โ
- Find Top 3 Salaries Without LIMIT or TOP Keyword โ
๐ฌ Discuss with the Community
Have a similar query or a unique data problem?
๐ Join the discussion on our SQL Community Forum
Share your query structure, and weโll help you build the best solution!