SQL Query to Find Duplicate Records with Different Values

๐Ÿง  SQL Query to Find Duplicate Records with Different Values (Real-World Example)

Data duplication is one of the most common challenges in database management. Often, youโ€™ll encounter duplicate records with mismatched details โ€” such as the same employee ID appearing with different email addresses or contact numbers. These inconsistencies can cause serious reporting errors in analytics tools like Power BI or Excel dashboards.

In this post, youโ€™ll learn how to identify and clean duplicate records using simple yet powerful SQL queries that work across MySQL, SQL Server, and PostgreSQL. Youโ€™ll also see how to safely remove duplicates while preserving the most accurate and latest data.

By the end, youโ€™ll have a ready-to-use SQL pattern you can apply to your HR, sales, or finance datasets โ€” ensuring your reports stay consistent and reliable.

๐Ÿงฉ Sample Data: Employees Table

Below is an example of an employees table that contains duplicate records with mismatched details.

employee_idemployee_nameemaildepartment
101John Smith[email protected]HR
101John Smith[email protected]HR
203Alice Johnson[email protected]Finance
203Alice Johnson[email protected]Finance
301Mark Adams[email protected]IT

As you can see, employee IDs 101 and 203 appear multiple times with different email addresses โ€” a clear indicator of duplicate records with inconsistent data.

๐Ÿ” How It Works

  1. GROUP BY employee_id โ€“ Groups all records belonging to each employee.
  2. COUNT(DISTINCT email) โ€“ Counts unique email IDs for that employee.
  3. HAVING COUNT(DISTINCT email) > 1 โ€“ Filters employees with more than one unique email, i.e., duplicates with inconsistent data.

๐Ÿ“Š Output

employee_idunique_emails
1012
2032


โš™๏ธ Bonus: Remove Duplicates Safely (Keep Latest Record)

You can remove duplicates while keeping the most recent record by using CTE and ROW_NUMBER() (available in SQL Server, PostgreSQL, MySQL 8.0+).

WITH ranked_data AS (
  SELECT employee_id, email,
         ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY email DESC) AS rn
  FROM employees
)
DELETE FROM ranked_data WHERE rn > 1;

๐Ÿงฉ This ensures that only one (latest) record per employee ID is retained.


๐Ÿง  Real-Life Use Case

  • HR teams use this to detect duplicate employee records across systems.
  • Data analysts use it to improve Power BI reports by cleaning inconsistent tables.
  • Database admins use it before migration or integration tasks.

This type of query is extremely useful for ETL processes, BI reporting, and analytics pipelines.


๐Ÿ“˜ Recommended Books to Master SQL & Analytics

(Amazon Affiliate Links โ€” handpicked for you)

๐Ÿ’ก These books are top-rated for SQL, AI integration, and Power BI learning โ€” perfect if you want to grow from analyst to data professional.



๐Ÿ”— Related SQL Tutorials


๐Ÿ’ฌ Discuss with the Community

Have a similar query or a unique data problem?
๐Ÿ‘‰ Join the discussion on our SQL Community Forum
Share your query structure, and weโ€™ll help you build the best solution!