SQL Query to Find Duplicate Records with Different Values

🧠 SQL Query to Find Duplicate Records with Different Values (Real-World Example)

Data duplication is one of the most common challenges in database management. Often, you’ll encounter duplicate records with mismatched details — such as the same employee ID appearing with different email addresses or contact numbers. These inconsistencies can cause serious reporting errors in analytics tools like Power BI or Excel dashboards.

In this post, you’ll learn how to identify duplicate records with a GROUP BY query that runs unchanged on MySQL, SQL Server, and PostgreSQL. You’ll also see how to remove duplicates while keeping one chosen record per key; the exact DELETE syntax for that step differs between engines.

By the end, you’ll have a ready-to-use SQL pattern you can apply to your HR, sales, or finance datasets — ensuring your reports stay consistent and reliable.

🧩 Sample Data: Employees Table

Below is an example of an employees table that contains duplicate records with mismatched details.

employee_id	employee_name	email	department
101	John Smith	[email protected]	HR
101	John Smith	[email protected]	HR
203	Alice Johnson	[email protected]	Finance
203	Alice Johnson	[email protected]	Finance
301	Mark Adams	[email protected]	IT

As you can see, employee IDs 101 and 203 appear multiple times with different email addresses — a clear indicator of duplicate records with inconsistent data.
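To reproduce the example locally, a setup script along these lines should work on MySQL, SQL Server, and PostgreSQL. The email addresses in the table above are masked, so the values below are placeholders on example.com, not the original data. The column types are also assumptions.

```sql
-- Hypothetical setup: column types and email values are illustrative placeholders.
-- No primary key on employee_id, because the point is that it repeats.
CREATE TABLE employees (
  employee_id   INT          NOT NULL,
  employee_name VARCHAR(100) NOT NULL,
  email         VARCHAR(255) NOT NULL,
  department    VARCHAR(50)  NOT NULL
);

INSERT INTO employees (employee_id, employee_name, email, department) VALUES
  (101, 'John Smith',    'john.smith@example.com',    'HR'),
  (101, 'John Smith',    'j.smith@example.com',       'HR'),
  (203, 'Alice Johnson', 'alice.johnson@example.com', 'Finance'),
  (203, 'Alice Johnson', 'a.johnson@example.com',     'Finance'),
  (301, 'Mark Adams',    'mark.adams@example.com',    'IT');
```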

🔍 How It Works

GROUP BY employee_id – Groups all records belonging to each employee.
COUNT(DISTINCT email) – Counts the distinct, non-NULL email addresses for that employee.
HAVING COUNT(DISTINCT email) > 1 – Filters employees with more than one unique email, i.e., duplicates with inconsistent data.
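Put together, the three clauses above form a query like the following. This is a reconstruction from the description rather than the author's exact text; the alias unique_emails is taken from the output table in this post.

```sql
-- Find employee IDs that appear with more than one distinct email address.
SELECT employee_id,
       COUNT(DISTINCT email) AS unique_emails
FROM employees
GROUP BY employee_id
HAVING COUNT(DISTINCT email) > 1;
```

The query uses only standard SQL, so it should run as written on MySQL, SQL Server, and PostgreSQL. To check a different attribute, such as a phone number, swap the column inside COUNT(DISTINCT ...).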

📊 Output

employee_id	unique_emails
101	2
203	2
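The summary above tells you which IDs conflict, but not what the conflicting values are. A possible follow-up, sketched here under the same table and column names, feeds the flagged IDs back into the table to list the full rows for review.

```sql
-- List every row belonging to an employee_id that has conflicting emails.
SELECT e.employee_id, e.employee_name, e.email, e.department
FROM employees AS e
WHERE e.employee_id IN (
  SELECT employee_id
  FROM employees
  GROUP BY employee_id
  HAVING COUNT(DISTINCT email) > 1
)
ORDER BY e.employee_id, e.email;
```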

⚙️ Bonus: Remove Duplicates Safely (Keep Latest Record)

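You can remove duplicates while keeping one record per employee by using a CTE and ROW_NUMBER() (both available in SQL Server, PostgreSQL, and MySQL 8.0+). Deleting directly through the CTE, as shown here, is SQL Server syntax; PostgreSQL and MySQL do not allow a DELETE against a CTE, so on those engines the ranked rows have to be matched back to the employees table.

-- Rank the rows within each employee_id; rn = 1 is the row to keep.
WITH ranked_data AS (
  SELECT employee_id, email,
         -- ORDER BY email DESC keeps the alphabetically last email.
         -- To keep the latest record, order by a timestamp column instead, if the table has one.
         ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY email DESC) AS rn
  FROM employees
)
-- SQL Server only: deleting from the CTE removes the underlying rows.
DELETE FROM ranked_data WHERE rn > 1;

🧩 This leaves exactly one record per employee ID. It is the latest record only if the ORDER BY uses a column that reflects recency, such as a last-updated timestamp; the sample table has no such column.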
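Because a DELETE is hard to undo, it can help to preview the affected rows first. The sketch below shows one way to do that, followed by a PostgreSQL variant of the delete that relies on the ctid system column, since the sample table has no primary key. The ctid approach is PostgreSQL-specific and is offered as an assumption about your setup, not as a portable pattern.

```sql
-- Step 1 (SQL Server, PostgreSQL, MySQL 8.0+): list the rows that would be
-- removed, without deleting anything.
SELECT employee_id, email
FROM (
  SELECT employee_id, email,
         ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY email DESC) AS rn
  FROM employees
) AS ranked
WHERE rn > 1;

-- Step 2 (PostgreSQL only): delete those rows, identified by ctid,
-- inside a transaction so the change can still be rolled back.
BEGIN;

DELETE FROM employees
WHERE ctid IN (
  SELECT ctid
  FROM (
    SELECT ctid,
           ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY email DESC) AS rn
    FROM employees
  ) AS ranked
  WHERE rn > 1
);

-- Check the table, then keep the change; use ROLLBACK instead to undo it.
COMMIT;
```

If the table has a real primary key, matching on that key is a safer choice than ctid, because ctid values can change when rows are updated or the table is rewritten.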

🧠 Real-Life Use Case

HR teams use this to detect duplicate employee records across systems.
Data analysts use it to improve Power BI reports by cleaning inconsistent tables.
Database admins use it before migration or integration tasks.

This type of query is extremely useful for ETL processes, BI reporting, and analytics pipelines.
