ETL & Data Warehousing Interview Questions for TCS, Infosys, and Cognizant (2025 Edition)

Posted by Admin (@sql-admin)

If you’re preparing for ETL or Data Warehousing interviews at TCS, Infosys, or Cognizant, this guide is your go-to resource.
ETL developers and data analysts are expected to understand data extraction, transformation logic, schema design, and performance optimization in real-world enterprise environments.

These 2025 interview questions are collected from real candidate experiences and cover both conceptual and scenario-based topics.


🔹 Top ETL & Data Warehousing Interview Questions (2025 Edition)

1️⃣ What is ETL and why is it important?

Answer:
ETL stands for Extract, Transform, Load — the process of moving and preparing data for analysis.

  • Extract: Pull data from multiple sources (databases, APIs, files).

  • Transform: Clean, validate, and structure it.

  • Load: Store in a data warehouse or reporting system.

This is a common question in TCS and Cognizant technical rounds.
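
If the interviewer asks you to walk through the three stages, a small sketch can help. The one below is only an illustration: the CSV content, the `orders` table, and the function names are made up for this post, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file, API, or database.
RAW_CSV = """order_id,customer,amount
1, alice ,120.50
2,BOB,80
3,carol,not_a_number
"""


def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalize names, cast amounts, drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # invalid amount: skipped here, a real job would log it
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT order_id, customer, amount FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping each stage in its own function makes it easy to explain where validation happens and what would change if the source or target were swapped.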


2️⃣ What are the key differences between OLTP and OLAP systems?

Feature     | OLTP             | OLAP
Purpose     | Transactional    | Analytical
Data Volume | Low              | High
Queries     | Simple, frequent | Complex, aggregate
Example     | Banking system   | Business intelligence reports
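
To make the "Queries" row concrete, here is a rough sketch of the two query styles against the same data. The `transactions` table and its values are invented for illustration, and SQLite is used only so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, account_id INTEGER, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [(1, 101, "North", 50.0), (2, 102, "South", 75.0), (3, 101, "North", 25.0), (4, 103, "South", 100.0)],
)

# OLTP-style: a simple, frequent lookup that touches one row by primary key.
oltp_row = conn.execute("SELECT amount FROM transactions WHERE txn_id = ?", (3,)).fetchone()

# OLAP-style: an aggregate that scans many rows to answer an analytical question.
olap_rows = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM transactions GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(olap_rows)
```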

3️⃣ Explain the Star Schema and Snowflake Schema.

  • Star Schema: Fact table linked to dimension tables — simple and fast for reporting.

  • Snowflake Schema: Dimensions are normalized into sub-dimensions — reduces redundancy but increases joins.

Infosys often asks which schema you used in your last project and why.
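
One way to show the trade-off is to answer the same question (revenue by category) on both designs. This is a sketch with made-up table names and data: the star version needs one join, the snowflake version needs two, and both should return the same totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: category is stored directly on the product dimension.
CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);
-- Snowflake: category is normalized into its own sub-dimension.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product_snow (product_key INTEGER PRIMARY KEY, product_name TEXT, category_key INTEGER);
CREATE TABLE fact_sales (product_key INTEGER, sales_amount REAL);

INSERT INTO dim_product_star VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_category VALUES (10, 'Electronics'), (20, 'Furniture');
INSERT INTO dim_product_snow VALUES (1, 'Laptop', 10), (2, 'Desk', 20);
INSERT INTO fact_sales VALUES (1, 900.0), (1, 1100.0), (2, 300.0);
""")

# Star schema: fact joins straight to the dimension.
star_sql = """
SELECT p.category_name, SUM(f.sales_amount)
FROM fact_sales f
JOIN dim_product_star p ON p.product_key = f.product_key
GROUP BY p.category_name ORDER BY p.category_name
"""

# Snowflake schema: one extra join to reach the normalized category table.
snowflake_sql = """
SELECT c.category_name, SUM(f.sales_amount)
FROM fact_sales f
JOIN dim_product_snow p ON p.product_key = f.product_key
JOIN dim_category c ON c.category_key = p.category_key
GROUP BY c.category_name ORDER BY c.category_name
"""

star_result = conn.execute(star_sql).fetchall()
snowflake_result = conn.execute(snowflake_sql).fetchall()
print(star_result == snowflake_result)
```

In the snowflake version a category is renamed in one row of `dim_category`, while the star version repeats the category name on every product row.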


4️⃣ What are fact tables and dimension tables?

  • Fact Table: Stores measurable data (e.g., sales, revenue).

  • Dimension Table: Stores descriptive attributes (e.g., product, region, customer).

Example:

  • Fact_Sales: sales_amount, date_key, product_key

  • Dim_Product: product_key, category, brand


5️⃣ What are slowly changing dimensions (SCD)?

SCDs are techniques for managing changes in dimension data over time.
Types:

  • Type 1: Overwrite old data.

  • Type 2: Keep full history by adding a new record with effective dates.

  • Type 3: Keep limited history using extra columns.

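Example in SQL (Type 2, step 1: expire the current row before inserting the new version):

-- GETDATE() is SQL Server syntax; use CURRENT_TIMESTAMP on other databases
UPDATE dim_customer
SET status = 'Inactive', end_date = GETDATE()
WHERE customer_id = 1001 AND end_date IS NULL;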
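
A Type 2 change is usually two steps: close the current row, then insert the new version. Here is a minimal sketch of both steps, assuming a hypothetical `dim_customer` layout with a `city` attribute and a made-up helper named `apply_scd2`. It runs on SQLite, so dates are passed in as text rather than generated with GETDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
    customer_id INTEGER,                            -- business key from the source
    city TEXT,
    start_date TEXT,
    end_date TEXT                                   -- NULL marks the current version
)""")
conn.execute(
    "INSERT INTO dim_customer (customer_id, city, start_date, end_date) "
    "VALUES (1001, 'Pune', '2024-01-01', NULL)"
)


def apply_scd2(conn, customer_id, new_city, change_date):
    """Close the current version if the tracked attribute changed, then insert a new one."""
    current = conn.execute(
        "SELECT customer_sk, city FROM dim_customer WHERE customer_id = ? AND end_date IS NULL",
        (customer_id,),
    ).fetchone()
    if current is not None and current[1] == new_city:
        return  # nothing changed, keep the current version
    if current is not None:
        conn.execute(
            "UPDATE dim_customer SET end_date = ? WHERE customer_sk = ?",
            (change_date, current[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, start_date, end_date) VALUES (?, ?, ?, NULL)",
        (customer_id, new_city, change_date),
    )
    conn.commit()


apply_scd2(conn, 1001, "Chennai", "2025-03-15")
history = conn.execute(
    "SELECT city, start_date, end_date FROM dim_customer WHERE customer_id = 1001 ORDER BY customer_sk"
).fetchall()
print(history)
```

The old row keeps its original city with a closed date range, so reports as of an earlier date still see the value that was true at the time.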

6️⃣ What tools are commonly used for ETL processes?

  • Informatica PowerCenter

  • Talend

  • SSIS (SQL Server Integration Services)

  • Apache NiFi

  • Azure Data Factory / AWS Glue

Cognizant prefers scenario-based questions on Azure Data Factory pipelines.


7️⃣ How do you handle data quality issues in ETL?

  • Implement data validation rules in the transformation stage.

  • Use lookup checks for referential integrity.

  • Apply deduplication logic using keys.

  • Maintain error tables to log invalid records.
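
The four points above can be sketched together in one small job. Everything here is illustrative: the table names (`stg_sales`, `fact_sales`, `err_sales`), the rule that amounts must be non-negative, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY);
CREATE TABLE stg_sales (sale_id INTEGER, product_key INTEGER, amount REAL);
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_key INTEGER, amount REAL);
CREATE TABLE err_sales (sale_id INTEGER, reason TEXT);

INSERT INTO dim_product VALUES (1), (2);
INSERT INTO stg_sales VALUES
    (100, 1, 50.0),
    (100, 1, 50.0),   -- duplicate of sale 100
    (101, 9, 20.0),   -- product 9 does not exist in the dimension
    (102, 2, -5.0);   -- negative amount fails validation
""")

# Lookup set used for the referential integrity check.
valid_products = {row[0] for row in conn.execute("SELECT product_key FROM dim_product")}
seen = set()  # sale_ids already loaded, used for deduplication

staged = conn.execute("SELECT sale_id, product_key, amount FROM stg_sales ORDER BY rowid").fetchall()
for sale_id, product_key, amount in staged:
    if sale_id in seen:
        reason = "duplicate sale_id"
    elif product_key not in valid_products:
        reason = "unknown product_key"
    elif amount is None or amount < 0:
        reason = "invalid amount"
    else:
        reason = None

    if reason:
        # Invalid records go to the error table instead of being dropped silently.
        conn.execute("INSERT INTO err_sales VALUES (?, ?)", (sale_id, reason))
    else:
        seen.add(sale_id)
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (sale_id, product_key, amount))
conn.commit()

loaded = conn.execute("SELECT sale_id FROM fact_sales ORDER BY sale_id").fetchall()
rejected = conn.execute("SELECT sale_id, reason FROM err_sales ORDER BY sale_id").fetchall()
print(loaded, rejected)
```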


8️⃣ What is data staging in ETL?

The staging area is a temporary storage layer used to hold raw data before transformation.
It helps:

  • Reduce load on source systems

  • Manage incremental loads

  • Audit data lineage


9️⃣ How do you perform incremental data loads?

Load only the records that are new or changed since the last successful run.
Common methods:

  • Using timestamps or last modified dates

  • Maintaining control tables to store last run times

  • Applying CDC (Change Data Capture)

Example:

-- The literal date stands in for the last run time read from a control table
SELECT * FROM orders
WHERE last_updated > '2025-01-01';
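
In a real job the cutoff date is not hard-coded; it is read from a control table and moved forward after each run. Below is a rough sketch of that pattern. The `etl_control` and `dw_orders` tables and the `incremental_load` function are hypothetical names, and SQLite's `INSERT OR REPLACE` stands in for whatever upsert or MERGE your warehouse supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, last_updated TEXT);
CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, amount REAL, last_updated TEXT);
CREATE TABLE etl_control (job_name TEXT PRIMARY KEY, last_run TEXT);

INSERT INTO orders VALUES (1, 10.0, '2024-12-30'), (2, 20.0, '2025-01-05'), (3, 30.0, '2025-01-09');
INSERT INTO etl_control VALUES ('load_orders', '2025-01-01');
""")


def incremental_load(conn, job_name):
    """Copy rows changed since the stored watermark, then advance the watermark."""
    (last_run,) = conn.execute(
        "SELECT last_run FROM etl_control WHERE job_name = ?", (job_name,)
    ).fetchone()
    changed = conn.execute(
        "SELECT order_id, amount, last_updated FROM orders WHERE last_updated > ? ORDER BY last_updated",
        (last_run,),
    ).fetchall()
    # INSERT OR REPLACE acts as an upsert keyed on order_id.
    conn.executemany("INSERT OR REPLACE INTO dw_orders VALUES (?, ?, ?)", changed)
    if changed:
        # Advance the watermark to the newest timestamp actually loaded.
        conn.execute(
            "UPDATE etl_control SET last_run = ? WHERE job_name = ?",
            (changed[-1][2], job_name),
        )
    conn.commit()
    return len(changed)


first_run = incremental_load(conn, "load_orders")
second_run = incremental_load(conn, "load_orders")
print(first_run, second_run)
```

The second call finds nothing to load because the watermark has already moved past the newest source row, which is the behavior interviewers usually want you to explain.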

🔟 Explain ETL performance optimization techniques.

  • Push transformations down to the source database where possible.

  • Use bulk load instead of row-by-row inserts.

  • Index join columns and partition large tables.

  • Parallelize ETL jobs using threads or data partitions.

Infosys interviewers love real-world performance tuning examples.
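
The bulk load point is the easiest one to demonstrate. The sketch below compares a commit-per-row loop with a single batched insert; the `target` table and the row count are arbitrary, and timings will vary by machine, so treat the numbers as indicative only.

```python
import sqlite3
import time

rows = [(i, i * 1.5) for i in range(5000)]


def load_row_by_row(conn, rows):
    """One INSERT and one commit per record: the slow pattern."""
    for row in rows:
        conn.execute("INSERT INTO target VALUES (?, ?)", row)
        conn.commit()


def load_bulk(conn, rows):
    """One batched insert inside a single transaction."""
    conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
    conn.commit()


def timed_load(loader):
    """Run a loader against a fresh table and return (rows loaded, seconds taken)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value REAL)")
    start = time.perf_counter()
    loader(conn, rows)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    return count, elapsed


slow_count, slow_seconds = timed_load(load_row_by_row)
fast_count, fast_seconds = timed_load(load_bulk)
print(f"row-by-row: {slow_seconds:.4f}s, bulk: {fast_seconds:.4f}s")
```

With an in-memory database the gap may be modest. Against a disk-backed or networked warehouse, each commit and round trip costs far more, so batching tends to matter much more there.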


1️⃣1️⃣ What is a data mart and how is it different from a data warehouse?

  • Data Warehouse: Enterprise-level, multiple subject areas.

  • Data Mart: Department-level, focused on a single business process.

Example: a “Finance Data Mart” might be created from the larger enterprise warehouse.


1️⃣2️⃣ What are surrogate keys in data warehouses?

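Surrogate keys are system-generated IDs (e.g., identity columns) used in place of business keys.
They keep joins on compact integer columns, insulate the warehouse from changes to source-system keys, and allow several versions of the same business entity, as SCD Type 2 requires.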
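
A common pattern during the load is to look up the surrogate key for each incoming business key and create it if it does not exist yet. Here is a minimal sketch, assuming a hypothetical `dim_customer` table and a made-up helper called `get_or_create_key`. SQLite's AUTOINCREMENT plays the role of an identity column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key generated by the database
    customer_code TEXT UNIQUE,                       -- business key from the source system
    name TEXT
)""")


def get_or_create_key(conn, customer_code, name):
    """Return the surrogate key for a business key, inserting the dimension row if needed."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_code = ?", (customer_code,)
    ).fetchone()
    if row:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO dim_customer (customer_code, name) VALUES (?, ?)", (customer_code, name)
    )
    return cursor.lastrowid


key_a = get_or_create_key(conn, "CUST-0042", "Asha")
key_b = get_or_create_key(conn, "CUST-0099", "Ravi")
key_a_again = get_or_create_key(conn, "CUST-0042", "Asha")
print(key_a, key_b, key_a_again)
```

Fact rows then store `customer_key` rather than `customer_code`. Note that the UNIQUE constraint on the business key fits a Type 1 dimension; a Type 2 dimension would allow several rows per business key.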


💡 Pro Tip:

When answering ETL or warehousing questions, include a mini project example like:

“In my Cognizant project, we used Informatica to extract data from Oracle and Salesforce, applied business rules in staging, and loaded it into a Snowflake warehouse for Power BI dashboards.”

It adds credibility and shows real-world exposure.


🧩 Conclusion

These ETL and Data Warehousing interview questions (2025 edition) are frequently asked at TCS, Infosys, and Cognizant.
Focus on explaining data flow, transformation logic, and performance strategies — recruiters value clarity and practical experience over memorized definitions.

Consistent hands-on practice with ETL tools and SQL will set you apart in 2025 interviews.

