EY Power BI Interview: Reduce Memory Footprint for 80M+ Rows

EY: How Do You Reduce Memory Footprint in Power BI When Importing Fact Tables Above 80 Million Rows?

This is one of the most frequently asked questions in EY (Ernst & Young) Power BI, BI Architect, Data Engineering, and Analytics interviews. EY handles large-scale enterprise analytics implementations across banking, taxation, audit, insurance, global finance, supply chain, and public sector domains. These implementations often deal with massive datasets — sometimes exceeding 50M, 80M, 200M, or even 500M+ rows.

Memory consumption becomes the single biggest performance bottleneck when building Import-mode models in Power BI because:

  • Power BI holds the entire Import-mode model in memory, compressed by the VertiPaq engine
  • High-cardinality columns increase memory footprint
  • Large fact tables with multiple numeric columns can consume gigabytes of RAM
  • Memory constraints cause refresh failures and slow performance

In this article, we will walk through the complete EY-style approach for reducing memory footprint when working with Power BI Import-mode fact tables above 80M+ rows.

Recommended Reading: Check our Capgemini RLS guide on implementing multi-layer RLS in Power BI before working with large data models.

For official VertiPaq documentation, visit Microsoft Power BI Data Modeling Guidance.

1. Why EY Prioritizes Memory Optimization

EY’s enterprise dashboards are used by:

  • Thousands of auditors
  • Financial analysts
  • Risk officers
  • Taxation experts
  • Business leaders

These users rely on high-performance dashboards to make critical decisions. A memory-heavy fact table creates:

  • Slow refresh times (30–90 minutes)
  • Refresh failures due to memory exhaustion
  • Longer visual load times
  • Dashboard timeouts
  • Poor user experience

Hence, EY interviewers want to know if you understand how to tune Import-mode models to behave efficiently even at 80M+ rows.

2. Understanding the VertiPaq Engine

The first step in optimizing memory is understanding how VertiPaq compresses data:

  • Stores data in a columnar format
  • Creates dictionaries for text columns
  • Encodes numeric values efficiently
  • Splits each column into segments that can be scanned in parallel
  • Compresses repeating values

Based on EY benchmarks:

  • A text column can consume 10–50× more memory than a number column
  • A high-cardinality GUID can kill compression completely
  • A single unnecessary datetime field can double the memory footprint

Hence, the optimization process starts with column-level analysis.
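
A practical way to do that column-level analysis is to query the model's storage DMVs from DAX Studio or SSMS (VertiPaq Analyzer automates the same idea). A minimal sketch, assuming a connection to the local or published model:

-- List the columns with the largest dictionaries: they dominate the memory footprint
SELECT
    DIMENSION_NAME,     -- table name
    COLUMN_ID,          -- column name
    DICTIONARY_SIZE     -- dictionary size in bytes
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
WHERE COLUMN_TYPE = 'BASIC_DATA'
ORDER BY DICTIONARY_SIZE DESC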

3. Remove All Unnecessary Columns from the Fact Table

EY emphasizes a strict rule:

“If a column is not needed for reporting or relationships, REMOVE it before import.”

In real EY audit and risk dashboards, 20–40% of fact table columns typically turn out to be unnecessary for reporting.

Remove:

  • Audit columns (CreatedOn, UpdatedOn, InsertUser, etc.)
  • Free-text fields
  • Long alphanumeric keys
  • Description fields
  • Status description text fields
  • Unused numeric attributes
  • Metadata fields

Example SQL Clean-up:


SELECT
    TransactionID,
    DateID,
    ProductID,
    RegionID,
    Amount,
    Quantity
FROM FactSalesRaw;
  

Removing 10–20 unnecessary columns reduces memory usage dramatically.

4. Reduce Cardinality Wherever Possible

Cardinality = number of unique values in a column.

High-cardinality columns explode memory usage because VertiPaq's dictionary and run-length encoding rely on repeating values, leaving little to compress.

4.1 Convert Text to Integers


SELECT
    EmployeeID,      -- integer surrogate key instead of employee name
    RegionID,        -- instead of region text
    DepartmentID     -- instead of department description
FROM FactEmployeeSales;
  

Avoid importing into the fact table:

  • Customer Names
  • Email IDs
  • Full addresses
  • Long product names

4.2 Split DateTime into Date + Time Fields

A datetime column is close to unique per row, which makes it one of the most expensive columns in the model. Splitting it into separate Date and Time columns caps the cardinality at a few thousand dates and at most 86,400 time values, as in the sketch below.
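
A minimal T-SQL sketch, with illustrative column names and assuming the raw table exposes a TransactionTimestamp column:

SELECT
    TransactionID,
    CAST(TransactionTimestamp AS DATE)    AS TransactionDate,   -- roughly 365 unique values per year
    CAST(TransactionTimestamp AS TIME(0)) AS TransactionTime    -- at most 86,400 unique values
FROM FactSalesRaw;

If the time portion is not needed for analysis, drop it entirely and keep only the date.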

4.3 Round Numerical Values


ROUND(SalesAmount, 2)
  

Long decimal tails create many unique values, which weakens compression; round to the precision the business actually needs.

5. Use Star Schema — NOT Snowflake Schema

EY strongly enforces star-schema design.

Snowflake schema (normalized) creates:

  • Multiple relationship joins
  • More memory footprint
  • Slower DAX performance

Instead, flatten dimensions into single denormalized tables wherever possible, as in the sketch below.
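
For example, a snowflaked Product → Subcategory → Category chain can be collapsed into one dimension view before import (table and column names are illustrative):

CREATE VIEW DimProductFlat AS
SELECT
    p.ProductID,
    p.ProductName,
    s.SubcategoryName,
    c.CategoryName
FROM DimProduct p
JOIN DimSubcategory s ON p.SubcategoryID = s.SubcategoryID
JOIN DimCategory c ON s.CategoryID = c.CategoryID;

The fact table then relates to DimProductFlat through a single ProductID key instead of a chain of joins.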

6. Use Aggregation Tables (EY Best Practice)

Instead of importing the entire 80M+ fact table:

  • Import a daily aggregated table
  • Use DirectQuery for detailed views
  • Combine both with composite models

Example Aggregated Table:


CREATE TABLE FactSalesAgg AS
SELECT
    DateID,
    ProductID,
    RegionID,
    SUM(Amount) AS TotalSales,
    SUM(Quantity) AS TotalQty
FROM FactSales
GROUP BY DateID, ProductID, RegionID;
  

7. Optimize Data Types

  • Use TinyInt, SmallInt instead of Int
  • Use Fixed Decimal instead of Float
  • Use Date instead of DateTime if possible
  • Use Boolean instead of Text flags
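
At the source, this usually means casting each column to the leanest type that still holds the data. A sketch with illustrative names; DECIMAL(19, 4) keeps four decimal places, so the column can be set to Fixed Decimal in the model:

SELECT
    CAST(RegionID AS SMALLINT) AS RegionID,                                   -- small lookup key
    CAST(CASE WHEN ReturnFlag = 'Y' THEN 1 ELSE 0 END AS BIT) AS IsReturned,  -- flag instead of text
    CAST(Amount AS DECIMAL(19, 4)) AS Amount,                                 -- set to Fixed Decimal in the model
    CAST(TransactionDateTime AS DATE) AS TransactionDate                      -- drop the time portion
FROM FactSalesRaw;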

8. Avoid Calculated Columns in Power BI

Move ALL column logic into:

  • SQL views
  • ETL layer
  • Power Query M

Calculated columns are materialized at every refresh and typically compress worse than columns loaded from the source. Push the logic upstream instead, as in the sketch below.
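
A minimal sketch of moving a derived column into a SQL view instead of a DAX calculated column (the GrossMargin logic and UnitCost column are illustrative):

CREATE VIEW vFactSales AS
SELECT
    TransactionID,
    DateID,
    ProductID,
    RegionID,
    Amount,
    Quantity,
    Amount - (Quantity * UnitCost) AS GrossMargin   -- computed once at the source, compressed like any other column
FROM FactSales;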

9. Use Incremental Refresh

Configure incremental refresh so that only the most recent 1–7 days of data are reloaded on each refresh, while the last 3–10 years of history stay in compressed partitions that are loaded once and left untouched.
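
Incremental refresh filters the source on the reserved RangeStart/RangeEnd parameters defined in Power Query. For that filter to stay fast it should fold back to the database, so the fact view needs to expose a genuine datetime column. A sketch with illustrative names:

CREATE VIEW vFactSalesIncremental AS
SELECT
    TransactionID,
    DateID,
    ProductID,
    RegionID,
    Amount,
    Quantity,
    TransactionDateTime     -- filtered by RangeStart/RangeEnd so only new partitions are reloaded
FROM FactSales;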

10. Use Selective Column Loading in Power Query

Only import columns referenced:

  • In visuals
  • In DAX
  • In relationships

11. Disable Auto Date/Time


File → Options and settings → Options → Current File → Data Load → Time intelligence → uncheck "Auto date/time"
  

Power BI otherwise creates a hidden date table for every datetime column in the model, which can increase model size drastically.

12. Final Interview-Ready Answer (EY Style)

“To reduce memory footprint for 80M+ row import tables, I start by removing unnecessary columns, reducing cardinality, optimizing data types, and converting text to numeric keys. I design a proper star schema, move calculated column logic to SQL/ETL, use aggregation tables, implement incremental refresh, and disable Auto DateTime. These steps reduce memory usage by 40–80% and significantly speed up refresh performance — critical for EY-scale enterprise models.”

13. Conclusion

Optimizing Power BI’s memory footprint is essential for handling large datasets in EY enterprise analytics. By applying the techniques above — dimension simplification, cardinality reduction, star schema, aggregations, incremental refresh, and VertiPaq tuning — you can deliver fast, scalable models suitable for global audit and finance workloads.
