EY Power BI Interview: Reduce Memory Footprint for 80M+ Rows

EY: How Do You Reduce Memory Footprint in Power BI When Importing Fact Tables Above 80 Million Rows?

This is one of the most frequently asked questions in EY (Ernst & Young) Power BI, BI Architect, Data Engineering, and Analytics interviews. EY handles large-scale enterprise analytics implementations across banking, taxation, audit, insurance, global finance, supply chain, and public sector domains. These implementations often deal with massive datasets — sometimes exceeding 50M, 80M, 200M, or even 500M+ rows.

Memory consumption becomes the single biggest performance bottleneck when building Import-mode models in Power BI because:

  • Power BI holds the entire Import-mode model in memory, compressed by the VertiPaq engine
  • High-cardinality columns increase memory footprint
  • Large fact tables with multiple numeric columns can consume gigabytes of RAM
  • Memory constraints cause refresh failures and slow performance

In this article, we will walk through the complete EY-style approach for reducing memory footprint when working with Power BI Import-mode fact tables above 80M+ rows.

Recommended Reading: Check our Capgemini RLS guide on implementing multi-layer RLS in Power BI before working with large data models.

For official VertiPaq documentation, visit Microsoft Power BI Data Modeling Guidance.

1. Why EY Prioritizes Memory Optimization

EY’s enterprise dashboards are used by:

  • Thousands of auditors
  • Financial analysts
  • Risk officers
  • Taxation experts
  • Business leaders

These users rely on high-performance dashboards to make critical decisions. A memory-heavy fact table creates:

  • Slow refresh times (30–90 minutes)
  • Refresh failures due to memory exhaustion
  • Longer visual load times
  • Dashboard timeouts
  • Poor user experience

Hence, EY interviewers want to know if you understand how to tune Import-mode models to behave efficiently even at 80M+ rows.

2. Understanding the VertiPaq Engine

The first step in optimizing memory is understanding how VertiPaq compresses data:

  • Stores data in a columnar format
  • Creates dictionaries for text columns
  • Encodes numeric values efficiently
  • Splits each column into segments that can be scanned in parallel
  • Compresses repeating values

Based on EY benchmarks:

  • A text column can consume 10–50× more memory than a number column
  • A high-cardinality GUID can kill compression completely
  • A single unnecessary datetime field can double the memory footprint

Hence, the optimization process starts with column-level analysis.
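
A practical way to do that column-level analysis is to query the model's storage DMVs from DAX Studio or SSMS (VertiPaq Analyzer automates the same idea). A minimal sketch, assuming a connection to the local or published model:

-- List the columns with the largest dictionaries: they dominate the memory footprint
SELECT
    DIMENSION_NAME,     -- table name
    COLUMN_ID,          -- column name
    DICTIONARY_SIZE     -- dictionary size in bytes
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
WHERE COLUMN_TYPE = 'BASIC_DATA'
ORDER BY DICTIONARY_SIZE DESC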

3. Remove All Unnecessary Columns from the Fact Table

EY emphasizes a strict rule:

“If a column is not needed for reporting or relationships, REMOVE it before import.”

In real EY audit and risk dashboards, 20–40% of fact table columns typically turn out to be unnecessary for reporting.

Remove:

  • Audit columns (CreatedOn, UpdatedOn, InsertUser, etc.)
  • Free-text fields
  • Long alphanumeric keys
  • Description fields
  • Status description text fields
  • Unused numeric attributes
  • Metadata fields

Example SQL Clean-up:


SELECT
    TransactionID,
    DateID,
    ProductID,
    RegionID,
    Amount,
    Quantity
FROM FactSalesRaw;
  

Removing 10–20 unnecessary columns reduces memory usage dramatically.

4. Reduce Cardinality Wherever Possible

Cardinality = number of unique values in a column.

High-cardinality columns explode memory usage because VertiPaq's dictionary and run-length encoding rely on repeating values, leaving little to compress.

4.1 Convert Text to Integers


SELECT
    EmployeeID,      -- integer surrogate key instead of employee name
    RegionID,        -- instead of region text
    DepartmentID     -- instead of department description
FROM FactEmployeeSales;
  

Avoid importing into the fact table:

  • Customer Names
  • Email IDs
  • Full addresses
  • Long product names

4.2 Split DateTime into Date + Time Fields

A datetime column is close to unique per row, which makes it one of the most expensive columns in the model. Splitting it into separate Date and Time columns caps the cardinality at a few thousand dates and at most 86,400 time values, as in the sketch below.
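
A minimal T-SQL sketch, with illustrative column names and assuming the raw table exposes a TransactionTimestamp column:

SELECT
    TransactionID,
    CAST(TransactionTimestamp AS DATE)    AS TransactionDate,   -- roughly 365 unique values per year
    CAST(TransactionTimestamp AS TIME(0)) AS TransactionTime    -- at most 86,400 unique values
FROM FactSalesRaw;

If the time portion is not needed for analysis, drop it entirely and keep only the date.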

4.3 Round Numerical Values


ROUND(SalesAmount, 2)
  

Long decimal tails create many unique values, which weakens compression; round to the precision the business actually needs.

5. Use Star Schema — NOT Snowflake Schema

EY strongly enforces star-schema design.

Snowflake schema (normalized) creates:

  • Multiple relationship joins
  • More memory footprint
  • Slower DAX performance

Instead, flatten dimensions into single denormalized tables wherever possible, as in the sketch below.
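
For example, a snowflaked Product → Subcategory → Category chain can be collapsed into one dimension view before import (table and column names are illustrative):

CREATE VIEW DimProductFlat AS
SELECT
    p.ProductID,
    p.ProductName,
    s.SubcategoryName,
    c.CategoryName
FROM DimProduct p
JOIN DimSubcategory s ON p.SubcategoryID = s.SubcategoryID
JOIN DimCategory c ON s.CategoryID = c.CategoryID;

The fact table then relates to DimProductFlat through a single ProductID key instead of a chain of joins.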

6. Use Aggregation Tables (EY Best Practice)

Instead of importing the entire 80M+ fact table:

  • Import a daily aggregated table
  • Use DirectQuery for detailed views
  • Combine both with composite models

Example Aggregated Table:


CREATE TABLE FactSalesAgg AS
SELECT
    DateID,
    ProductID,
    RegionID,
    SUM(Amount) AS TotalSales,
    SUM(Quantity) AS TotalQty
FROM FactSales
GROUP BY DateID, ProductID, RegionID;
  

7. Optimize Data Types

  • Use TinyInt, SmallInt instead of Int
  • Use Fixed Decimal instead of Float
  • Use Date instead of DateTime if possible
  • Use Boolean instead of Text flags
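
At the source, this usually means casting each column to the leanest type that still holds the data. A sketch with illustrative names; DECIMAL(19, 4) keeps four decimal places, so the column can be set to Fixed Decimal in the model:

SELECT
    CAST(RegionID AS SMALLINT) AS RegionID,                                   -- small lookup key
    CAST(CASE WHEN ReturnFlag = 'Y' THEN 1 ELSE 0 END AS BIT) AS IsReturned,  -- flag instead of text
    CAST(Amount AS DECIMAL(19, 4)) AS Amount,                                 -- set to Fixed Decimal in the model
    CAST(TransactionDateTime AS DATE) AS TransactionDate                      -- drop the time portion
FROM FactSalesRaw;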

8. Avoid Calculated Columns in Power BI

Move ALL column logic into:

  • SQL views
  • ETL layer
  • Power Query M

Calculated columns are materialized at every refresh and typically compress worse than columns loaded from the source. Push the logic upstream instead, as in the sketch below.
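
A minimal sketch of moving a derived column into a SQL view instead of a DAX calculated column (the GrossMargin logic and UnitCost column are illustrative):

CREATE VIEW vFactSales AS
SELECT
    TransactionID,
    DateID,
    ProductID,
    RegionID,
    Amount,
    Quantity,
    Amount - (Quantity * UnitCost) AS GrossMargin   -- computed once at the source, compressed like any other column
FROM FactSales;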

9. Use Incremental Refresh

Configure incremental refresh so that only the most recent 1–7 days of data are reloaded on each refresh, while the last 3–10 years of history stay in compressed partitions that are loaded once and left untouched.
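
Incremental refresh filters the source on the reserved RangeStart/RangeEnd parameters defined in Power Query. For that filter to stay fast it should fold back to the database, so the fact view needs to expose a genuine datetime column. A sketch with illustrative names:

CREATE VIEW vFactSalesIncremental AS
SELECT
    TransactionID,
    DateID,
    ProductID,
    RegionID,
    Amount,
    Quantity,
    TransactionDateTime     -- filtered by RangeStart/RangeEnd so only new partitions are reloaded
FROM FactSales;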

10. Use Selective Column Loading in Power Query

Only import columns referenced:

  • In visuals
  • In DAX
  • In relationships

11. Disable Auto Date/Time


File → Options and settings → Options → Current File → Data Load → Time intelligence → uncheck "Auto date/time"
  

Power BI otherwise creates a hidden date table for every datetime column in the model, which can increase model size drastically.

12. Final Interview-Ready Answer (EY Style)

“To reduce memory footprint for 80M+ row import tables, I start by removing unnecessary columns, reducing cardinality, optimizing data types, and converting text to numeric keys. I design a proper star schema, move calculated column logic to SQL/ETL, use aggregation tables, implement incremental refresh, and disable Auto DateTime. These steps reduce memory usage by 40–80% and significantly speed up refresh performance — critical for EY-scale enterprise models.”

13. Conclusion

Optimizing Power BI’s memory footprint is essential for handling large datasets in EY enterprise analytics. By applying the techniques above — dimension simplification, cardinality reduction, star schema, aggregations, incremental refresh, and VertiPaq tuning — you can deliver fast, scalable models suitable for global audit and finance workloads.
