How to Optimize SQL Queries for Millions of Rows (Step-by-Step Guide with Examples)

Introduction

Handling a few thousand rows in SQL is simple. But what happens when your database tables grow to millions (or even billions) of rows? Queries that once ran in milliseconds now take seconds or even minutes. This not only slows applications but also increases costs on cloud services like AWS, GCP, or Azure.

In this article, we’ll explore proven techniques to optimize SQL queries for millions of rows with real examples in MySQL, PostgreSQL, and Oracle.


Why Queries Slow Down with Big Data

When rows grow into millions, databases face challenges:

  • Full table scans → too much data to read.
  • No proper indexing → every query searches row by row.
  • Poor joins → exploding intermediate results.
  • Unoptimized filtering → scanning unnecessary data.

Let’s fix them step by step.


Step 1: Always Use Indexes (The Right Way)

Without Index (Slow):

SELECT * FROM sales WHERE customer_id = 12345;

Execution → Full scan of millions of rows.

With Index (Fast):

CREATE INDEX idx_sales_customer ON sales(customer_id);

SELECT order_id, order_date, amount 
FROM sales
WHERE customer_id = 12345;

Execution → The database uses the index to jump directly to the matching rows instead of scanning the whole table.

💡 Pro Tip: Use composite indexes if filtering by multiple columns (e.g., customer_id, order_date).
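To see the effect of an index concretely, here is a minimal sketch in Python with SQLite (a stand-in for MySQL/PostgreSQL/Oracle; the `sales` table and `idx_sales_customer` name mirror the article's hypothetical schema). It compares the query plan before and after creating the index:

```python
import sqlite3

# In-memory demo database; schema follows the article's hypothetical "sales" table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (order_id INTEGER, customer_id INTEGER, "
            "order_date TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                [(i, i % 1000, "2024-01-01", i * 1.5) for i in range(10_000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in their last column.
    return " ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT order_id, order_date, amount FROM sales WHERE customer_id = 123"
plan_before = plan(query)   # typically reports a full scan of "sales"
cur.execute("CREATE INDEX idx_sales_customer ON sales(customer_id)")
plan_after = plan(query)    # typically reports a search USING INDEX idx_sales_customer
print(plan_before)
print(plan_after)
```

The plan flips from a full table scan to an index search; in MySQL or PostgreSQL you would check the same thing with `EXPLAIN`.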


Step 2: Avoid SELECT * (Reduce I/O)

Bad (Slow):

SELECT * FROM transactions WHERE status = 'Completed';

Optimized (Fast):

SELECT transaction_id, amount, created_at
FROM transactions
WHERE status = 'Completed';

✅ Saves disk I/O → faster query time.
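The I/O saving is easiest to see with a covering index. In this sketch (SQLite, hypothetical `transactions` schema, index name `idx_tx_status` is an assumption), the narrow column list can be answered entirely from the index, while `SELECT *` must still visit the table for the remaining columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE transactions (transaction_id INTEGER, amount REAL, "
            "created_at TEXT, status TEXT, notes TEXT)")
# Index covers the filter column plus exactly the columns the narrow query selects.
cur.execute("CREATE INDEX idx_tx_status ON "
            "transactions(status, transaction_id, amount, created_at)")

def plan(sql):
    return " ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

plan_star   = plan("SELECT * FROM transactions WHERE status = 'Completed'")
plan_narrow = plan("SELECT transaction_id, amount, created_at "
                   "FROM transactions WHERE status = 'Completed'")
print(plan_star)    # uses the index, but still reads table rows for the other columns
print(plan_narrow)  # reports a COVERING INDEX: answered from the index alone
```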


Step 3: Partition Large Tables

Partitioning splits massive tables into smaller, manageable parts.

Example in PostgreSQL:

CREATE TABLE orders (
    id SERIAL,
    order_date DATE NOT NULL,
    amount NUMERIC
) PARTITION BY RANGE (order_date);

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

(The upper bound of a range partition is exclusive, so TO ('2025-01-01') covers every day of 2024.)

✅ Queries only scan relevant partitions, not the entire dataset.
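SQLite has no declarative partitioning, but the idea can be sketched manually: one physical table per range of the partition key, with inserts routed by that key. The `orders_<year>` naming and routing helper below are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One physical table per year stands in for a range partition.
for year in (2023, 2024):
    cur.execute(f"CREATE TABLE orders_{year} (id INTEGER, order_date TEXT, amount REAL)")

def insert_order(order_id, order_date, amount):
    # Route each row to its "partition" by the range key (the year prefix).
    table = f"orders_{order_date[:4]}"
    cur.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (order_id, order_date, amount))

insert_order(1, "2023-06-15", 50.0)
insert_order(2, "2024-02-01", 75.0)
insert_order(3, "2024-11-30", 20.0)

# A query filtered to 2024 touches only the 2024 table, never the 2023 data.
total_2024 = cur.execute("SELECT SUM(amount) FROM orders_2024").fetchone()[0]
print(total_2024)  # 95.0
```

PostgreSQL's `PARTITION BY RANGE` does this routing (and partition pruning at query time) automatically.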


Step 4: Use EXPLAIN Plans

Before optimizing, analyze the query execution plan.

MySQL Example:

EXPLAIN SELECT * FROM customers WHERE email = '[email protected]';

It shows whether the query uses an index or a full table scan.
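Plan output can also be checked programmatically, which is handy in tests that guard against regressions. A sketch with SQLite (the `customers` schema and helper name are assumptions; SQLite reports "SEARCH" for index lookups and "SCAN" for full scans):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
cur.execute("CREATE INDEX idx_customers_email ON customers(email)")

def uses_full_scan(sql):
    # Any plan step starting with "SCAN" means a full table scan.
    rows = cur.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return any(r[3].startswith("SCAN") for r in rows)

by_email = uses_full_scan("SELECT * FROM customers WHERE email = 'x'")
by_id    = uses_full_scan("SELECT * FROM customers WHERE id = 42")  # id is not indexed here
print(by_email, by_id)  # False True
```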


Step 5: Optimize Joins

Inefficient Join (Slow):

SELECT *
FROM orders o, customers c
WHERE o.customer_id = c.id
AND c.country = 'US';

Optimized Join (Fast):

SELECT o.order_id, o.amount, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'US';

✅ Explicit JOIN syntax makes the join condition obvious and prevents accidental Cartesian products, and selecting only the needed columns cuts I/O.
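A runnable version of the optimized join, sketched in SQLite with a few sample rows (table contents and the `idx_orders_customer` index are illustrative assumptions; indexing the join column is what keeps large joins fast):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, "
            "customer_name TEXT, country TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
# Index the join column so each customer's orders are found by lookup, not by scan.
cur.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Ann", "US"), (2, "Bo", "DE"), (3, "Cy", "US")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 99.0), (11, 2, 12.0), (12, 3, 30.0), (13, 1, 5.0)])

rows = cur.execute("""
    SELECT o.order_id, o.amount, c.customer_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    WHERE c.country = 'US'
""").fetchall()
print(sorted(r[0] for r in rows))  # [10, 12, 13] — only US customers' orders
```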


Step 6: Use LIMIT and Pagination

When displaying results (like in dashboards), avoid fetching millions at once.

Bad (Slow):

SELECT * FROM logs ORDER BY created_at DESC;

Optimized (Fast):

SELECT log_id, message, created_at
FROM logs
ORDER BY created_at DESC
LIMIT 100;

✅ Always fetch only what’s needed.
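For deep pagination, `OFFSET` still scans and discards the skipped rows, so keyset (seek) pagination is often preferable: remember the last row of the previous page and seek past it. A sketch in SQLite (the `logs` schema and page size are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE logs (log_id INTEGER PRIMARY KEY, message TEXT, created_at TEXT)")
cur.executemany("INSERT INTO logs VALUES (?, ?, ?)",
                [(i, f"event {i}", f"2024-01-{i:02d}") for i in range(1, 11)])

PAGE = 3

def first_page():
    return cur.execute(
        "SELECT log_id, message, created_at FROM logs "
        "ORDER BY log_id DESC LIMIT ?", (PAGE,)).fetchall()

def next_page(last_seen_id):
    # Keyset pagination: seek past the previous page's last row
    # instead of using OFFSET, which still reads the skipped rows.
    return cur.execute(
        "SELECT log_id, message, created_at FROM logs "
        "WHERE log_id < ? ORDER BY log_id DESC LIMIT ?",
        (last_seen_id, PAGE)).fetchall()

page1 = first_page()
page2 = next_page(page1[-1][0])
print([r[0] for r in page1])  # [10, 9, 8]
print([r[0] for r in page2])  # [7, 6, 5]
```

Each page is an indexed range scan of exactly `PAGE` rows, no matter how deep the user paginates.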


Step 7: Batch Processing

Instead of updating rows one by one, batch them.

Bad (Loop Updates):

UPDATE inventory SET stock = stock - 1 WHERE item_id = 101;
UPDATE inventory SET stock = stock - 1 WHERE item_id = 102;

Optimized (Batch Update):

UPDATE inventory
SET stock = stock - 1
WHERE item_id IN (101, 102, 103, 104);

✅ Reduces query executions dramatically.
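From application code, the same batching idea means one round trip with an `IN (...)` list instead of one statement per row. A sketch in SQLite (the `inventory` rows are sample data; the placeholder list is built to match the number of IDs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, stock INTEGER)")
cur.executemany("INSERT INTO inventory VALUES (?, ?)",
                [(i, 10) for i in (101, 102, 103, 104)])

# One UPDATE with IN (...) replaces four separate round trips.
ids = [101, 102, 103, 104]
placeholders = ",".join("?" * len(ids))
cur.execute(f"UPDATE inventory SET stock = stock - 1 "
            f"WHERE item_id IN ({placeholders})", ids)

stocks = dict(cur.execute("SELECT item_id, stock FROM inventory"))
print(stocks)  # {101: 9, 102: 9, 103: 9, 104: 9}
```

For bulk inserts, the driver-level equivalent is `executemany` (or multi-row `VALUES` lists), which likewise cuts per-statement overhead.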


Real-World Benchmark Example

Dataset: 10 million sales records
Original Query:

SELECT * FROM sales WHERE amount > 1000;

⏱ Execution Time: 12.5 seconds

Optimized Query (Index + Narrow Column List):

CREATE INDEX idx_sales_amount ON sales(amount);

SELECT sale_id, customer_id, amount
FROM sales
WHERE amount > 1000;

⏱ Execution Time: 0.9 seconds 🚀


Advanced Tips for Huge Databases

  • Materialized Views: Store results of expensive queries for reuse.
  • Query Caching: Cache frequent queries in Redis/Memcached.
  • Denormalization: Sometimes duplicating data speeds reads.
  • Sharding: Split data across multiple servers.
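The query-caching tip can be sketched in-process with a plain dictionary standing in for Redis/Memcached (the cache key of SQL text plus parameters, and the hit counter, are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (1, 20.0), (2, 5.0)])

_cache = {}
hits = 0

def cached_query(sql, params=()):
    # Cache results keyed by the SQL text and its parameters.
    global hits
    key = (sql, params)
    if key in _cache:
        hits += 1
        return _cache[key]
    result = cur.execute(sql, params).fetchall()
    _cache[key] = result
    return result

q = "SELECT SUM(amount) FROM sales WHERE customer_id = ?"
first  = cached_query(q, (1,))   # executes against the database
second = cached_query(q, (1,))   # served from the cache
print(first, hits)  # [(30.0,)] 1
```

A real deployment also needs invalidation (expire or delete entries when the underlying rows change), which is the hard part of any caching layer.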

Best Practices Checklist

✅ Add proper indexes
✅ Avoid SELECT *
✅ Use partitioning on large tables
✅ Analyze execution plans
✅ Optimize joins
✅ Paginate results
✅ Batch updates & inserts


Conclusion

When dealing with millions of rows, SQL query optimization is not optional—it’s a must. By applying indexes, partitioning, execution analysis, and batching techniques, you can reduce query times from seconds to milliseconds.

👉 Try these techniques on your database today and experience the difference. For more SQL tutorials and real-world optimization guides, explore other articles on SQLQueries.in.