SQL Query Optimization Techniques
Why efficient queries matter for performance, cost, and user experience

When I first moved a small side project from SQLite to PostgreSQL, I thought indexes were a magic wand. Add an index, problem solved. But a few months later, production traffic grew, a few “harmless” joins got nested, and a dashboard that used to load in a second started timing out. The database wasn’t slow; the queries were. That realization stuck with me. Good SQL is not just correct; it is considerate of the data, the hardware, and the people waiting on the other side of the screen.
Today, the need for thoughtful query design is even more acute. Applications are more data-hungry, serverless databases scale compute by the millisecond, and page performance remains a business metric. Whether you are building a REST API, a data pipeline, or an analytics dashboard, SQL optimization is a direct path to better user experience and lower bills. In this post, we’ll walk through practical techniques, real-world examples, and tradeoffs that I have used (and sometimes learned the hard way) to write SQL that is fast and maintainable.
Context: Where SQL optimization fits today
SQL remains the lingua franca of data access. It powers backends, ETL jobs, ML feature stores, and analytics tools. The relational model, combined with decades of optimizer research, makes SQL uniquely effective at expressing intent while letting the database figure out the best plan. In contrast, NoSQL and specialized analytical engines offer scale and speed for specific workloads, but they often require you to manage more complexity in application code.
In real-world projects, SQL optimization shows up everywhere:
- Web services: Endpoints that aggregate and filter data for dashboards or reports.
- Data pipelines: Batch jobs that join staging tables with historical data before loading into a warehouse.
- Operational analytics: Materialized views that power near-real-time KPIs.
The main players today are PostgreSQL, MySQL, SQL Server, and cloud options like Amazon Aurora and Google Cloud Spanner. Postgres, in particular, has a mature optimizer and excellent tooling for inspecting plans. If you are working in a polyglot environment, you will likely write SQL alongside application code in Python, Node.js, or Go. In those stacks, ORM abstractions are helpful but can obscure how the database actually executes queries, making plan reading and index strategy essential skills.
Compared to alternatives, SQL’s strength is declarative power: You say what you want, and the optimizer chooses how to get it. The tradeoff is that you still need to guide the optimizer with good structure, the right indexes, and careful use of functions in predicates, or it can choose a plan that is correct but slow.
Core techniques and practical examples
Let’s break down the most effective optimization techniques with grounded examples. We will use PostgreSQL for the code, but the ideas apply broadly across relational databases.
Use EXPLAIN to understand the plan
Before tuning, look at the execution plan. In Postgres, EXPLAIN (ANALYZE, BUFFERS) shows what the optimizer intends and what actually happened, including time and I/O.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, c.name, SUM(oi.quantity * oi.unit_price) AS total
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.order_date >= '2025-01-01'
GROUP BY o.id, c.name
ORDER BY total DESC
LIMIT 20;
This query is typical of an “orders overview” page. If you see a sequential scan on orders with a high row count, it’s a signal that filtering earlier or adding an index on order_date could help. The BUFFERS option is especially useful for diagnosing excessive shared hits or reads.
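For reference, a problematic node in the ANALYZE output looks roughly like the sketch below; the numbers are placeholders, not measurements from a real run.
Seq Scan on orders o  (cost=0.00..4312.00 rows=98000 width=16) (actual time=0.015..38.271 rows=97510 loops=1)
  Filter: (order_date >= '2025-01-01'::date)
  Rows Removed by Filter: 2490
  Buffers: shared hit=1200 read=350
A large actual row count feeding a later join, sort, or aggregate is usually where the time goes, and it points straight at the filtering and indexing work described next.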
Build the right indexes
Indexes are not a cure-all, but they are the most common lever. The best indexes align with your read patterns and filter predicates.
Case 1: Filtering by a date range and sorting by an aggregate.
-- Index to support range filtering on orders
CREATE INDEX idx_orders_date ON orders(order_date);
-- If you also filter by status (equality) alongside the date range, put the equality column first
CREATE INDEX idx_orders_status_date ON orders(status, order_date);
Case 2: Supporting JOIN predicates and uniqueness.
-- Primary keys are automatically indexed; add indexes on foreign keys
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
-- Unique constraint for email lookups
CREATE UNIQUE INDEX idx_customers_email ON customers(email);
Case 3: Covering index for common projections (Postgres 11+ supports included columns, but we can emulate a covering index with composite keys).
-- If you always query customers by region and display name
CREATE INDEX idx_customers_region_name ON customers(region, name);
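On Postgres 11 and later you can also keep the displayed column out of the key itself with INCLUDE; a minimal sketch for the same lookup:
-- Covering index: region is the search key, name is carried along for index-only scans
CREATE INDEX idx_customers_region_incl_name ON customers(region) INCLUDE (name);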
Keep in mind that indexes speed up reads but slow down writes. On write-heavy tables, prefer targeted indexes. For analytics workloads with large scans, columnar storage or materialized views may be better choices.
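On write-heavy tables, a partial index is one way to stay targeted: it only indexes the rows you actually query, so most writes never touch it. A sketch, assuming the status column from the composite example above:
-- Partial index: only pending orders are indexed
CREATE INDEX idx_orders_pending_date ON orders(order_date) WHERE status = 'pending';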
Write sargable predicates
Sargable (search-argument-able) predicates let the planner use an index; non-sargable expressions, typically a function or calculation wrapped around the column, force scans.
Instead of:
-- Non-sargable: function on the column prevents index use
SELECT * FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2025;
Prefer:
-- Sargable: direct range comparison
SELECT * FROM orders WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01';
The same applies to string operations. Lowercasing a column in the WHERE clause prevents index use unless you have a functional index.
-- Non-sargable
SELECT * FROM customers WHERE LOWER(email) = 'user@example.com';
-- Sargable with a functional index
CREATE INDEX idx_customers_lower_email ON customers(LOWER(email));
With that expression index in place, the LOWER(email) lookup above can use an index scan instead of reading every row.
Reduce data early with filtering and selective joins
The most expensive part of a query is usually the volume of data processed. Push filters down and avoid joining unnecessary rows.
Example: Filtering before joining reduces intermediate rows.
-- Before: Filtering after the join
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.order_date >= '2025-01-01';
-- After: Explicitly filter the orders first (optimizer often does this, but being explicit helps)
WITH recent_orders AS (
SELECT id, customer_id
FROM orders
WHERE order_date >= '2025-01-01'
)
SELECT ro.id, c.name
FROM recent_orders ro
JOIN customers c ON c.id = ro.customer_id;
In nested loop or hash joins, fewer rows upstream means less work downstream. The planner usually pushes simple predicates down on its own, and Postgres 12+ inlines CTEs by default, so treat the CTE here mainly as a readability aid; restructure with subqueries or explicit CTEs when the plan shows the reduction is not happening.
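If you do need to force that early reduction into its own step, Postgres 12+ lets you mark the CTE as an explicit fence; a minimal sketch of the same query:
-- MATERIALIZED forces the CTE to be evaluated once, before the join
WITH recent_orders AS MATERIALIZED (
SELECT id, customer_id
FROM orders
WHERE order_date >= '2025-01-01'
)
SELECT ro.id, c.name
FROM recent_orders ro
JOIN customers c ON c.id = ro.customer_id;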
Avoid unnecessary DISTINCT and GROUP BY
Sometimes DISTINCT is used to mask duplicate joins. Fix the join condition instead.
Example: Duplicate orders caused by joining to order_items without aggregation.
-- Problem: duplicates because one order has multiple items
SELECT DISTINCT o.id, o.total
FROM orders o
JOIN order_items oi ON oi.order_id = o.id;
-- Better: Be explicit about the aggregation
SELECT o.id, SUM(oi.quantity * oi.unit_price) AS total
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
GROUP BY o.id;
Tune pagination for deep pages
Offset-based pagination is simple but slow for deep pages because the database must scan and discard rows.
Keyset pagination (also called cursor-based) is faster.
-- Offset pagination (slow for page 1000)
SELECT id, name, created_at
FROM users
ORDER BY created_at, id
LIMIT 50 OFFSET 50000;
-- Keyset pagination (fast)
SELECT id, name, created_at
FROM users
WHERE (created_at, id) > ('2024-12-01 00:00:00', 12345)
ORDER BY created_at, id
LIMIT 50;
Keyset pagination requires a stable order and an index that matches the ORDER BY. It also complicates jumping to arbitrary pages, which is an acceptable tradeoff in many API designs.
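Concretely, that index is a composite on the sort keys; a minimal sketch for the users table above:
-- Supports the ORDER BY and the (created_at, id) row comparison
CREATE INDEX idx_users_created_at_id ON users(created_at, id);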
Prefer standard SQL over vendor quirks when possible
Some functions or patterns can degrade performance depending on the engine and version. MySQL, for example, only gained functional indexes in 8.0, so function-wrapped predicates cannot use an index on earlier versions. In SQL Server, implicit conversions in predicates can turn index seeks into scans. Keep data types consistent and avoid mixing types in joins or filters.
Example: Avoid implicit cast by ensuring join keys share the same type.
-- If order_id is UUID in orders but text in order_items, cast explicitly in a migration
-- Instead of patching queries, fix the schema:
ALTER TABLE order_items ALTER COLUMN order_id TYPE UUID USING order_id::UUID;
Use materialized views for expensive aggregates
When a dashboard recomputes the same aggregates repeatedly, materialized views can precompute results.
-- Create a materialized view for daily revenue by product
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT
DATE_TRUNC('day', o.order_date) AS day,
p.id AS product_id,
p.name AS product_name,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.order_date >= '2024-01-01'
GROUP BY day, p.id, p.name;
-- Refresh the view on a schedule (e.g., hourly)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue;
The CONCURRENTLY option in Postgres allows reads during refresh, though it requires a unique index on the view. For near-real-time needs, consider incremental updates instead.
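That unique index might look like this for the view above, matching its one-row-per-day-per-product grain:
-- Required by REFRESH ... CONCURRENTLY: one row per (day, product_id)
CREATE UNIQUE INDEX idx_mv_daily_revenue_key ON mv_daily_revenue(day, product_id);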
Batch operations and minimize round trips
In application code, N+1 queries are a classic performance killer. Fetch related data in a single query or use batching.
Python example: Using psycopg2 to batch fetch orders and their items.
import psycopg2
conn = psycopg2.connect("dbname=example user=postgres")
cur = conn.cursor()
# Instead of fetching orders and then querying items per order, fetch all at once
cur.execute("""
SELECT o.id, o.customer_id, oi.product_id, oi.quantity, oi.unit_price
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.order_date >= '2025-01-01'
""")
rows = cur.fetchall()
# Build in-memory structures
orders = {}
for order_id, customer_id, product_id, quantity, unit_price in rows:
    orders.setdefault(order_id, []).append({
        "product_id": product_id,
        "quantity": quantity,
        "unit_price": unit_price
    })
# Now you can render JSON or compute totals without additional DB hits
In Node.js, use async patterns to fetch dependencies in parallel rather than sequentially.
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });
async function getOrdersWithItems(dateStart) {
  // Run both queries in parallel; restrict items to the same date window
  // so we do not pull the whole order_items table into memory.
  const ordersPromise = pool.query(
    'SELECT id, customer_id FROM orders WHERE order_date >= $1',
    [dateStart]
  );
  const itemsPromise = pool.query(
    `SELECT oi.order_id, oi.product_id, oi.quantity, oi.unit_price
     FROM order_items oi
     JOIN orders o ON o.id = oi.order_id
     WHERE o.order_date >= $1`,
    [dateStart]
  );
  const [ordersRes, itemsRes] = await Promise.all([ordersPromise, itemsPromise]);

  // Group items by order_id, then attach them to each order
  const itemsByOrder = {};
  for (const row of itemsRes.rows) {
    (itemsByOrder[row.order_id] ??= []).push(row);
  }

  return ordersRes.rows.map(o => ({
    ...o,
    items: itemsByOrder[o.id] ?? []
  }));
}
Honest evaluation: Strengths, weaknesses, and tradeoffs
SQL optimization is powerful, but not all problems are best solved by tuning SQL. Here is a balanced view.
Strengths:
- Declarative clarity: You express intent, and the optimizer chooses plans.
- Rich ecosystem: Mature tooling for profiling, indexing, and backups.
- Portability: Most relational databases speak SQL, and many optimization patterns carry over.
Weaknesses:
- Learning curve: Understanding plans and indexing strategies takes time.
- Not a silver bullet: Some workloads benefit from alternative storage models or streaming architectures.
- Vendor quirks: Differences in dialects, optimizer behavior, and feature sets can complicate porting.
When SQL optimization is a good fit:
- OLTP workloads with well-defined queries and transactional consistency.
- Mixed read-write patterns where indexes and plan guides offer predictable gains.
- Teams that can invest in observability and query review.
When to consider alternatives:
- Extremely high write throughput with simple queries (consider NoSQL or specialized stores).
- Event-driven systems where data arrives as streams (consider Kafka with materialized views).
- Large-scale analytical queries that benefit from columnar storage (consider a data warehouse or lakehouse).
Personal experience: Lessons from production
A few patterns stand out from years of working with SQL in production.
First, the biggest wins often come from reducing the working set. A query scanning millions of rows to return 20 is a red flag. Adding an index is good; redesigning the query to filter earlier is better. For dashboards, pre-aggregation through materialized views can turn seconds into milliseconds. The tradeoff is staleness, but that is often acceptable for hourly KPIs.
Second, pagination is a UX and performance problem. Offset pagination looks simple but causes unpredictable performance as pages grow. Keyset pagination is harder to implement for arbitrary jumps, but it provides stable performance and predictable resource usage. In practice, I’ve adopted keyset for infinite scroll UIs and offset only for admin panels where deep pages are rare.
Third, ORMs are great for productivity but can generate inefficient SQL. I have had success coupling ORMs with explicit SQL for complex reports. A pragmatic approach is to let the ORM handle CRUD and handwrite critical read paths. Reviewing query plans during code review helps keep performance honest.
Fourth, mistakes often involve implicit casts or mismatched types. A join between text and UUID columns can silently trigger a scan. Fixing the schema with migrations is better than rewriting queries. Data types matter for both correctness and performance.
Finally, optimization is iterative. Start with a baseline, add indexes judiciously, measure with EXPLAIN ANALYZE, and validate with real workloads. There is no substitute for production-like data and traffic when testing changes.
Getting started: Tooling, workflow, and mental models
Here is a minimal project layout for a service that relies on SQL. This is not prescriptive, but it reflects a workflow that has served me well.
app/
  src/
    db/
      migrations/
        001_create_orders.sql
        002_create_order_items.sql
        003_create_indexes.sql
      queries/
        orders_list.sql
        orders_details.sql
      seed/
        seed_data.sql
    api/
      orders.py
      products.py
  tests/
    test_orders_queries.py
  scripts/
    explain.py
  Dockerfile
  docker-compose.yml
  requirements.txt
Tooling:
- psql or a graphical client (like DBeaver) for exploring plans.
- pg_stat_statements for identifying the most expensive queries over time (see the sample query after this list).
- A migration tool such as Flyway or Alembic for schema changes.
- A simple script to run EXPLAIN on query files.
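With pg_stat_statements enabled, a query like the one below surfaces the heaviest statements; the column names assume Postgres 13 or later (older versions use total_time and mean_time):
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;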
Example explain script (Python using psycopg2):
import sys
import psycopg2

def explain_query_file(path, conn_str):
    # Read the SQL file and run it under EXPLAIN to capture the actual plan
    with open(path, 'r') as f:
        sql = f.read()
    conn = psycopg2.connect(conn_str)
    cur = conn.cursor()
    cur.execute(f"EXPLAIN (ANALYZE, BUFFERS) {sql}")
    for row in cur.fetchall():
        print(row[0])
    conn.close()

if __name__ == "__main__":
    explain_query_file(sys.argv[1], sys.argv[2])
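A hypothetical invocation from the app/ directory, assuming the layout above and a local Postgres:
python scripts/explain.py src/db/queries/orders_list.sql "dbname=example user=postgres"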
Mental model:
- Think in sets. SQL describes transformations on sets of rows.
- Filter early and often. The earlier you reduce rows, the less downstream work.
- Index for the path you take most. Indexes are promises you make to future writes.
- Treat query plans like compiler output. Read them with intent, not guesswork.
Free learning resources
- PostgreSQL EXPLAIN documentation: https://www.postgresql.org/docs/current/using-explain.html
- Use The Index, Luke: https://use-the-index-luke.com/
- MySQL 8.0 Reference Manual on Optimization: https://dev.mysql.com/doc/refman/8.0/en/optimization.html
- SQL Server Execution Plans: https://learn.microsoft.com/en-us/sql/relational-databases/performance/execution-plans
- pg_stat_statements documentation: https://www.postgresql.org/docs/current/pgstatstatements.html
- PostgreSQL Wiki on Index Types: https://wiki.postgresql.org/wiki/Index_kinds
These resources provide practical depth without fluff. The PostgreSQL EXPLAIN guide is especially useful for translating plan nodes into actionable changes.
Who should use these techniques and who might skip them
Use SQL optimization techniques if:
- You are building applications that rely heavily on relational data.
- You want predictable performance for queries that run millions of times a day.
- You care about lowering infrastructure costs and improving user experience.
Consider skipping or deferring deep optimization if:
- Your data volumes are tiny and will remain so for the foreseeable future.
- You are prototyping and the schema is in flux; premature indexing can create churn.
- Your workload is primarily event streaming with minimal joins; you may benefit more from stream processing patterns.
Summary
SQL query optimization is not about chasing micro-optimizations. It is about aligning your queries with how databases work: indexes, join strategies, and plan selection. The most impactful changes often come from small design shifts, like writing sargable predicates, filtering early, choosing the right pagination strategy, and using materialized views for expensive aggregates. With the right tools and a habit of reading execution plans, you can deliver fast, reliable data access without overcomplicating your stack.
If you are new to optimization, start with a single slow query and one change. Measure, compare plans, and document the outcome. Over time, this practice compounds into a well-tuned system that feels responsive to users and cost-effective to run.




