- 1 What Is the Volcano Model?
- 2 The Iterator Interface: Next() / GetNext()
- 3 Operator Trees: How Queries Become Execution Plans
- 4 Row-by-Row Processing: The Good, The Bad, and The Slow
- 5 PostgreSQL’s Implementation: ExecProcNode
- 6 Performance Implications: When Volcano Shines vs. Struggles
- 7 Vectorized Execution: The Alternative
- 8 Hybrid Approaches: Getting the Best of Both
- 9 Practical Takeaways for PostgreSQL Users
- 10 The Future: Is Vectorization Coming to PostgreSQL?
- Summary: The Volcano Model in One Diagram
Every SQL query you run in PostgreSQL—whether a simple SELECT * FROM users or a 20-table join with window functions—executes through the same elegant mechanism: the Volcano Model (also known as the Iterator Model).
This 1980s architecture is why PostgreSQL can:
- Stream gigabytes of results without loading everything into memory
- Stop early on LIMIT queries without processing all rows
- Chain operators flexibly without custom code for each combination
But it’s also why PostgreSQL struggles with certain analytical workloads where vectorized execution shines. Here’s the deep dive: how the Volcano Model works, why PostgreSQL chose it, and where it breaks down.
1 What Is the Volcano Model?
The Volcano Model is an execution architecture where queries are represented as trees of operators, each exposing a standard interface: Next() (or GetNext() in PostgreSQL’s codebase).
Each operator:
- Requests rows from its child(ren) by calling Next()
- Processes one row at a time
- Returns one row to its parent
This pull-based model means data flows up the tree, one row per call, until the top node returns results to the client.
Is It an Architecture? A Pattern? Something Else?
The Volcano Model is often described using different terms. Here’s the precise classification:
| Term | Is Volcano This? | Why |
|---|---|---|
| Execution Model | ✅ Most accurate | Defines how computation proceeds (row-by-row, pull-based) |
| Architectural Pattern | ✅ Also correct | Defines high-level structure (operator trees with standard interface) |
| Design Pattern | ⚠️ Partially | Built on top of the Iterator Pattern, but more than just one pattern |
| Software Architecture | ❌ Too broad | It’s part of a database’s architecture, not the whole architecture |
| Algorithm | ❌ No | It’s a structural framework, not a specific computational procedure |
The Relationship:
Iterator Pattern (GoF Design Pattern)
↓
Used by
↓
Volcano Model (Architectural Pattern / Execution Model)
↓
Implemented in
↓
PostgreSQL Executor (Software Architecture)
Why the Confusion?
| Source | Uses Term | Reason |
|---|---|---|
| Academic papers | “Execution Model” | Focuses on computational semantics |
| Database vendors | “Architecture” | Marketing; sounds more substantial |
| Software engineers | “Pattern” | Familiar from design pattern vocabulary |
| PostgreSQL docs | “Executor” | Implementation-focused naming |
The Precise Answer:
The Volcano Model is best described as an architectural pattern for query execution that:
- Uses the Iterator Pattern as its foundation
- Defines an execution model (pull-based, row-at-a-time)
- Is part of a database’s overall software architecture
Think of it like this:
- Iterator Pattern = “How do I traverse a collection?”
- Volcano Model = “How do I compose operators to execute a query?”
- PostgreSQL Executor = “The actual code that implements Volcano”
Simple Example:
SELECT name FROM users WHERE age > 25;
Executes as:

Projection (name)
 └─ Filter (age > 25)
      └─ Seq Scan on users
Execution Flow:
Client: "Give me a row"
↓
Projection: "Here's a row" (calls Filter.Next())
↓
Filter: "Here's a filtered row" (calls SeqScan.Next())
↓
SeqScan: "Here's a raw row from disk"
Each operator is independent. The Filter doesn’t know if data comes from a Seq Scan, Index Scan, or Join. The Projection doesn’t know if data is filtered or raw. This modularity is the Volcano Model’s superpower.
2 The Iterator Interface: Next() / GetNext()
In PostgreSQL’s codebase, every executor node implements the same core interface:
/* Simplified from postgres/src/include/nodes/execnodes.h */
typedef struct PlanState {
/* ... state fields ... */
} PlanState;
/* Every node type implements this pattern */
static inline TupleTableSlot *
ExecProcNode(PlanState *node)
{
if (node->is_done)
return NULL; /* No more rows */
/* Node-specific logic */
return node->next_row;
}
The Contract:
| Return Value | Meaning |
|---|---|
| Valid TupleTableSlot | One row of data |
| NULL | No more rows (end of stream) |
Generic Execution Loop:
/* Pseudocode modeled on PostgreSQL's top-level executor loop */
while (true) {
TupleTableSlot *slot = ExecProcNode(top_node);
if (TupIsNull(slot))
break; /* No more rows */
/* Process the row (send to client, aggregate, etc.) */
send_to_client(slot);
}
This loop—call Next(), process row, repeat—is the entire Volcano Model. Every query, no matter how complex, reduces to this pattern.
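To make the mechanics concrete, here is a toy pull-based pipeline in C: a SeqScan over an in-memory array feeding a Filter, driven by the same call-Next()-until-NULL loop. All names and types (Op, seqscan_next, filter_next, run) are invented for illustration; this is a sketch of the pattern, not PostgreSQL's actual code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Op {
    const int *(*next)(struct Op *self);  /* returns NULL at end of stream */
    struct Op *child;     /* downstream input (NULL for leaf nodes) */
    const int *data;      /* SeqScan: backing array */
    size_t pos, len;      /* SeqScan: cursor position and length */
    int threshold;        /* Filter: predicate constant */
} Op;

/* Leaf node: hands back one "row" per call from an in-memory array. */
static const int *seqscan_next(Op *self) {
    if (self->pos >= self->len)
        return NULL;
    return &self->data[self->pos++];
}

/* Filter node: pulls from its child until a row passes the predicate. */
static const int *filter_next(Op *self) {
    for (;;) {
        const int *row = self->child->next(self->child);
        if (row == NULL)
            return NULL;            /* end of stream propagates upward */
        if (*row > self->threshold)
            return row;             /* row passes—return it to the parent */
    }
}

/* The generic top-level loop: call Next() until NULL, collect results. */
static size_t run(Op *top, int *out, size_t cap) {
    size_t n = 0;
    const int *row;
    while (n < cap && (row = top->next(top)) != NULL)
        out[n++] = *row;
    return n;
}
```

Note the key property: filter_next holds only one row at a time, and neither operator knows anything about the other beyond the shared Next() interface.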
3 Operator Trees: How Queries Become Execution Plans
When you run a query, PostgreSQL’s planner builds an operator tree. Each node is an executor type with specific logic.
Common Operator Types
| Operator | What It Does | Calls Child How Many Times? |
|---|---|---|
| Seq Scan | Reads table pages from disk | N/A (leaf node) |
| Index Scan | Reads index, fetches heap tuples | N/A (leaf node) |
| Filter | Applies WHERE clause | 1+ (until row passes filter) |
| Projection | Selects/computes columns | 1 |
| Nested Loop Join | For each outer row, scan inner | 1 outer + N inner |
| Hash Join | Build hash table, probe | N (build phase) + N (probe phase) |
| Merge Join | Merge-sorted inputs | 1 from each sorted input |
| Aggregate | Groups and computes aggregates | N (until group complete) |
| Sort | Sorts input, returns in order | N (buffer all, then return) |
| Limit | Stops after N rows | N (passes through) |
Example: Complex Query
SELECT
u.name,
COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 10;
Execution Plan (Simplified):

Limit (10 rows)
 └─ Sort (order_count DESC)
      └─ Filter (HAVING: count > 5)
           └─ Aggregate (GROUP BY u.name)
                └─ Hash Join (u.id = o.user_id)
                     ├─ Seq Scan on users (filter: created_at > '2025-01-01')
                     └─ Seq Scan on orders (hash table built from this side)
Execution Flow (First Row):
1. Client calls Limit.Next()
2. Limit calls Sort.Next()
3. Sort buffers ALL input (calls the HAVING Filter's Next() until NULL)
4. The HAVING Filter calls Aggregate.Next() and discards groups with count <= 5
5. Aggregate consumes the Hash Join's output to build its groups
6. Hash Join builds a hash table from orders, then probes it with users
7. The users Seq Scan reads rows; its filter applies created_at > '2025-01-01'
8. Sort returns the first row (highest order_count)
9. Limit returns it to the client
Notice: Sort must consume all input before returning anything. This is a blocking operator—it breaks the pure streaming model.
🤔 Why Does This Matter?
Blocking operators like Sort, Hash Aggregate, and Hash Join (build phase) force PostgreSQL to buffer data before producing results. This means:
- Memory pressure — Must fit in work_mem or spill to disk
- No early termination — Can't stop early, even with LIMIT
- Latency impact — First row takes longer to return
When you see these in EXPLAIN ANALYZE, ask: "Can I reduce the input size before this operator?"
4 Row-by-Row Processing: The Good, The Bad, and The Slow
The Good: Why Volcano Works Well
1. Memory Efficiency
Non-blocking operators stream rows without buffering:
SELECT name FROM users WHERE age > 25 LIMIT 10;
PostgreSQL can stop after finding 10 matching rows—no need to scan the entire table if matches are found early.
Memory Usage: O(1) per operator (just current row state)
2. Modularity
Operators compose freely. The same Filter works with:
- Seq Scan
- Index Scan
- Any Join type
- Subquery results
No custom code needed for each combination.
3. Early Termination
EXISTS (SELECT 1 FROM orders WHERE user_id = 42)
Stops at the first matching row. No need to find all matches.
4. Simple Implementation
Each operator is a self-contained function:
TupleTableSlot *
ExecFilter(FilterState *state)
{
while (true) {
TupleTableSlot *slot = ExecProcNode(outer_plan(state));
if (TupIsNull(slot))
return NULL;
if (passes_qual(slot, state->qual))
return slot;
/* Row doesn't match—try next */
}
}
~30 lines of code. Easy to reason about. Easy to debug.
💡 Key Insight: Simplicity Enables Extensibility
Because each operator is so simple (~30-100 lines), PostgreSQL can add new operator types without rewriting the entire executor. This is why extensions can add custom scan methods, join types, and aggregation strategies. The Volcano Model's uniform interface is what makes PostgreSQL extensible.
The Bad: Where Volcano Struggles
1. Function Call Overhead
Every row requires:
- Next() call to its child
- Next() call from its parent
- Virtual function dispatch (in some implementations)
For 1 million rows: 2 million function calls just for plumbing.
2. No Vectorization
Modern CPUs excel at SIMD (Single Instruction, Multiple Data):
Scalar (Volcano): Process row 1, then row 2, then row 3...
Vectorized: Process rows 1-1024 in parallel
Volcano’s row-by-row model can’t exploit SIMD because:
- Each row is processed independently
- No batch context for vectorization
- State is per-row, not per-batch
3. Cache Inefficiency
/* Volcano: scattered memory access */
while (row = Next()) {
process(row->col1); /* May be in different cache line */
process(row->col2); /* Another cache miss */
process(row->col3); /* Another cache miss */
}
Columnar/vectorized engines process all values of one column together:
/* Vectorized: sequential memory access */
for (batch : batches) {
process(batch.col1[0..1023]); /* Sequential—cache friendly */
process(batch.col2[0..1023]); /* Sequential—cache friendly */
}
4. Blocking Operators Break Streaming
Some operators must consume all input before producing output:
| Blocking Operator | Why It Blocks |
|---|---|
| Sort | Must see all rows to determine order |
| Hash Aggregate | Must see all rows in a group before computing aggregate |
| Hash Join (build phase) | Must build entire hash table before probing |
| Distinct | Must see all rows to eliminate duplicates |
When a blocking operator is in the plan, upstream operators can’t stream—they must buffer.
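A sketch of why blocking operators behave this way, using a toy in-memory Sort node (all names invented, and nothing like PostgreSQL's actual tuplesort): the first Next() call drains the entire child before any row can be returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SORT_CAP 1024

/* A stand-in child: yields rows from an array (plays the Seq Scan). */
typedef struct { const int *data; size_t pos, len; } ArrayChild;

static const int *array_next(void *state) {
    ArrayChild *c = state;
    return c->pos < c->len ? &c->data[c->pos++] : NULL;
}

typedef struct SortOp {
    const int *(*child_next)(void *child_state);
    void *child_state;
    int buf[SORT_CAP];
    size_t n, pos;
    int loaded;        /* 0 until the child has been fully consumed */
} SortOp;

static int cmp_int(const void *a, const void *b) {
    return (*(const int *)a > *(const int *)b) -
           (*(const int *)a < *(const int *)b);
}

/* Next() for Sort: the FIRST call does all the work—this is why Sort
 * delays the first row and must fit its whole input in memory. */
static const int *sort_next(SortOp *op) {
    if (!op->loaded) {
        const int *row;
        while ((row = op->child_next(op->child_state)) != NULL && op->n < SORT_CAP)
            op->buf[op->n++] = *row;          /* buffer EVERYTHING */
        qsort(op->buf, op->n, sizeof(int), cmp_int);
        op->loaded = 1;
    }
    if (op->pos >= op->n)
        return NULL;
    return &op->buf[op->pos++];
}
```

The real executor spills to disk when work_mem is exceeded rather than capping the buffer, but the shape is the same: no output until all input is consumed.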
5 PostgreSQL’s Implementation: ExecProcNode
In PostgreSQL’s source code, the Volcano interface is ExecProcNode():
/* Simplified from src/include/executor/executor.h */
static inline TupleTableSlot *
ExecProcNode(PlanState *node)
{
if (node->is_done)
return NULL;
/* Dispatch to node-specific function */
return node->ExecProcNode(node);
}
Each node type implements its own ExecProcNode:
| Node Type | Implementation Function |
|---|---|
| Seq Scan | ExecSeqScan() |
| Index Scan | ExecIndexScan() |
| Hash Join | ExecHashJoin() |
| Aggregate | ExecAgg() |
| Sort | ExecSort() |
| Filter | ExecFilter() (illustrative—in the real codebase, quals are evaluated inside scan and join nodes) |
Example: Filter Node
/* Illustrative, written in the style of src/backend/executor/ node files;
 * PostgreSQL has no standalone Filter node—quals are checked inside
 * scan and join nodes */
TupleTableSlot *
ExecFilter(FilterState *node)
{
ExprContext *econtext = node->ps.ps_ExprContext;
ExprState *qual = node->filterqual;
for (;;) {
/* Get tuple from outer plan */
TupleTableSlot *slot = ExecProcNode(outerPlan(node));
/* No more rows? */
if (TupIsNull(slot))
return NULL;
/* Set up expression context */
econtext->ecxt_outertuple = slot;
/* Check qualification */
if (ExecQual(qual, econtext))
return slot; /* Passes filter—return it */
/* Doesn't pass—loop and try next row */
InstrCountFiltered1(node, 1); /* Stats tracking */
}
}
Key Observations:
- Infinite loop until a row passes or no more rows
- Single row in, single row out
- Stateless between calls (except for stats)
- Delegates to child via ExecProcNode(outerPlan(node))
This pattern repeats across ~50 executor node types.
6 Performance Implications: When Volcano Shines vs. Struggles
Volcano Excels At:
| Workload | Why |
|---|---|
| OLTP (short, selective queries) | Few rows processed; function call overhead negligible |
| Index scans with LIMIT | Early termination; minimal I/O |
| Streaming large results | No buffering; constant memory |
| Complex joins with selective filters | Filters reduce rows before expensive joins |
Example: OLTP Query
SELECT * FROM orders
WHERE user_id = 12345
AND status = 'pending'
LIMIT 1;
- Index Scan finds matching row in ~3 I/O operations
- Filter applies status = 'pending'
- Limit stops after first match
- Total rows processed: 1-5
- Volcano overhead: Negligible
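Early termination is easy to model in the same toy style as before (all names invented): a Limit node simply stops calling its child once it has returned enough rows, so the rest of the table is never touched.

```c
#include <assert.h>
#include <stddef.h>

/* A stand-in child: yields rows from an array (plays the index scan). */
typedef struct { const int *data; size_t pos, len; } Scan;

static const int *scan_next(void *state) {
    Scan *s = state;
    return s->pos < s->len ? &s->data[s->pos++] : NULL;
}

typedef struct LimitOp {
    const int *(*child_next)(void *child_state);
    void *child_state;
    long limit, returned;
} LimitOp;

/* Toy Limit node: after `limit` rows it returns NULL without ever
 * calling the child again—the unread input costs nothing. */
static const int *limit_next(LimitOp *op) {
    if (op->returned >= op->limit)
        return NULL;                 /* done—stop pulling from the child */
    const int *row = op->child_next(op->child_state);
    if (row != NULL)
        op->returned++;
    return row;
}
```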
Volcano Struggles At:
| Workload | Why |
|---|---|
| Analytical (scan millions of rows) | Function call overhead dominates |
| Columnar access patterns | Row-by-row prevents column vectorization |
| Bulk aggregations | No batch processing for SIMD |
| Full table scans with simple filters | A large share of time spent in Next() plumbing |
Example: Analytical Query
SELECT
DATE_TRUNC('month', created_at) as month,
SUM(amount) as total
FROM transactions
WHERE created_at >= '2020-01-01'
GROUP BY month;
- Scans 100 million rows
- Each row: a Next() call, filter check, date truncation, aggregation
- Function calls: 200+ million (2 per row)
- Volcano overhead: 20-40% of total time
⚠️ The Hidden Cost: Not Just Function Calls
The function call overhead is only part of the problem. Row-by-row processing also means:
- Branch misprediction — CPU can't predict which path each row takes
- SIMD underutilization — Modern CPUs can process 4-8 values in parallel, but Volcano uses scalar operations
- Cache thrashing — Each row may touch different memory locations
For analytical queries, these CPU-level inefficiencies often matter more than the function call count itself.
7 Vectorized Execution: The Alternative
Vectorized engines (ClickHouse, DuckDB, Snowflake) process rows in batches (typically 1024-8192 rows):
/* Vectorized interface */
struct VectorBatch {
int32_t col1[1024];
int32_t col2[1024];
bool nulls[1024];
int count; /* Actual rows in batch */
};
VectorBatch* NextBatch(Operator* op);
Execution:
while (batch = NextBatch()) {
/* Process all 1024 rows at once */
for (int i = 0; i < batch->count; i++) {
if (!batch->nulls[i] && batch->col1[i] > 25) {
result->col1[result->count++] = batch->col1[i];
}
}
}
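For contrast, a minimal batch-at-a-time filter in C (layout and names invented for illustration): one function call handles up to 1024 values, and the tight, branch-light loop is exactly the kind a compiler can auto-vectorize with SIMD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_SIZE 1024

/* Filter col > threshold over a whole batch at once. One call replaces
 * up to 1024 Next() calls in the row-at-a-time model; returns the
 * number of rows kept. */
static size_t filter_batch(const int32_t *col, size_t count,
                           int32_t threshold, int32_t *out) {
    size_t kept = 0;
    for (size_t i = 0; i < count; i++)
        if (col[i] > threshold)
            out[kept++] = col[i];   /* sequential writes—cache friendly */
    return kept;
}
```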
Benefits:
| Aspect | Volcano (Row) | Vectorized (Batch) |
|---|---|---|
| Function calls per row | 2+ | 2 / batch_size (~0.002) |
| SIMD utilization | None | High (process 4-8 values per instruction) |
| Cache efficiency | Poor (row layout) | High (column layout) |
| Code complexity | Low | High (batch management, vectorization) |
🤔 So Why Doesn't PostgreSQL Switch?
If vectorization is 5-10x faster for analytics, why not adopt it?
- Backward compatibility — Extensions rely on the current ExecProcNode() API
- Code complexity — Rewriting 50+ node types is a massive undertaking
- OLTP trade-off — Vectorization helps analytics but can hurt OLTP latency
- Philosophy — PostgreSQL prefers stable, incremental improvements
The answer: Extensions, not core changes. See the next section.
Why PostgreSQL Doesn’t Use Vectorization (Yet)
1. Historical Reasons
PostgreSQL’s executor was designed in the 1980s-1990s, before SIMD and columnar storage were mainstream.
2. Architecture Coupling
Vectorization requires:
- Columnar storage (or row-to-column conversion)
- Batch-aware operators (rewriting ~50 node types)
- Vectorized expression evaluation (rewriting expression engine)
3. OLTP Focus
PostgreSQL optimizes for mixed workloads, not just analytics.
4. Extension Approach
Instead of rewriting the core, PostgreSQL supports extensions:
- Columnar storage: Citus Columnar, Hydra
- Vectorized execution: Experimental patches (not merged)
8 Hybrid Approaches: Getting the Best of Both
Some databases blend Volcano with vectorization:
Apache Spark
- Volcano-style iterator interface
- Vectorized readers (Parquet, ORC)
- Whole-stage code generation (fuses operators, eliminates Next() calls)
DuckDB
- Vectorized execution as default
- Volcano-style operator trees
- Chunk-based processing (2048 rows per chunk)
PostgreSQL (Current)
- Volcano model throughout
- Just-in-Time compilation (LLVM) for expression evaluation
- Parallel query for some operations (parallel Seq Scan, Hash Join, Aggregate)
JIT Compilation Example:
SET jit = on;
SELECT sum(amount * 1.15) /* Expression compiled to native code */
FROM transactions
WHERE created_at >= '2025-01-01';
PostgreSQL compiles the expression amount * 1.15 to native machine code, reducing interpretation overhead. This doesn’t fix Volcano’s row-by-row model, but it helps.
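To check whether JIT actually fired for a query like this, look at the EXPLAIN ANALYZE output; when compilation was used, the plan footer includes a JIT: section. (Table and column names are carried over from the example above.)

```sql
SET jit = on;
EXPLAIN (ANALYZE, COSTS OFF)
SELECT sum(amount * 1.15)
FROM transactions
WHERE created_at >= '2025-01-01';
-- Look for a "JIT:" block at the bottom of the output. If it is absent,
-- the plan's estimated cost likely fell below jit_above_cost and
-- compilation was skipped.
```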
9 Practical Takeaways for PostgreSQL Users
When Volcano Works Well:
/* ✅ Good: Selective index scan */
SELECT * FROM users WHERE email = 'user@example.com';
/* ✅ Good: Early termination */
SELECT EXISTS (SELECT 1 FROM orders WHERE user_id = 42);
/* ✅ Good: Streaming with LIMIT */
SELECT * FROM logs ORDER BY timestamp DESC LIMIT 100;
/* ✅ Good: Pipelined joins */
SELECT * FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.country = 'US'; /* Filter reduces join input */
When Volcano Struggles:
/* ⚠️ Poor: Full table scan with aggregation */
SELECT COUNT(*) FROM transactions; /* Must scan all rows */
/* ⚠️ Poor: Complex expression on every row */
SELECT UPPER(CONCAT(first_name, ' ', last_name)) FROM users;
/* ⚠️ Poor: Blocking operators on large datasets */
SELECT DISTINCT category FROM products; /* Must buffer all */
/* ⚠️ Poor: Analytical window functions */
SELECT
user_id,
AVG(amount) OVER (PARTITION BY user_id ORDER BY date)
FROM transactions;
Optimization Strategies:
1. Push Filters Down
/* ❌ Bad: Filter after join */
SELECT * FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01';
/* ✅ Good: Filter before join (PostgreSQL does this automatically) */
/* Planner pushes filter down to users scan */
💡 Trust But Verify
PostgreSQL's planner is smart about pushing filters down, but not always optimal. Use EXPLAIN (ANALYZE, BUFFERS) to verify:
- Filter appears before expensive operations (Join, Sort, Aggregate)
- Rows Removed by Filter is reasonable (not scanning millions just to filter out 99%)
If the planner gets it wrong, try:
- Materialized CTEs: since PG12, plain CTEs are inlined by default, so add MATERIALIZED to force an optimization fence
- Subqueries to force evaluation order
- Partial indexes to make filtered scans cheaper
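For example, the last option might look like this (index name hypothetical): a partial index stores only rows matching its predicate, so the date filter becomes a property of the index instead of a per-row qual.

```sql
-- Hypothetical partial index matching the filter from the earlier example
CREATE INDEX users_recent_idx ON users (created_at)
WHERE created_at > '2025-01-01';
```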
2. Use Indexes to Reduce Rows
/* ❌ Bad: Seq Scan on large table */
SELECT * FROM transactions WHERE user_id = 42;
/* ✅ Good: Index Scan */
CREATE INDEX ON transactions(user_id);
3. Avoid Unnecessary Blocking Operators
/* ❌ Bad: DISTINCT on large result */
SELECT DISTINCT category FROM million_row_table;
/* ✅ Good: GROUP BY (same result, clearer intent) */
SELECT category FROM million_row_table GROUP BY category;
4. Leverage Parallel Query
/* Enable parallel query */
SET max_parallel_workers_per_gather = 4;
/* Parallel Seq Scan + Hash Aggregate */
SELECT COUNT(*) FROM large_table;
10 The Future: Is Vectorization Coming to PostgreSQL?
Short answer: Not to the core, but extensions are experimenting.
Current Efforts:
| Project | Approach | Status |
|---|---|---|
| Citus Columnar | Columnar storage with row-by-row execution | Production |
| Hydra | Columnar storage for analytics | Production |
| pg_vectorize | Vectorized expression evaluation | Experimental |
| LLVM JIT | Compiled expressions (not vectorized) | Production (PG11+) |
Why Core Won’t Change Soon:
- Backward compatibility — Extensions rely on current executor API
- Complexity — Rewriting 50+ node types is a massive undertaking
- Trade-offs — Vectorization helps analytics, hurts OLTP latency
- Philosophy — PostgreSQL prefers stable, incremental improvements
Most likely path: Vectorization via extensions, not core changes.
📌 What This Means for You
Don't wait for vectorization in PostgreSQL core. Instead:
- For OLTP: Volcano works great—focus on indexing and query design
- For Analytics: Use extensions (Citus Columnar, Hydra) or purpose-built tools (DuckDB, ClickHouse)
- For Mixed Workloads: Leverage JIT compilation and parallel query
The right tool depends on your query patterns, not just raw performance benchmarks.
Summary: The Volcano Model in One Diagram

Client
  │ Next()                      ▲ one row
  ▼                             │
[ Limit ]      stops early once satisfied
  │ Next()                      ▲ one row
  ▼                             │
[ Filter ]     pulls until a row passes
  │ Next()                      ▲ one row
  ▼                             │
[ Seq Scan ]   reads pages from disk
Key Takeaways:
| Aspect | Volcano Model |
|---|---|
| Interface | Next() / GetNext() — one row at a time |
| Structure | Tree of operators |
| Data flow | Pull-based (child → parent) |
| Memory | O(1) for streaming, O(N) for blocking operators |
| Best for | OLTP, selective queries, streaming |
| Worst for | Analytical, full scans, bulk aggregations |
| PostgreSQL status | Core execution model since inception |
The Volcano Model isn’t perfect—but after 40+ years, it’s still the foundation of most SQL databases. Understanding it helps you write queries that work with PostgreSQL’s architecture, not against it.
✅ Key Takeaway
The Volcano Model is a trade-off, not a mistake:
- Gains: Simplicity, extensibility, streaming, early termination
- Losses: Function call overhead, no SIMD, cache inefficiency
For OLTP and mixed workloads, the gains outweigh the losses. For pure analytics, consider columnar/vectorized alternatives.
Your job as a PostgreSQL user: Know which queries play to Volcano's strengths—and which ones fight against it.
Further Reading:
- Graefe, Goetz. “Volcano—An Extensible and Parallel Query Evaluation System” (1994) — The original paper
- PostgreSQL Source: src/backend/executor/ — Actual implementation
- “PostgreSQL Internals” by Egor Rogov — Deep dive into executor architecture
- DuckDB Documentation: “Vectorized Execution” — Alternative approach