- 1 Why Histograms Matter
- 2 Histogram Types
- 3 Sampling for Large Tables
- 4 Multi-Column Statistics
- 5 Using Histograms in the Optimizer
- 6 Histogram Maintenance
- 7 Challenges Building in Rust
- 8 How AI Accelerated This
- Summary: Histogram Statistics in One Diagram
In Part 7, we built a cost-based optimizer. But there's a problem.
Our selectivity estimates are guesses:
// From Part 7 - simplified (wrong!) estimates
fn estimate_selectivity(&self, expr: &Expression) -> f64 {
match expr {
Expression::BinaryOp { op: BinaryOperator::Eq, .. } => 0.01, // Always 1%? Wrong!
Expression::BinaryOp { op: BinaryOperator::Gt, .. } => 0.33, // Always 33%? Wrong!
_ => 0.5, // Always 50%? Very wrong!
}
}
Real data isn't uniform:
User balances in our database:
Balance Distribution:
$0-100:    ████████████████████████████████████████ 80% of users
$100-1K:   ████████ 15% of users
$1K-10K:   ██ 4% of users
$10K-100K: █ 1% of users
Query: SELECT * FROM users WHERE balance > 100
Our estimate: 33% of rows (using fixed 0.33)
Actual: 20% of rows
❌ Wrong plan chosen! Index scan would be better than seq scan.
Solution: Histograms that capture actual data distribution.
Today: implementing histogram statistics in Rust for accurate selectivity estimation.
1 Why Histograms Matter
The Uniform Distribution Fallacy
Without histograms, optimizers assume uniform distribution:
-- Table: users (1,000,000 rows)
-- Column: account_type ('free', 'premium', 'enterprise')
-- Reality:
'free': 950,000 rows (95%)
'premium': 45,000 rows (4.5%)
'enterprise': 5,000 rows (0.5%)
-- Query 1:
SELECT * FROM users WHERE account_type = 'enterprise'
-- Optimizer estimate (uniform): 1,000,000 / 3 = 333,333 rows
-- Actual: 5,000 rows
-- Error: 67x overestimate!
-- Query 2:
SELECT * FROM users WHERE account_type = 'free'
-- Optimizer estimate (uniform): 333,333 rows
-- Actual: 950,000 rows
-- Error: 3x underestimate!
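The error factors above are worth making concrete. A throwaway sketch (not optimizer code, just the arithmetic behind the uniform assumption):

```rust
// Reproduces the account_type example: a uniform-assumption optimizer
// divides total rows evenly across distinct values.

fn uniform_estimate(total_rows: u64, distinct_values: u64) -> u64 {
    total_rows / distinct_values
}

// How far off the estimate is, as a multiplicative factor
fn error_factor(estimated: u64, actual: u64) -> f64 {
    if estimated > actual {
        estimated as f64 / actual as f64
    } else {
        actual as f64 / estimated as f64
    }
}

fn main() {
    let est = uniform_estimate(1_000_000, 3); // 333,333 rows for every value
    println!(
        "enterprise: est {} vs actual 5000 ({}x overestimate)",
        est,
        error_factor(est, 5_000).round()
    );
    println!(
        "free: est {} vs actual 950000 ({}x underestimate)",
        est,
        error_factor(est, 950_000).round()
    );
}
```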
With histograms: We know the actual distribution.
Impact on Join Order
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.account_type = 'enterprise'
Without histogram:
Optimizer thinks 'enterprise' = 333,333 rows
Plan: SeqScan(users) → Hash Join → SeqScan(orders)
Cost: 5000 (wrong!)
With histogram:
Histogram shows 'enterprise' = 5,000 rows
Plan: IndexScan(users) → Nested Loop → IndexScan(orders)
Cost: 100 (correct!)
Result: 50x faster query.
2 Histogram Types
Equi-Width Histograms
Equal bucket ranges, varying row counts:
Data: [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
Equi-Width (3 buckets, range 1-14):
Bucket 1: [1-5]   → 5 rows  ████████████████████
Bucket 2: [6-10]  → 1 row   ████
Bucket 3: [11-14] → 4 rows  ████████████████
#[derive(Debug, Clone)]
pub struct EquiWidthHistogram {
pub min_value: Value,
pub max_value: Value,
pub num_buckets: usize,
pub buckets: Vec<WidthBucket>,
}
#[derive(Debug, Clone)]
pub struct WidthBucket {
pub lower_bound: Value,
pub upper_bound: Value,
pub row_count: u64,
pub distinct_count: u64,
pub null_count: u64,
}
impl EquiWidthHistogram {
pub fn build(values: &[Value], num_buckets: usize) -> Result<Self, HistogramError> {
if values.is_empty() {
return Err(HistogramError::EmptyData);
}
let (min, max) = Self::find_min_max(values)?;
let bucket_width = max.subtract(&min).divide(num_buckets as f64);
let mut buckets = vec![WidthBucket {
lower_bound: min.clone(),
upper_bound: min.clone(),
row_count: 0,
distinct_count: 0,
null_count: 0,
}; num_buckets];
// Initialize bucket bounds
for i in 0..num_buckets {
buckets[i].lower_bound = min.add(bucket_width.multiply(i as f64));
buckets[i].upper_bound = min.add(bucket_width.multiply((i + 1) as f64));
}
// Count rows in each bucket
for value in values {
if value.is_null() {
buckets[0].null_count += 1;
continue;
}
let bucket_idx = Self::find_bucket_index(value, &min, &bucket_width, num_buckets);
if bucket_idx < num_buckets {
buckets[bucket_idx].row_count += 1;
}
}
Ok(Self {
min_value: min,
max_value: max,
num_buckets,
buckets,
})
}
fn find_bucket_index(value: &Value, min: &Value, width: &Value, num_buckets: usize) -> usize {
let offset = value.subtract(min);
let bucket = offset.divide(width).to_f64().floor() as usize;
bucket.min(num_buckets - 1)
}
}
Pros: Simple, good for uniform data.
Cons: Poor for skewed data (most buckets empty, one bucket huge).
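The skew weakness is easy to demonstrate with a hypothetical f64-only version of the same bucketing logic (the article's `Value` abstraction is dropped here for brevity):

```rust
// Equi-width bucketing over plain f64 values: equal ranges, whatever
// counts happen to fall in them.
fn equi_width_counts(values: &[f64], num_buckets: usize) -> Vec<u64> {
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let width = (max - min) / num_buckets as f64;
    let mut counts = vec![0u64; num_buckets];
    for &v in values {
        // Clamp the last value into the final bucket
        let idx = (((v - min) / width).floor() as usize).min(num_buckets - 1);
        counts[idx] += 1;
    }
    counts
}

fn skewed_counts() -> Vec<u64> {
    // 95 small values plus 5 large outliers: a heavily skewed column
    let mut data: Vec<f64> = (0..95).map(|i| (i % 10) as f64).collect();
    data.extend([200.0, 400.0, 600.0, 800.0, 1000.0]);
    equi_width_counts(&data, 10)
}

fn main() {
    // Nearly every row lands in bucket 0; most buckets sit empty.
    println!("{:?}", skewed_counts());
}
```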
Equi-Depth Histograms (PostgreSQL Style)
Equal row counts per bucket, varying ranges:
Data: [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
Equi-Depth (3 buckets, ~3-4 rows each):
Bucket 1: [1-4]   → 4 rows  ████████████
Bucket 2: [5-11]  → 3 rows  █████████
Bucket 3: [12-14] → 3 rows  █████████
#[derive(Debug, Clone)]
pub struct EquiDepthHistogram {
pub num_buckets: usize,
pub total_rows: u64,
pub buckets: Vec<DepthBucket>,
}
#[derive(Debug, Clone)]
pub struct DepthBucket {
pub lower_bound: Value,
pub upper_bound: Value,
pub row_count: u64,
pub distinct_count: u64,
pub cumulative_count: u64, // Rows <= upper_bound
}
impl EquiDepthHistogram {
pub fn build(values: &[Value], num_buckets: usize) -> Result<Self, HistogramError> {
if values.is_empty() {
return Err(HistogramError::EmptyData);
}
// Sort values
let mut sorted_values = values.to_vec();
sorted_values.sort();
let total_rows = sorted_values.len() as u64;
let target_bucket_size = (total_rows as f64 / num_buckets as f64).ceil() as usize;
let mut buckets = Vec::new();
let mut cumulative = 0u64;
for i in 0..num_buckets {
let start = i * target_bucket_size;
let end = ((i + 1) * target_bucket_size).min(sorted_values.len());
if start >= sorted_values.len() {
break;
}
let bucket_values = &sorted_values[start..end];
let row_count = bucket_values.len() as u64;
let distinct_count = bucket_values.iter().collect::<std::collections::HashSet<_>>().len() as u64;
cumulative += row_count;
buckets.push(DepthBucket {
lower_bound: bucket_values.first().unwrap().clone(),
upper_bound: bucket_values.last().unwrap().clone(),
row_count,
distinct_count,
cumulative_count: cumulative,
});
}
Ok(Self {
num_buckets,
total_rows,
buckets,
})
}
/// Estimate selectivity for equality predicate
pub fn estimate_eq(&self, value: &Value) -> f64 {
for bucket in &self.buckets {
if value >= &bucket.lower_bound && value <= &bucket.upper_bound {
// Assume uniform distribution within bucket
let bucket_selectivity = 1.0 / bucket.distinct_count.max(1) as f64;
return bucket_selectivity * (bucket.row_count as f64 / self.total_rows as f64);
}
}
0.0 // Value outside histogram range
}
/// Estimate selectivity for range predicate (value > X)
pub fn estimate_gt(&self, value: &Value) -> f64 {
let mut selectivity = 0.0;
for bucket in &self.buckets {
if value < &bucket.lower_bound {
// Entire bucket matches
selectivity += bucket.row_count as f64 / self.total_rows as f64;
} else if value >= &bucket.lower_bound && value < &bucket.upper_bound {
// Partial bucket - assume uniform within bucket
let bucket_range = bucket.upper_bound.subtract(&bucket.lower_bound).to_f64();
let matching_range = bucket.upper_bound.subtract(value).to_f64();
let fraction = (matching_range / bucket_range).clamp(0.0, 1.0);
selectivity += fraction * (bucket.row_count as f64 / self.total_rows as f64);
}
// else: value >= bucket.upper_bound, no match
}
selectivity
}
/// Estimate selectivity for range predicate (value < X)
pub fn estimate_lt(&self, value: &Value) -> f64 {
1.0 - self.estimate_gte(value)
}
/// Estimate selectivity for range predicate (value >= X)
pub fn estimate_gte(&self, value: &Value) -> f64 {
let mut selectivity = 0.0;
for bucket in &self.buckets {
if value <= &bucket.lower_bound {
// Entire bucket matches
selectivity += bucket.row_count as f64 / self.total_rows as f64;
} else if value > &bucket.lower_bound && value <= &bucket.upper_bound {
// Partial bucket
let bucket_range = bucket.upper_bound.subtract(&bucket.lower_bound).to_f64();
let matching_range = bucket.upper_bound.subtract(value).to_f64();
let fraction = (matching_range / bucket_range).clamp(0.0, 1.0);
selectivity += fraction * (bucket.row_count as f64 / self.total_rows as f64);
}
}
selectivity
}
}
Pros: Better for skewed data, each bucket has equal weight.
Cons: Bucket boundaries can be arbitrary for data with many duplicates.
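To see how equi-depth estimation produces the intro's 20% figure, here is a trimmed f64 sketch of `estimate_gt` run on buckets whose bounds and counts are made up to mirror the 80/15/5 balance split:

```rust
// Simplified bucket: lower bound, upper bound, row count
struct Bucket { lo: f64, hi: f64, rows: u64 }

// Same logic as estimate_gt above: whole buckets above x count fully,
// the bucket containing x contributes a linear fraction.
fn estimate_gt(buckets: &[Bucket], total_rows: u64, x: f64) -> f64 {
    let mut sel = 0.0;
    for b in buckets {
        if x < b.lo {
            // Entire bucket matches
            sel += b.rows as f64 / total_rows as f64;
        } else if x < b.hi {
            // Partial bucket: assume uniform distribution within it
            let frac = ((b.hi - x) / (b.hi - b.lo)).clamp(0.0, 1.0);
            sel += frac * b.rows as f64 / total_rows as f64;
        }
    }
    sel
}

fn balance_gt_100() -> f64 {
    // Mirrors the balance distribution: 80% of rows below $100
    let buckets = [
        Bucket { lo: 0.0, hi: 100.0, rows: 800 },
        Bucket { lo: 100.0, hi: 1000.0, rows: 150 },
        Bucket { lo: 1000.0, hi: 100_000.0, rows: 50 },
    ];
    estimate_gt(&buckets, 1000, 100.0)
}

fn main() {
    // 0.2 from the histogram, instead of the fixed 0.33 guess
    println!("sel(balance > 100) = {}", balance_gt_100());
}
```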
Compressed Histograms (Handling Duplicates)
Separate handling for most common values (MCV):
Data: [1, 1, 1, 1, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
Compressed Histogram:
Most Common Values (MCV):
  Value 1: frequency = 5/14 = 35.7%
Histogram (for remaining values):
  Bucket 1: [2-5]   → 4 rows
  Bucket 2: [6-10]  → 1 row
  Bucket 3: [11-14] → 4 rows
#[derive(Debug, Clone)]
pub struct CompressedHistogram {
pub mcv_list: MCVList,
pub histogram: EquiDepthHistogram,
pub null_fraction: f64,
}
#[derive(Debug, Clone)]
pub struct MCVList {
pub values: Vec<(Value, f64)>, // (value, frequency)
pub total_mcv_frequency: f64,
}
impl CompressedHistogram {
pub fn build(values: &[Value], num_buckets: usize, num_mcv: usize) -> Result<Self, HistogramError> {
// Handle NULLs
let null_count = values.iter().filter(|v| v.is_null()).count();
let null_fraction = null_count as f64 / values.len() as f64;
let non_null_values: Vec<_> = values.iter().filter(|v| !v.is_null()).collect();
// Build MCV list
let mcv_list = Self::build_mcv_list(&non_null_values, num_mcv);
// Build histogram for non-MCV values
let mcv_values: std::collections::HashSet<_> =
mcv_list.values.iter().map(|(v, _)| v).collect();
let non_mcv_values: Vec<_> = non_null_values
.iter()
.filter(|v| !mcv_values.contains(v))
.cloned()
.collect();
let histogram = if non_mcv_values.is_empty() {
// All values are in MCV, create empty histogram
EquiDepthHistogram {
num_buckets: 0,
total_rows: 0,
buckets: Vec::new(),
}
} else {
EquiDepthHistogram::build(&non_mcv_values, num_buckets)?
};
Ok(Self {
mcv_list,
histogram,
null_fraction,
})
}
fn build_mcv_list(values: &[Value], num_mcv: usize) -> MCVList {
let mut value_counts: std::collections::HashMap<&Value, usize> =
std::collections::HashMap::new();
for value in values {
*value_counts.entry(value).or_insert(0) += 1;
}
let mut mcv: Vec<_> = value_counts.into_iter().collect();
mcv.sort_by(|a, b| b.1.cmp(&a.1)); // Sort by count descending
let total_mcv_frequency = mcv.iter().map(|(_, c)| *c as f64).sum::<f64>() / values.len() as f64;
let mcv_values = mcv
.into_iter()
.take(num_mcv)
.map(|(v, c)| (v.clone(), c as f64 / values.len() as f64))
.collect();
MCVList {
values: mcv_values,
total_mcv_frequency,
}
}
/// Estimate selectivity with MCV awareness
pub fn estimate_selectivity(&self, op: &BinaryOperator, value: &Value) -> f64 {
// Check MCV first. An exact match only helps equality predicates;
// range predicates skip the MCV list and use the histogram below.
if matches!(op, BinaryOperator::Eq | BinaryOperator::Neq) {
for (mcv_val, frequency) in &self.mcv_list.values {
if mcv_val == value {
return match op {
BinaryOperator::Eq => *frequency,
_ => 1.0 - frequency,
};
}
}
}
// Not in MCV, use histogram
let histogram_selectivity = match op {
BinaryOperator::Eq => self.histogram.estimate_eq(value),
BinaryOperator::Gt => self.histogram.estimate_gt(value),
BinaryOperator::Lt => self.histogram.estimate_lt(value),
BinaryOperator::Gte => self.histogram.estimate_gte(value),
BinaryOperator::Lte => 1.0 - self.histogram.estimate_gt(value),
_ => 0.5,
};
// Adjust for non-MCV portion
let non_mcv_fraction = 1.0 - self.mcv_list.total_mcv_frequency;
histogram_selectivity * non_mcv_fraction
}
}
Pros: Best of both worlds - accurate for common values, good distribution for rest.
Cons: More complex, requires more memory.
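A small sketch of the MCV split on the sample data above, using i64 values instead of the `Value` type:

```rust
use std::collections::HashMap;

// Split values into an MCV list (value, frequency) and the remaining
// rows that the histogram will cover.
fn mcv_split(values: &[i64], num_mcv: usize) -> (Vec<(i64, f64)>, Vec<i64>) {
    let mut counts: HashMap<i64, usize> = HashMap::new();
    for &v in values {
        *counts.entry(v).or_insert(0) += 1;
    }
    let mut by_count: Vec<_> = counts.into_iter().collect();
    // Highest count first; break ties by value for determinism
    by_count.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    let mcv: Vec<(i64, f64)> = by_count
        .iter()
        .take(num_mcv)
        .map(|&(v, c)| (v, c as f64 / values.len() as f64))
        .collect();
    let mcv_values: Vec<i64> = mcv.iter().map(|&(v, _)| v).collect();
    let rest: Vec<i64> = values.iter().cloned().filter(|v| !mcv_values.contains(v)).collect();
    (mcv, rest)
}

fn main() {
    let data = [1, 1, 1, 1, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14];
    let (mcv, rest) = mcv_split(&data, 1);
    // Value 1 goes to the MCV list at 5/14; 9 rows remain for the histogram
    println!("MCV: {:?}, remaining rows: {}", mcv, rest.len());
}
```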
3 Sampling for Large Tables
The Problem: Full Table Scans Are Expensive
-- Table: users (100 million rows)
-- Building histogram requires sorting all values
-- Full scan approach:
SELECT balance FROM users ORDER BY balance;
-- Time: 30 minutes (!)
-- I/O: Read entire table
-- Not practical for routine ANALYZE
Solution: Statistical Sampling
// src/optimizer/sampling.rs
use rand::seq::SliceRandom;
use rand::{thread_rng, Rng};
pub struct Sampler {
sample_size: usize,
confidence: f64,
}
impl Sampler {
pub fn new(sample_size: usize, confidence: f64) -> Self {
Self { sample_size, confidence }
}
/// Reservoir sampling for streaming data
pub fn reservoir_sample<T: Clone>(
&self,
stream: impl Iterator<Item = T>,
) -> Vec<T> {
let mut reservoir = Vec::with_capacity(self.sample_size);
let mut count = 0;
for item in stream {
count += 1;
if reservoir.len() < self.sample_size {
reservoir.push(item);
} else {
// Replace with probability sample_size / count
let j = thread_rng().gen_range(0..count);
if j < self.sample_size {
reservoir[j] = item;
}
}
}
reservoir
}
/// Simple random sampling for in-memory data
pub fn simple_random_sample<T: Clone>(&self, values: &[T]) -> Vec<T> {
let mut rng = thread_rng();
let mut sample: Vec<T> = values.to_vec();
sample.shuffle(&mut rng);
sample.truncate(self.sample_size);
sample
}
/// Calculate confidence interval for estimate
pub fn confidence_interval(&self, sample_proportion: f64, sample_size: usize) -> (f64, f64) {
// z-score for the configured confidence level (1.96 for 95%).
// An if/else chain rather than a match: float literal patterns
// are not allowed in Rust.
let z = if self.confidence >= 0.99 {
2.576
} else if self.confidence >= 0.95 {
1.96
} else if self.confidence >= 0.90 {
1.645
} else {
1.96
};
let standard_error = (sample_proportion * (1.0 - sample_proportion) / sample_size as f64).sqrt();
let margin = z * standard_error;
(sample_proportion - margin, sample_proportion + margin)
}
}
// Usage in histogram building
impl EquiDepthHistogram {
pub fn build_from_sample(
values: &[Value],
num_buckets: usize,
sample_size: usize,
) -> Result<Self, HistogramError> {
let sampler = Sampler::new(sample_size, 0.95);
let sample = sampler.simple_random_sample(values);
// Build histogram from sample
let mut histogram = Self::build(&sample, num_buckets)?;
// Scale row counts to match full table
let scale_factor = values.len() as f64 / sample.len() as f64;
for bucket in &mut histogram.buckets {
bucket.row_count = (bucket.row_count as f64 * scale_factor).round() as u64;
bucket.cumulative_count = (bucket.cumulative_count as f64 * scale_factor).round() as u64;
}
histogram.total_rows = values.len() as u64;
Ok(histogram)
}
}
Sample Size Guidelines
| Table Size | Recommended Sample | Accuracy (95% CI) |
|---|---|---|
| < 10K rows | 100% (full scan) | Exact |
| 10K-1M rows | 10% | ±1% |
| 1M-100M rows | 1% | ±0.1% |
| > 100M rows | 0.1% or 100K rows | ±0.03% |
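The accuracy column follows from the standard margin-of-error formula for a sampled proportion; the table's figures are approximate worst-case values at p = 0.5. A quick sketch of the formula:

```rust
// 95% margin of error for a proportion p estimated from n samples:
// 1.96 * sqrt(p * (1 - p) / n). Worst case is p = 0.5.
fn margin_95(p: f64, n: usize) -> f64 {
    1.96 * (p * (1.0 - p) / n as f64).sqrt()
}

fn main() {
    // 1% sample of a 100M-row table = 1M samples -> roughly ±0.1%
    println!("±{:.4}", margin_95(0.5, 1_000_000));
}
```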
4 Multi-Column Statistics
The Problem: Column Correlation
-- Table: users (1,000,000 rows)
-- Columns: country, city
-- Individual column statistics:
-- country: 'US' = 50%, 'UK' = 30%, 'DE' = 20%
-- city: 'London' = 5%, 'Berlin' = 10%, 'Munich' = 5%
-- Query:
SELECT * FROM users WHERE country = 'UK' AND city = 'London'
-- Optimizer (assuming independence):
-- Selectivity = 0.30 × 0.05 = 0.015 = 1.5%
-- Estimated rows: 15,000
-- Reality (London only exists in UK):
-- Actual selectivity = 5% (all London users are in UK)
-- Actual rows: 50,000
-- Error: 3.3x underestimate!
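The independence error is just multiplication; a sketch of the arithmetic:

```rust
// Under the independence assumption, combined selectivity is the
// product of the per-column selectivities.
fn independent_estimate(sel_a: f64, sel_b: f64, rows: u64) -> u64 {
    (sel_a * sel_b * rows as f64).round() as u64
}

fn main() {
    let est = independent_estimate(0.30, 0.05, 1_000_000);
    let actual = 50_000u64; // every London user is in the UK
    println!(
        "estimated {} vs actual {} ({:.1}x underestimate)",
        est,
        actual,
        actual as f64 / est as f64
    );
}
```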
Multi-Column Histograms
#[derive(Debug, Clone)]
pub struct MultiColumnHistogram {
pub columns: Vec<String>,
pub num_buckets: usize,
pub total_rows: u64,
pub buckets: Vec<MultiColumnBucket>,
}
#[derive(Debug, Clone)]
pub struct MultiColumnBucket {
pub bounds: Vec<(Value, Value)>, // (min, max) for each column
pub row_count: u64,
pub distinct_count: u64,
}
impl MultiColumnHistogram {
pub fn build(
data: &[Vec<Value>], // Each row is [col1_value, col2_value, ...]
columns: Vec<String>,
num_buckets: usize,
) -> Result<Self, HistogramError> {
if data.is_empty() {
return Err(HistogramError::EmptyData);
}
// Use k-d tree style partitioning for multi-dimensional buckets
let buckets = Self::build_kd_histogram(data, num_buckets)?;
Ok(Self {
columns,
num_buckets,
total_rows: data.len() as u64,
buckets,
})
}
fn build_kd_histogram(
data: &[Vec<Value>],
num_buckets: usize,
) -> Result<Vec<MultiColumnBucket>, HistogramError> {
// Simplified: cycle the split dimension with recursion depth
// (a production version would pick the dimension with the highest variance)
let target_bucket_size = data.len() / num_buckets;
let mut buckets = Vec::new();
Self::partition_data(data, 0, num_buckets, &mut buckets, target_bucket_size)?;
Ok(buckets)
}
fn partition_data(
data: &[Vec<Value>],
depth: usize,
remaining_buckets: usize,
buckets: &mut Vec<MultiColumnBucket>,
target_size: usize,
) -> Result<(), HistogramError> {
if remaining_buckets == 1 || data.len() <= target_size {
// Create leaf bucket
let bucket = Self::create_bucket(data)?;
buckets.push(bucket);
return Ok(());
}
// Cycle through split dimensions as we recurse
let dim = depth % data[0].len();
let mut sorted = data.to_vec();
sorted.sort_by(|a, b| a[dim].cmp(&b[dim]));
let mid = sorted.len() / 2;
let (left, right) = sorted.split_at(mid);
Self::partition_data(left, depth + 1, remaining_buckets / 2, buckets, target_size)?;
Self::partition_data(right, depth + 1, remaining_buckets - remaining_buckets / 2, buckets, target_size)?;
Ok(())
}
fn create_bucket(data: &[Vec<Value>]) -> Result<MultiColumnBucket, HistogramError> {
let num_dims = data[0].len();
let mut bounds = Vec::new();
for dim in 0..num_dims {
let mut values: Vec<_> = data.iter().map(|row| &row[dim]).collect();
values.sort();
bounds.push((values.first().unwrap().clone(), values.last().unwrap().clone()));
}
let distinct_count = data.iter().collect::<std::collections::HashSet<_>>().len() as u64;
Ok(MultiColumnBucket {
bounds,
row_count: data.len() as u64,
distinct_count,
})
}
pub fn estimate_selectivity(&self, predicates: &[(usize, BinaryOperator, Value)]) -> f64 {
let mut selectivity = 0.0;
for bucket in &self.buckets {
let bucket_selectivity = self.estimate_bucket_selectivity(bucket, predicates);
selectivity += bucket_selectivity * (bucket.row_count as f64 / self.total_rows as f64);
}
selectivity
}
fn estimate_bucket_selectivity(
&self,
bucket: &MultiColumnBucket,
predicates: &[(usize, BinaryOperator, Value)],
) -> f64 {
let mut bucket_selectivity = 1.0;
for (col_idx, op, value) in predicates {
if *col_idx >= bucket.bounds.len() {
continue;
}
let (min, max) = &bucket.bounds[*col_idx];
let col_selectivity = match op {
BinaryOperator::Eq => {
if value >= min && value <= max {
1.0 / bucket.distinct_count.max(1) as f64
} else {
0.0
}
}
BinaryOperator::Gt => {
if value < min {
1.0
} else if value >= max {
0.0
} else {
let range = max.subtract(min).to_f64();
let matching = max.subtract(value).to_f64();
(matching / range).clamp(0.0, 1.0)
}
}
// ... handle other operators
_ => 0.5,
};
bucket_selectivity *= col_selectivity;
}
bucket_selectivity
}
}
When to use multi-column stats:
| Scenario | Recommendation |
|---|---|
| Columns are independent | Single-column histograms |
| Strong correlation (city → country) | Multi-column histogram |
| Frequently used together in WHERE | Multi-column histogram |
| High cardinality combination | May not be worth it |
5 Using Histograms in the Optimizer
Integrating with Cost Model
// src/optimizer/histogram_selectivity.rs
pub struct HistogramSelectivityEstimator {
statistics: Arc<StatisticsCatalog>,
}
impl HistogramSelectivityEstimator {
pub fn new(statistics: Arc<StatisticsCatalog>) -> Self {
Self { statistics }
}
pub fn estimate(&self, table: &str, column: &str, expr: &Expression) -> f64 {
let table_stats = match self.statistics.get_table(table) {
Some(stats) => stats,
None => return 0.5, // No stats, use default
};
let col_stats = match table_stats.columns.get(column) {
Some(stats) => stats,
None => return 0.5,
};
match &col_stats.histogram {
Some(Histogram::Compressed(comp)) => {
match expr {
Expression::BinaryOp { op, right, .. } => {
comp.estimate_selectivity(op, right)
}
Expression::InList { list, .. } => {
// Sum selectivity for each value
list.iter()
.map(|v| comp.estimate_selectivity(&BinaryOperator::Eq, v))
.sum()
}
Expression::Between { low, high, .. } => {
let low_sel = comp.estimate_selectivity(&BinaryOperator::Gte, low);
let high_sel = comp.estimate_selectivity(&BinaryOperator::Lte, high);
(low_sel + high_sel - 1.0).max(0.0)
}
_ => 0.5,
}
}
Some(Histogram::EquiDepth(equi)) => {
match expr {
Expression::BinaryOp { op, right, .. } => {
match op {
BinaryOperator::Eq => equi.estimate_eq(right),
BinaryOperator::Gt => equi.estimate_gt(right),
BinaryOperator::Lt => equi.estimate_lt(right),
BinaryOperator::Gte => equi.estimate_gte(right),
BinaryOperator::Lte => 1.0 - equi.estimate_gt(right),
_ => 0.5,
}
}
_ => 0.5,
}
}
None => {
// No histogram, fall back to MCV or default
self.estimate_from_mcv(col_stats, expr)
}
}
}
fn estimate_from_mcv(&self, col_stats: &ColumnStatistics, expr: &Expression) -> f64 {
match expr {
Expression::BinaryOp { op: BinaryOperator::Eq, right, .. } => {
// Check if value is in MCV list
for (mcv_val, frequency) in &col_stats.most_common_values {
if mcv_val == right {
return *frequency;
}
}
// Not in MCV, estimate based on distinct count
1.0 / col_stats.distinct_count.max(1) as f64
}
_ => 0.5, // Default for other operators
}
}
}
// Updated cost model using histograms
impl CostModel {
pub fn estimate_selectivity_with_histograms(
&self,
estimator: &HistogramSelectivityEstimator,
table: &str,
expr: &Expression,
) -> f64 {
match expr {
Expression::BinaryOp { op, left, right, .. } => {
// Find which side is a column reference
if let Expression::Identifier(ident) = left.as_ref() {
return estimator.estimate(table, &ident.value, &Expression::BinaryOp {
op: op.clone(),
left: Box::new(Expression::Identifier(ident.clone())),
right: right.clone(),
});
}
// Default estimate
0.5
}
Expression::InList { expr, list, negated } => {
// Simplified: assumes the IN list targets a known column
// (proper column-name extraction from `expr` is omitted here)
let base_selectivity = list.iter()
.map(|v| estimator.estimate(table, "column", v))
.sum::<f64>()
.min(1.0);
if *negated { 1.0 - base_selectivity } else { base_selectivity }
}
_ => 0.5,
}
}
}
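The `Between` arm above uses the standard inclusion trick: P(low ≤ x ≤ high) = P(x ≥ low) + P(x ≤ high) − 1, clamped at zero. As a tiny sketch:

```rust
// Combine the two one-sided selectivities for BETWEEN, clamped at 0.
fn between_selectivity(gte_low: f64, lte_high: f64) -> f64 {
    (gte_low + lte_high - 1.0).max(0.0)
}

fn main() {
    // 70% of rows are >= low and 60% are <= high -> 30% fall in between
    println!("{}", between_selectivity(0.70, 0.60));
    // Disjoint one-sided estimates clamp to zero instead of going negative
    println!("{}", between_selectivity(0.20, 0.30));
}
```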
Example: Real Query Optimization
-- Table statistics (from ANALYZE):
-- users: 1,000,000 rows
-- users.balance: histogram shows 80% have balance < $100
-- Query:
SELECT * FROM users WHERE balance > 100
-- Without histogram:
-- Selectivity estimate: 0.33 (fixed guess)
-- Estimated rows: 333,333
-- Chosen plan: SeqScan (cost: 5000)
-- With histogram:
-- Selectivity estimate: 0.20 (from histogram)
-- Estimated rows: 200,000
-- Chosen plan: IndexScan (cost: 2000) ✅ Better!
6 Histogram Maintenance
When to Update Statistics
// src/optimizer/stats_maintenance.rs
pub struct StatsMaintenancePolicy {
auto_analyze_threshold: u64, // Rows modified before auto-analyze
auto_analyze_scale_factor: f64, // Fraction of table
last_analyze_threshold: chrono::Duration, // Max time since last analyze
}
impl Default for StatsMaintenancePolicy {
fn default() -> Self {
Self {
auto_analyze_threshold: 50, // PostgreSQL default
auto_analyze_scale_factor: 0.2, // 20% of table
last_analyze_threshold: chrono::Duration::days(7),
}
}
}
pub struct StatsMaintenanceManager {
catalog: Arc<Catalog>,
policy: StatsMaintenancePolicy,
modification_counts: HashMap<String, u64>, // table → rows modified
}
impl StatsMaintenanceManager {
pub fn check_and_analyze(&mut self, table: &str) -> Result<bool, AnalyzerError> {
let table_stats = self.catalog.get_table_statistics(table);
let modified_rows = self.modification_counts.get(table).copied().unwrap_or(0);
let needs_analyze = if let Some(stats) = table_stats {
let threshold = self.policy.auto_analyze_threshold
+ (stats.row_count as f64 * self.policy.auto_analyze_scale_factor) as u64;
modified_rows >= threshold
} else {
true // No stats, need initial analyze
};
if needs_analyze {
self.analyze_table(table)?;
self.modification_counts.insert(table.to_string(), 0);
Ok(true)
} else {
Ok(false)
}
}
pub fn record_modification(&mut self, table: &str, rows_affected: u64) {
let count = self.modification_counts.entry(table.to_string()).or_insert(0);
*count += rows_affected;
}
fn analyze_table(&self, table: &str) -> Result<(), AnalyzerError> {
// Run ANALYZE on the table
let analyzer = StatisticsAnalyzer::new(self.catalog.clone());
let stats = analyzer.analyze_table(table)?;
self.catalog.store_statistics(&stats)?;
Ok(())
}
}
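The trigger condition reduces to one formula, threshold = base + scale_factor × row_count; a sketch with the defaults above:

```rust
// PostgreSQL-style auto-analyze threshold: a fixed base plus a
// fraction of the table's current row count.
fn analyze_threshold(base: u64, scale_factor: f64, row_count: u64) -> u64 {
    base + (row_count as f64 * scale_factor) as u64
}

fn main() {
    // With base = 50 and scale = 0.2, a 1M-row table re-analyzes
    // after 200,050 modified rows.
    println!("{}", analyze_threshold(50, 0.2, 1_000_000));
}
```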
Incremental Histogram Updates
Problem: Full rebuild is expensive for large tables.
Solution: Incremental updates for small changes.
impl EquiDepthHistogram {
/// Update histogram with new values (for small changes)
pub fn update(&mut self, new_values: &[Value], deleted_values: &[Value]) {
// Update total row count
self.total_rows += new_values.len() as u64;
self.total_rows = self.total_rows.saturating_sub(deleted_values.len() as u64);
// For each new value, find its bucket and increment count
for value in new_values {
for bucket in &mut self.buckets {
if value >= &bucket.lower_bound && value <= &bucket.upper_bound {
bucket.row_count += 1;
bucket.cumulative_count += 1;
break;
}
}
}
// For deleted values, decrement counts
for value in deleted_values {
for bucket in &mut self.buckets {
if value >= &bucket.lower_bound && value <= &bucket.upper_bound {
bucket.row_count = bucket.row_count.saturating_sub(1);
bucket.cumulative_count = bucket.cumulative_count.saturating_sub(1);
break;
}
}
}
// If too many changes accumulated, mark for full rebuild
// (`needs_rebuild` is a bool flag on the struct, elided above)
let total_changes = new_values.len() + deleted_values.len();
if total_changes as f64 / self.total_rows as f64 > 0.1 {
self.needs_rebuild = true;
}
}
}
}
7 Challenges Building in Rust
Challenge 1: Value Comparison
Problem: SQL values can be different types (int, float, string, date).
// ❌ Doesn't work - can't compare different types
pub enum Value {
Integer(i64),
Float(f64),
String(String),
Date(chrono::NaiveDate),
}
impl Ord for Value {
fn cmp(&self, other: &Self) -> Ordering {
// Can't compare Integer with String!
}
}
Solution: Type-aware comparison
// ✅ Works - handle type mismatches
impl Value {
pub fn compare(&self, other: &Self) -> Result<Ordering, ValueError> {
match (self, other) {
(Value::Integer(a), Value::Integer(b)) => Ok(a.cmp(b)),
(Value::Float(a), Value::Float(b)) => a.partial_cmp(b).ok_or(ValueError::NaN),
(Value::String(a), Value::String(b)) => Ok(a.cmp(b)),
(Value::Integer(a), Value::Float(b)) => (*a as f64).partial_cmp(b).ok_or(ValueError::NaN),
(Value::Float(a), Value::Integer(b)) => a.partial_cmp(&(*b as f64)).ok_or(ValueError::NaN),
_ => Err(ValueError::TypeMismatch),
}
}
}
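A self-contained version of this comparison (trimmed to the variants shown, with chrono dates omitted) can be exercised directly:

```rust
use std::cmp::Ordering;

#[derive(Debug, PartialEq)]
enum Value {
    Integer(i64),
    Float(f64),
    String(String),
}

#[derive(Debug, PartialEq)]
enum ValueError {
    TypeMismatch,
    NaN,
}

// Type-aware comparison: numeric types coerce to f64, strings compare
// lexicographically, anything else is a type mismatch.
fn compare(a: &Value, b: &Value) -> Result<Ordering, ValueError> {
    match (a, b) {
        (Value::Integer(x), Value::Integer(y)) => Ok(x.cmp(y)),
        (Value::Float(x), Value::Float(y)) => x.partial_cmp(y).ok_or(ValueError::NaN),
        (Value::Integer(x), Value::Float(y)) => (*x as f64).partial_cmp(y).ok_or(ValueError::NaN),
        (Value::Float(x), Value::Integer(y)) => x.partial_cmp(&(*y as f64)).ok_or(ValueError::NaN),
        (Value::String(x), Value::String(y)) => Ok(x.cmp(y)),
        _ => Err(ValueError::TypeMismatch),
    }
}

fn main() {
    println!("{:?}", compare(&Value::Integer(3), &Value::Float(2.5)));
    println!("{:?}", compare(&Value::Integer(3), &Value::String("3".into())));
}
```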
Challenge 2: Memory Usage
Problem: Histograms for many columns can use significant memory.
// ❌ Memory explosion:
// 1000 tables × 20 columns × 100 buckets × 32 bytes = 64 MB
Solution: Compress bucket bounds
// ✅ Compressed representation
pub struct CompressedBucket {
pub lower_bound_idx: u32, // Index into shared value pool
pub upper_bound_idx: u32,
pub row_count: u32,
}
pub struct CompressedHistogram {
shared_values: Vec<Value>, // Deduplicated values
buckets: Vec<CompressedBucket>,
}
Challenge 3: Thread-Safe Statistics
Problem: Statistics need to be read during query planning while being updated by ANALYZE.
// ❌ Race condition
pub struct StatisticsCatalog {
tables: HashMap<String, TableStatistics>,
}
Solution: Arc<RwLock<>> for concurrent access
// ✅ Thread-safe
pub struct StatisticsCatalog {
tables: Arc<RwLock<HashMap<String, Arc<TableStatistics>>>>,
}
impl StatisticsCatalog {
pub fn get_table(&self, name: &str) -> Option<Arc<TableStatistics>> {
let tables = self.tables.read().unwrap();
tables.get(name).cloned()
}
pub fn update_table(&self, stats: TableStatistics) {
let mut tables = self.tables.write().unwrap();
tables.insert(stats.table_name.clone(), Arc::new(stats));
}
}
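A runnable miniature of this pattern, with a plain row count standing in for `TableStatistics`:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// table name -> Arc'd stats snapshot (a row count stands in for the
// full TableStatistics struct)
type Catalog = Arc<RwLock<HashMap<String, Arc<u64>>>>;

fn get(catalog: &Catalog, name: &str) -> Option<Arc<u64>> {
    // Readers clone the Arc and release the lock immediately
    catalog.read().unwrap().get(name).cloned()
}

fn update(catalog: &Catalog, name: &str, rows: u64) {
    // ANALYZE swaps in a fresh snapshot under the write lock
    catalog.write().unwrap().insert(name.to_string(), Arc::new(rows));
}

fn demo() -> u64 {
    let catalog: Catalog = Arc::new(RwLock::new(HashMap::new()));
    update(&catalog, "users", 1_000_000);
    // Planner threads read concurrently
    let readers: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&catalog);
            thread::spawn(move || *get(&c, "users").unwrap())
        })
        .collect();
    for r in readers {
        assert_eq!(r.join().unwrap(), 1_000_000);
    }
    update(&catalog, "users", 1_100_000); // re-analyze
    *get(&catalog, "users").unwrap()
}

fn main() {
    println!("rows after re-analyze: {}", demo());
}
```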
8 How AI Accelerated This
What AI Got Right
| Task | AI Contribution |
|---|---|
| Histogram types | Explained equi-width vs. equi-depth trade-offs |
| MCV handling | Suggested separate MCV list for common values |
| Sampling | Reservoir sampling algorithm for streaming |
| Selectivity formulas | Correct range estimation within buckets |
What AI Got Wrong
| Issue | What Happened |
|---|---|
| Bucket boundaries | First draft had off-by-one errors in boundary calculation |
| Multi-column correlation | Initially assumed independence (wrong for correlated columns) |
| NULL handling | Forgot to track NULL fraction separately |
| Incremental updates | Suggested full rebuild for any change (too expensive) |
Pattern: AI handles textbook cases well. Edge cases (boundaries, NULLs, incremental) need manual refinement.
Example: Debugging Selectivity
My question to AI:
"Histogram shows 20% of rows have balance > 100, but optimizer estimates 5%. Why?"
What I learned:
- MCV list wasnโt being checked first
- Non-MCV fraction adjustment was missing
- Range estimation within bucket was inverted
Result: Fixed selectivity estimation:
pub fn estimate_selectivity(&self, op: &BinaryOperator, value: &Value) -> f64 {
// Check MCV first (equality predicates only; range predicates
// go straight to the histogram below)
if matches!(op, BinaryOperator::Eq | BinaryOperator::Neq) {
for (mcv_val, frequency) in &self.mcv_list.values {
if mcv_val == value {
return match op {
BinaryOperator::Eq => *frequency,
_ => 1.0 - frequency,
};
}
}
}
// Use histogram for non-MCV values
let histogram_sel = self.histogram.estimate_gt(value);
let non_mcv_fraction = 1.0 - self.mcv_list.total_mcv_frequency;
histogram_sel * non_mcv_fraction // ← Was missing this!
}
Summary: Histogram Statistics in One Diagram
Key Takeaways:
| Concept | Why It Matters |
|---|---|
| Equi-depth histograms | Better for skewed data than equi-width |
| MCV lists | Accurate for common values |
| Sampling | Makes ANALYZE practical for large tables |
| Multi-column stats | Captures column correlation |
| Incremental updates | Avoids full rebuild for small changes |
| Thread-safe access | Allows concurrent reads during updates |
Further Reading:
- PostgreSQL source: src/backend/commands/analyze.c
- "Random Sampling for Histogram Construction" by Chaudhuri et al.
- "Selectivity Estimation for Range Queries" by Ioannidis et al.
- PostgreSQL documentation: "Statistics Used by the Planner"
- Vaultgres Repository: github.com/neoalienson/Vaultgres