Slowly Changing Dimensions: Managing Historical Data in Data Warehouses

  1. The Historical Data Problem
  2. SCD Type Classification
  3. Type 0: Retain Original
  4. Type 1: Overwrite
  5. Type 2: Add New Row
  6. Type 3: Add New Attribute
  7. Type 4: Add History Table
  8. Type 6: Combined Approach
  9. Type 7: Dual Key Strategy
  10. Choosing the Right Type
  11. Conclusion

Data warehouses promise historical analysis, but dimensional data refuses to stay still. A customer relocates. A product changes categories. A salesperson transfers to a new region. These changes seem simple until you ask: should historical reports reflect the old state or the new one?

Traditional databases handle changes through updates—overwrite the old value with the new. This works for transactional systems focused on current state, but destroys the historical context that makes data warehouses valuable. When a salesperson moves from California to Illinois, should their past sales appear under California or Illinois in regional reports?

Slowly Changing Dimensions (SCDs) address this challenge. Unlike rapidly changing transactional data—customer IDs, quantities, prices—dimensional attributes like locations, names, and categories change infrequently but unpredictably. The “slowly” in SCD doesn’t mean changes are rare; it means they’re not part of normal transaction flow.

Ralph Kimball’s Data Warehouse Toolkit popularized a classification system for handling these changes: Types 0 through 7. Each type represents a different trade-off between historical accuracy, data complexity, and query performance. Type 1 sacrifices history for simplicity. Type 2 preserves complete history at the cost of complexity. Type 6 combines multiple approaches for maximum flexibility.

The challenge extends beyond technical implementation. SCDs must maintain referential integrity between fact and dimension tables. A sales fact table links to a supplier dimension. When the supplier relocates, historical sales must still connect to the correct supplier information without breaking database relationships.

This article explores each SCD type, examines their trade-offs, and provides guidance on choosing the right approach for different scenarios.

The Historical Data Problem

Before exploring solutions, understanding the problem reveals why dimensional changes create complexity.

The Supplier Relocation Scenario

Consider a data warehouse tracking supplier relationships:

📊 Initial State

Supplier Dimension

  • Supplier: Acme Supply Co
  • Location: California
  • Supplier Code: ABC

Fact Table

  • 2005 Sales: $500,000
  • 2006 Sales: $750,000

Business Question "What were our California supplier sales in 2005?"

The answer seems obvious: $500,000. But when Acme relocates to Illinois in 2007, the question becomes ambiguous. Should the report show:

  1. Current State: Zero California sales (Acme is now in Illinois)
  2. Historical State: $500,000 (Acme was in California in 2005)
  3. Both: Depends on the query

Different business needs require different answers. Tax reporting needs historical accuracy—sales occurred in California regardless of current location. Strategic planning might want current state—understanding current supplier distribution matters more than past locations.

The Referential Integrity Challenge

Database relationships complicate matters further:

🚫 The Update Problem

Scenario

  • Fact table references Supplier_Key = 123
  • Supplier 123 is "Acme Supply Co, CA"
  • Supplier relocates to Illinois

Option 1: Update in Place

  • Change Supplier 123 to "Acme Supply Co, IL"
  • Historical facts now show Illinois
  • Lost: Historical accuracy

Option 2: Create New Record

  • Add Supplier 124 as "Acme Supply Co, IL"
  • Keep Supplier 123 as "Acme Supply Co, CA"
  • Problem: Which key for new transactions?
  • Problem: How to query "all Acme transactions"?

Neither option satisfies all requirements. Updates destroy history. New records break the natural key relationship. SCDs provide structured approaches to this dilemma.

Why “Slowly Changing” Matters

The distinction between slowly and rapidly changing data is crucial:

🔄 Change Frequency Comparison

Rapidly Changing (Transactional)

  • Customer ID in each order
  • Product ID in each sale
  • Quantity, price, date
  • Changes: Millions per day
  • Strategy: Store each transaction

Slowly Changing (Dimensional)

  • Customer address
  • Product category
  • Supplier location
  • Changes: Dozens per month
  • Strategy: Depends on business needs

Transactional data changes are expected and designed for. Dimensional changes are exceptions that require special handling. The “slowly” qualifier indicates these changes happen outside normal transaction flow, making them harder to anticipate and manage.

SCD Type Classification

Ralph Kimball’s classification system provides a structured approach to handling dimensional changes.

The Type System Overview

📋 SCD Types at a Glance

Type 0: Retain Original

  • Never changes
  • Original values preserved
  • Example: Date of birth

Type 1: Overwrite

  • Replace old with new
  • No history maintained
  • Simplest approach

Type 2: Add New Row

  • Create new record for each change
  • Complete history preserved
  • Most common approach

Type 3: Add New Attribute

  • Store limited history in columns
  • Tracks previous value only
  • Simple queries

Type 4: Add History Table

  • Separate table for history
  • Current and historical data split
  • Audit-style tracking

Type 6: Combined (1+2+3)

  • Hybrid approach
  • Multiple tracking methods
  • Maximum flexibility

Type 7: Dual Key

  • Both surrogate and natural keys
  • Flexible query options
  • Complex implementation

Each type represents different priorities. Type 1 prioritizes simplicity over history. Type 2 prioritizes history over simplicity. Type 6 attempts to provide both at the cost of complexity.

Choosing Factors

The right type depends on specific requirements:

🎯 Decision Factors

Historical Accuracy

  • How important is past state?
  • Regulatory requirements?
  • Audit trail needed?

Query Patterns

  • Current state queries?
  • Historical state queries?
  • Both types needed?

Data Volume

  • How often do changes occur?
  • Storage constraints?
  • Performance requirements?

Complexity Tolerance

  • Team expertise?
  • Maintenance burden?
  • ETL complexity acceptable?

No single type fits all scenarios. Many data warehouses use different types for different dimensions, or even different types for different attributes within the same dimension.

Type 0: Retain Original

Some attributes never change by definition.

Immutable Attributes

Type 0 applies to attributes with durable values:

✅ Type 0 Characteristics

Definition

  • Attribute never changes
  • Original value preserved forever
  • No special handling needed

Common Examples

  • Date of birth
  • Original credit score
  • Account creation date
  • Initial classification

Implementation

  • Standard column
  • No update logic
  • Simplest possible approach

Type 0 isn’t really a “slowly changing” dimension—it’s a “never changing” dimension. The classification exists to explicitly identify attributes that should never be updated, preventing accidental modifications.

Date Dimensions

Date dimensions typically use Type 0 for most attributes:

CREATE TABLE Date_Dimension (
    Date_Key INT PRIMARY KEY,
    Full_Date DATE,
    Day_Of_Week VARCHAR(10),
    Month_Name VARCHAR(10),
    Quarter INT,
    Year INT,
    Is_Holiday BOOLEAN
);

📅 Date Dimension Stability

Why Type 0 Works

  • January 1, 2006 is always January 1, 2006
  • Day of week never changes for a given date
  • Quarter and year are fixed

Exception: Is_Holiday

  • Holiday designations can change
  • New holidays declared
  • Might need Type 1 or Type 2

Date dimensions demonstrate Type 0’s simplicity. Once populated, they rarely need updates. This stability makes them ideal for fact table joins and time-based analysis.

Business Value

Type 0 provides clarity and protection:

✅ Type 0 Benefits

Clarity

  • Explicitly marks immutable data
  • Documents business rules
  • Prevents confusion

Protection

  • No accidental updates
  • Data integrity guaranteed
  • Audit compliance

Performance

  • No version checking
  • Simple queries
  • Minimal storage

Identifying Type 0 attributes during design prevents future problems. When a business rule states “this never changes,” Type 0 enforces that rule at the data model level.

Type 1: Overwrite

The simplest approach: replace old values with new ones.

How Type 1 Works

Type 1 updates records in place:

Initial State:

Supplier_Key Supplier_Code Supplier_Name Supplier_State
123 ABC Acme Supply Co CA

After Relocation:

Supplier_Key Supplier_Code Supplier_Name Supplier_State
123 ABC Acme Supply Co IL
UPDATE Supplier
SET Supplier_State = 'IL'
WHERE Supplier_Code = 'ABC';

The old value disappears. The new value takes its place. No history, no complexity.

When Type 1 Makes Sense

Type 1 works for specific scenarios:

✅ Good Type 1 Use Cases

Error Corrections

  • Typos in data entry
  • Incorrect initial values
  • Data quality fixes
  • No history needed for mistakes

Irrelevant Changes

  • Changes that don't affect analysis
  • Cosmetic updates
  • Standardization changes
  • Example: "St." to "Street"

Current State Only

  • No historical reporting needed
  • Current snapshot sufficient
  • Storage constraints
  • Simple reporting requirements

A supplier’s phone number might use Type 1. Historical phone numbers rarely matter for analysis. When the number changes, overwrite it. Reports always show current contact information.

The Cost of Simplicity

Type 1’s simplicity comes with significant limitations:

🚫 Type 1 Limitations

Lost History

  • Cannot answer "what was the value in 2005?"
  • Audit trail destroyed
  • Compliance issues
  • Irreversible data loss

Aggregate Impact

  • Pre-calculated aggregates become incorrect
  • Must recalculate summaries
  • Example: "Sales by State" changes retroactively
  • Performance implications

Reporting Confusion

  • Historical reports show current values
  • "Sales in California" shows zero after relocation
  • Business users confused
  • Trust in data erodes

Aggregate Recalculation Problem

Type 1 creates cascading updates:

⚠️ Aggregate Maintenance

Scenario

  • Aggregate table: Sales by Supplier State
  • California total: $5,000,000
  • Illinois total: $2,000,000

After Type 1 Update

  • Supplier moves from CA to IL
  • Aggregate table now incorrect
  • Must recalculate:
    • Subtract from California
    • Add to Illinois

Problem

  • Expensive recalculation
  • Multiple aggregate tables affected
  • ETL complexity increases
  • Defeats purpose of aggregates

Pre-calculated aggregates exist for performance. Type 1 changes force recalculation, negating the performance benefit. This makes Type 1 unsuitable for dimensions used in aggregate tables.

Implementation Considerations

Type 1 requires careful planning:

💡 Type 1 Best Practices

Document Decisions

  • Explicitly mark Type 1 attributes
  • Explain why history isn't needed
  • Get business sign-off

Audit Trail Alternative

  • Consider database audit logs
  • Separate audit table
  • Compromise: simple updates, external history

Hybrid Approach

  • Type 1 for some attributes
  • Type 2 for others
  • Same dimension, different strategies

Type 1 works best when combined with other types. Use Type 1 for attributes where history truly doesn’t matter, and Type 2 for attributes where it does.

Type 2: Add New Row

The most common SCD approach: create a new record for each change.

How Type 2 Works

Type 2 preserves complete history through multiple records:

Initial State:

Supplier_Key Supplier_Code Supplier_Name Supplier_State Version
123 ABC Acme Supply Co CA 0

After First Relocation:

Supplier_Key Supplier_Code Supplier_Name Supplier_State Version
123 ABC Acme Supply Co CA 0
124 ABC Acme Supply Co IL 1

Each change creates a new row with a new surrogate key. The natural key (Supplier_Code) remains the same, but the surrogate key (Supplier_Key) changes.

Implementation Variations

Type 2 has several implementation patterns:

🔧 Type 2 Implementation Options

Version Numbers

  • Sequential version column
  • Simple to understand
  • Easy to find latest

Effective Dates

  • Start_Date and End_Date columns
  • Precise temporal tracking
  • Supports point-in-time queries

Current Flag

  • Boolean indicating current record
  • Fast current-state queries
  • Often combined with dates

Effective Date Implementation:

Supplier_Key Supplier_Code Supplier_Name Supplier_State Start_Date End_Date
123 ABC Acme Supply Co CA 2000-01-01 2004-12-22
124 ABC Acme Supply Co IL 2004-12-22 NULL

The NULL End_Date indicates the current record. Some implementations use a high date value (9999-12-31) instead of NULL to avoid null handling in queries.

Current Flag Implementation:

Supplier_Key Supplier_Code Supplier_Name Supplier_State Effective_Date Current_Flag
123 ABC Acme Supply Co CA 2000-01-01 N
124 ABC Acme Supply Co IL 2004-12-22 Y

The Current_Flag enables fast filtering: WHERE Current_Flag = 'Y' returns only current records.

Fact Table Integration

Type 2’s power comes from fact table integration:

✅ Type 2 Benefits

Historical Accuracy

  • Facts link to correct historical dimension
  • Transaction from 2003 links to Supplier_Key 123 (CA)
  • Transaction from 2005 links to Supplier_Key 124 (IL)
  • Reports show accurate historical state

No Aggregate Updates

  • Aggregates remain correct
  • "Sales by State" doesn't change retroactively
  • Pre-calculated summaries stay valid
  • Performance maintained

Unlimited History

  • Every change tracked
  • Complete audit trail
  • Supports compliance requirements
  • Time-travel queries possible

When a fact table stores Supplier_Key, it captures the dimension state at transaction time. A sale in 2003 references Supplier_Key 123 (California). That relationship never changes, even when the supplier relocates. Historical reports automatically show the correct state.

Query Patterns

Type 2 supports multiple query patterns:

Current State Query:

SELECT 
    s.Supplier_Name,
    s.Supplier_State,
    SUM(f.Sales_Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Key = s.Supplier_Key
WHERE s.Current_Flag = 'Y'
GROUP BY s.Supplier_Name, s.Supplier_State;

Historical State Query:

SELECT 
    s.Supplier_Name,
    s.Supplier_State,
    SUM(f.Sales_Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Key = s.Supplier_Key
WHERE f.Sale_Date BETWEEN s.Start_Date AND s.End_Date
GROUP BY s.Supplier_Name, s.Supplier_State;

Point-in-Time Query:

SELECT 
    s.Supplier_Name,
    s.Supplier_State
FROM Supplier s
WHERE s.Supplier_Code = 'ABC'
  AND '2003-06-15' BETWEEN s.Start_Date AND COALESCE(s.End_Date, '9999-12-31');

Type 2 Challenges

Despite its popularity, Type 2 has drawbacks:

⚠️ Type 2 Challenges

Dimension Size Growth

  • Each change adds a row
  • Frequently changing dimensions grow large
  • Storage implications
  • Index maintenance overhead

ETL Complexity

  • Must detect changes
  • Close old record (set End_Date)
  • Create new record
  • Update Current_Flag
  • More complex than Type 1

Natural Key Queries

  • Multiple rows per natural key
  • Must specify time period or current flag
  • Joins become more complex
  • Risk of duplicate results

Retroactive Changes

  • Adding new attributes is difficult
  • Different effective dates for different attributes
  • May require reprocessing facts
  • Expensive operation

Retroactive Change Problem

Type 2’s biggest challenge appears when the dimension model changes:

🚫 Retroactive Change Scenario

Situation

  • Dimension tracks Supplier_State
  • Business adds requirement: track Sales_Rep
  • Sales_Rep has different effective dates than State

Problem

  • Existing rows have State changes
  • Need to add Sales_Rep changes
  • Effective dates don't align
  • Must create new time slices

Impact

  • Existing fact records point to old keys
  • Must update fact table
  • Expensive operation
  • Potential data inconsistency

This scenario makes Type 2 unsuitable for dimensions subject to frequent structural changes. When the model evolves, the cost of maintaining Type 2 can become prohibitive.

Best Practices

Type 2 works best with discipline:

💡 Type 2 Best Practices

Surrogate Keys Required

  • Never use natural keys in facts
  • Surrogate keys enable versioning
  • Simplifies key management

Consistent Date Handling

  • Standardize on NULL or high date
  • Document convention
  • Enforce in ETL

Index Strategy

  • Index Current_Flag for current queries
  • Index Start_Date and End_Date for temporal queries
  • Index natural key for lookups

ETL Change Detection

  • Compare source to current dimension
  • Detect changes efficiently
  • Batch updates for performance

Type 2 is the default choice for most slowly changing dimensions. Its combination of historical accuracy and query flexibility makes it suitable for a wide range of scenarios.

Type 3: Add New Attribute

Type 3 tracks limited history through additional columns.

How Type 3 Works

Instead of adding rows, Type 3 adds columns:

Initial State:

Supplier_Key Supplier_Code Supplier_Name Current_State
123 ABC Acme Supply Co CA

After Adding Type 3 Tracking:

Supplier_Key Supplier_Code Supplier_Name Original_State Effective_Date Current_State
123 ABC Acme Supply Co CA 2004-12-22 IL

The table structure changes to accommodate history. Original_State preserves the first value. Current_State holds the latest value.

Limited History Trade-off

Type 3 provides fixed history depth:

📊 Type 3 Characteristics

What It Tracks

  • Original value
  • Current value
  • One transition date
  • No intermediate changes

Limitations

  • Only two states visible
  • Second relocation overwrites first change
  • Cannot track multiple transitions
  • Fixed column structure

Benefits

  • Simple queries
  • No row multiplication
  • Fixed table size
  • Easy to understand

Query Simplicity

Type 3 enables straightforward queries:

-- Current state analysis
SELECT 
    Current_State,
    SUM(Sales_Amount) as Total_Sales
FROM Supplier s
JOIN Fact_Sales f ON s.Supplier_Key = f.Supplier_Key
GROUP BY Current_State;

-- Original state analysis
SELECT 
    Original_State,
    SUM(Sales_Amount) as Total_Sales
FROM Supplier s
JOIN Fact_Sales f ON s.Supplier_Key = f.Supplier_Key
GROUP BY Original_State;

-- Transition analysis
SELECT 
    Original_State,
    Current_State,
    COUNT(*) as Suppliers_Moved
FROM Supplier
WHERE Original_State != Current_State
GROUP BY Original_State, Current_State;

No date range checks. No current flags. Simple column references.

When Type 3 Makes Sense

Type 3 works for specific analytical needs:

✅ Good Type 3 Use Cases

Before/After Analysis

  • Compare original vs current
  • Migration tracking
  • Example: "Customers who moved from urban to rural"

Fixed Transitions

  • One expected change
  • Example: Product launch status (Beta → Released)
  • Example: Customer tier (Standard → Premium)

Simple Reporting

  • Business wants two views only
  • Current and original sufficient
  • No intermediate states needed

Previous Value Variation

An alternative tracks the most recent change:

Supplier_Key Supplier_Code Supplier_Name Previous_State Change_Date Current_State
123 ABC Acme Supply Co CA 2004-12-22 IL

After Second Relocation:

Supplier_Key Supplier_Code Supplier_Name Previous_State Change_Date Current_State
123 ABC Acme Supply Co IL 2008-02-04 NY

Previous_State now shows IL (not CA). This variation tracks the last transition, losing earlier history.

Type 3 Limitations

The fixed structure creates problems:

🚫 Type 3 Problems

History Loss

  • Third change overwrites second
  • Cannot reconstruct full history
  • Audit trail incomplete

Schema Changes

  • Adding more history requires ALTER TABLE
  • Application changes needed
  • Disruptive to production

Temporal Queries Impossible

  • Cannot ask "what was the value on date X?"
  • Only two points in time available
  • Limited analytical value

Fact Table Ambiguity

  • Which state applies to a given transaction?
  • Must compare transaction date to Effective_Date
  • More complex than Type 2

Hybrid Column Approach

Type 3 often combines with other types:

💡 Type 3 Hybrid Strategy

Scenario

  • Use Type 2 for complete history
  • Add Type 3 columns for common queries
  • Example: Current_State column in Type 2 dimension

Benefits

  • Type 2 provides full history
  • Type 3 columns simplify frequent queries
  • Best of both approaches

Trade-off

  • Redundant data
  • ETL must maintain both
  • Storage overhead

Type 3 rarely stands alone. Its limited history makes it unsuitable as the sole tracking mechanism. Combined with Type 2, it provides query convenience while maintaining complete history.

Type 4: Add History Table

Type 4 separates current and historical data into different tables.

How Type 4 Works

Type 4 uses two tables: one for current state, one for history:

Supplier Table (Current):

Supplier_Key Supplier_Code Supplier_Name Supplier_State
124 ABC Acme & Johnson Supply Co IL

Supplier_History Table:

Supplier_Key Supplier_Code Supplier_Name Supplier_State Create_Date
123 ABC Acme Supply Co CA 2003-06-14
124 ABC Acme & Johnson Supply Co IL 2004-12-22

The current table contains only active records. The history table contains all versions, including the current one.

Database Audit Pattern

Type 4 resembles database audit tables:

🔍 Type 4 Characteristics

Structure

  • Current table: one row per entity
  • History table: all versions
  • Both tables share key structure
  • History includes timestamps

Similarities to Audit Tables

  • Tracks all changes
  • Immutable history
  • Timestamp-based
  • Separate storage

Differences from Audit Tables

  • History table is queryable dimension
  • Fact tables can reference both
  • Part of dimensional model
  • Not just for auditing

Fact Table Integration

Type 4’s unique feature: facts can reference both tables:

CREATE TABLE Fact_Sales (
    Sale_ID INT PRIMARY KEY,
    Sale_Date DATE,
    Current_Supplier_Key INT,  -- References Supplier
    Historical_Supplier_Key INT, -- References Supplier_History
    Amount DECIMAL(10,2)
);

✅ Dual Reference Benefits

Current State Queries

  • Join to Supplier table
  • Always shows current information
  • Fast queries (smaller table)
  • No date filtering needed

Historical State Queries

  • Join to Supplier_History table
  • Shows state at transaction time
  • Complete audit trail
  • Temporal accuracy

Flexibility

  • Choose appropriate table per query
  • No query rewriting needed
  • Both views available

Query Patterns

Type 4 enables different analytical perspectives:

Current State Analysis:

SELECT 
    s.Supplier_State,
    SUM(f.Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Current_Supplier_Key = s.Supplier_Key
GROUP BY s.Supplier_State;

This query shows all sales grouped by current supplier state, regardless of when the sale occurred.

Historical State Analysis:

SELECT 
    sh.Supplier_State,
    SUM(f.Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier_History sh ON f.Historical_Supplier_Key = sh.Supplier_Key
GROUP BY sh.Supplier_State;

This query shows sales grouped by the supplier’s state at the time of sale.

ETL Considerations

Type 4 requires careful ETL design:

⚠️ Type 4 ETL Complexity

Change Detection

  • Compare source to current table
  • Detect changes
  • Update current table
  • Insert into history table

Key Management

  • Generate new surrogate key for change
  • Update current table with new key
  • Insert new row in history table
  • Update fact table with both keys

Fact Table Loading

  • Must populate both key columns
  • Current key: latest dimension key
  • Historical key: dimension key at transaction time
  • Requires temporal lookup

When Type 4 Makes Sense

Type 4 suits specific scenarios:

✅ Good Type 4 Use Cases

Dual Perspective Reporting

  • Need both current and historical views
  • Frequent switching between perspectives
  • Example: Sales by current territory vs historical territory

Performance Optimization

  • Current table stays small
  • Fast current-state queries
  • History table can be partitioned
  • Separate indexing strategies

Change Data Capture

  • Integrating with CDC systems
  • Natural fit for CDC output
  • Audit trail requirement
  • Compliance needs

Type 4 Challenges

The dual-table approach creates complexity:

🚫 Type 4 Problems

Fact Table Overhead

  • Two key columns per dimension
  • Increased storage
  • More complex ETL
  • Potential for inconsistency

Synchronization Risk

  • Current and history tables must stay aligned
  • History table should contain current record
  • ETL failures can cause divergence
  • Validation overhead

Query Confusion

  • Users must understand which table to use
  • Wrong table choice gives wrong results
  • Documentation critical
  • Training required

Limited Tool Support

  • BI tools expect single dimension table
  • May not handle dual references well
  • Custom query logic needed
  • Reporting complexity

Maintenance Strategy

Type 4 requires ongoing maintenance:

💡 Type 4 Best Practices

Consistency Checks

  • Verify history contains current record
  • Validate key relationships
  • Regular reconciliation
  • Automated validation

Clear Naming

  • Obvious table names (Current vs History)
  • Descriptive key column names
  • Documentation in data dictionary

ETL Atomicity

  • Update both tables in single transaction
  • Rollback on failure
  • Prevent partial updates

Performance Tuning

  • Partition history table by date
  • Different indexes for each table
  • Archive old history if needed

Type 4 provides flexibility at the cost of complexity. It works best when the dual perspective is genuinely needed and the team can manage the additional ETL and maintenance burden.

Type 6: Combined Approach

Type 6 combines Types 1, 2, and 3 (1 + 2 + 3 = 6) for maximum flexibility.

How Type 6 Works

Type 6 uses Type 2’s row versioning with Type 3’s current value columns and Type 1’s overwrite strategy:

Initial State:

Supplier_Key Row_Key Supplier_Code Supplier_Name Current_State Historical_State Start_Date End_Date Current_Flag
123 1 ABC Acme Supply Co CA CA 2000-01-01 9999-12-31 Y

After First Relocation:

Supplier_Key Row_Key Supplier_Code Supplier_Name Current_State Historical_State Start_Date End_Date Current_Flag
123 1 ABC Acme Supply Co IL CA 2000-01-01 2004-12-22 N
123 2 ABC Acme Supply Co IL IL 2004-12-22 9999-12-31 Y

Notice:

  • Type 2: New row created (Row_Key 2)
  • Type 3: Current_State column added
  • Type 1: Current_State in Row_Key 1 overwritten to IL

After Second Relocation:

Supplier_Key Row_Key Supplier_Code Supplier_Name Current_State Historical_State Start_Date End_Date Current_Flag
123 1 ABC Acme Supply Co NY CA 2000-01-01 2004-12-22 N
123 2 ABC Acme Supply Co NY IL 2004-12-22 2008-02-04 N
123 3 ABC Acme Supply Co NY NY 2008-02-04 9999-12-31 Y

All rows now show Current_State = NY (Type 1 overwrite), while Historical_State preserves the value that was current during each time period (Type 2 history).

The Three Techniques Combined

Type 6 applies each technique to different aspects:

🔧 Type 6 Mechanics

Type 1 Component

  • Current_State column
  • Overwritten on every change
  • All rows updated to show latest value
  • Enables "current state" queries

Type 2 Component

  • New row for each change
  • Start_Date and End_Date
  • Current_Flag
  • Preserves complete history

Type 3 Component

  • Historical_State column
  • Stores value that was current during time period
  • Never updated after row creation
  • Enables "historical state" queries

Query Flexibility

Type 6 supports three query patterns:

Current State Query:

SELECT 
    s.Current_State,
    SUM(f.Sales_Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Key = s.Row_Key
GROUP BY s.Current_State;

Shows all sales grouped by current supplier state. Simple join, no date filtering.

Historical State Query:

SELECT 
    s.Historical_State,
    SUM(f.Sales_Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Key = s.Row_Key
GROUP BY s.Historical_State;

Shows sales grouped by the state that was current when the sale occurred. Also simple, no date filtering needed.

Point-in-Time Query:

SELECT 
    s.Supplier_Name,
    s.Historical_State
FROM Supplier s
WHERE s.Supplier_Code = 'ABC'
  AND '2005-06-15' BETWEEN s.Start_Date AND s.End_Date;

Retrieves the dimension state as it existed on a specific date.

Kimball’s “Unpredictable Changes with Single-Version Overlay”

Ralph Kimball calls Type 6 “Unpredictable Changes with Single-Version Overlay”:

📚 Kimball's Terminology

Unpredictable Changes

  • Changes occur irregularly
  • Cannot predict when or how often
  • Typical of slowly changing dimensions

Single-Version Overlay

  • Current value "overlaid" on all historical rows
  • Creates single version of current state
  • Accessible without date logic
  • Simplifies common queries

The “overlay” refers to the Current_State column that gets updated across all rows, providing a single, consistent view of the current state regardless of which historical row you’re examining.

Implementation Complexity

Type 6 is the most complex SCD type:

⚠️ Type 6 Complexity

ETL Challenges

  • Detect changes (Type 2)
  • Create new row (Type 2)
  • Update End_Date on old row (Type 2)
  • Update Current_Flag (Type 2)
  • Update Current_State on ALL rows for entity (Type 1)
  • Most complex update logic

Storage Overhead

  • Redundant Current_State column
  • Stored in every historical row
  • Increases dimension size
  • More storage than Type 2 alone

Update Performance

  • Every change updates multiple rows
  • Must update all historical rows
  • Can be slow for entities with many versions
  • Index maintenance overhead

When Type 6 Makes Sense

Type 6 justifies its complexity in specific scenarios:

✅ Good Type 6 Use Cases

Frequent Current State Queries

  • Most queries need current state
  • Historical state needed occasionally
  • Query simplicity worth storage cost

Business User Queries

  • Non-technical users write queries
  • Date logic too complex
  • Simple column reference preferred
  • Self-service BI tools

Regulatory Requirements

  • Must maintain complete history (Type 2)
  • Must report current state (Type 1)
  • Both requirements mandatory
  • Compliance justifies complexity

Type 6 Best Practices

Managing Type 6 requires discipline:

💡 Type 6 Best Practices

Clear Column Naming

  • Current_State vs Historical_State
  • Obvious which is which
  • Prevents query errors
  • Self-documenting

Efficient Updates

  • Batch Current_State updates
  • Use set-based operations
  • Avoid row-by-row updates
  • Optimize for performance

Documentation

  • Explain the three techniques
  • Provide query examples
  • Document ETL logic
  • Train users

Consider Type 2 + View

  • Alternative: Type 2 dimension
  • Add view with current state
  • View joins to current row
  • Simpler ETL, similar query experience

The view alternative provides similar query simplicity without the update complexity:

CREATE VIEW Supplier_With_Current AS
SELECT 
    h.*,
    c.Supplier_State as Current_State
FROM Supplier h
JOIN Supplier c ON h.Supplier_Code = c.Supplier_Code
WHERE c.Current_Flag = 'Y';

This view gives each historical row access to the current state without storing it redundantly.

Type 7: Dual Key Strategy

Type 7 places both surrogate and natural keys in the fact table for maximum query flexibility.

How Type 7 Works

Type 7 uses Type 2 dimension structure but stores two keys in facts:

Supplier Dimension (Type 2):

Supplier_Key Supplier_Code Supplier_Name Supplier_State Start_Date End_Date Current_Flag
123 ABC Acme Supply Co CA 2000-01-01 2004-12-22 N
124 ABC Acme Supply Co IL 2004-12-22 2008-02-04 N
125 ABC Acme Supply Co NY 2008-02-04 9999-12-31 Y

Fact Table with Dual Keys:

CREATE TABLE Fact_Sales (
    Sale_ID INT PRIMARY KEY,
    Sale_Date DATE,
    Supplier_Key INT,        -- Surrogate key (historical)
    Supplier_Code VARCHAR(10), -- Natural key
    Amount DECIMAL(10,2)
);

The fact stores both the surrogate key (Supplier_Key) that was current at transaction time and the natural key (Supplier_Code) that identifies the entity across all versions.

Query Flexibility

Type 7 enables three query patterns without date logic:

Historical State Query:

SELECT 
    s.Supplier_State,
    SUM(f.Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Key = s.Supplier_Key
GROUP BY s.Supplier_State;

Join on surrogate key shows historical state. A sale from 2003 joins to Supplier_Key 123 (CA), showing California sales.

Current State Query:

SELECT 
    s.Supplier_State,
    SUM(f.Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Code = s.Supplier_Code
WHERE s.Current_Flag = 'Y'
GROUP BY s.Supplier_State;

Join on natural key with current flag shows current state. All sales for ABC join to Supplier_Key 125 (NY), showing New York sales.

Point-in-Time Query:

SELECT 
    s.Supplier_State,
    SUM(f.Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Code = s.Supplier_Code
WHERE f.Sale_Date BETWEEN s.Start_Date AND s.End_Date
GROUP BY s.Supplier_State;

Join on natural key with date range shows state at any point in time.

Advantages Over Type 2

Type 7 adds flexibility to Type 2:

✅ Type 7 Benefits

Query Options

  • Historical: join on surrogate key
  • Current: join on natural key + current flag
  • Point-in-time: join on natural key + date range
  • No query rewriting needed

Referential Integrity

  • Surrogate key can have foreign key constraint
  • Natural key provides logical relationship
  • Both perspectives supported

Retroactive Changes

  • Adding attributes doesn't break facts
  • New time slices don't require fact updates
  • More resilient to dimension changes

Multiple Date Perspectives

  • Fact has Order_Date, Ship_Date, Invoice_Date
  • Can join on different dates
  • Example: "Supplier state when ordered" vs "when shipped"

The Retroactive Change Advantage

Type 7’s biggest benefit appears when dimensions evolve:

✅ Handling Dimension Changes

Scenario

  • Dimension tracks Supplier_State
  • Business adds Sales_Rep attribute
  • Sales_Rep has different change dates

Type 2 Problem

  • Must create new time slices
  • Existing facts point to old keys
  • Must update fact table
  • Expensive operation

Type 7 Solution

  • Add Sales_Rep to dimension
  • Create new time slices
  • Facts still have natural key
  • No fact table updates needed
  • Join logic handles new structure

The natural key provides stability. Even when the dimension’s time slices change, facts can still join correctly using the natural key and date logic.

Type 7 Challenges

The dual key approach has significant drawbacks:

🚫 Type 7 Problems

No True Referential Integrity

  • Natural key not unique in dimension
  • Cannot create foreign key on natural key
  • Surrogate key foreign key only ensures key exists
  • Doesn't ensure correct version
  • Data integrity relies on application logic

Query Complexity

  • Users must understand which key to use
  • Wrong choice gives wrong results
  • Date logic required for some queries
  • More complex than Type 2

Duplicate Row Risk

  • Natural key join without date filter returns multiple rows
  • Easy to get wrong results
  • Must remember WHERE Current_Flag = 'Y'
  • Or date range filter

Performance Concerns

  • Date comparisons slower than key joins
  • Natural key joins may be slower
  • More complex execution plans
  • Index strategy critical

BI Tool Limitations

  • Tools expect single join key
  • May not handle dual key logic well
  • Custom SQL often required
  • Reporting complexity

The Referential Integrity Problem

Type 7’s most serious issue is the inability to enforce referential integrity:

🚫 Referential Integrity Gap

The Problem

Supplier_Code appears multiple times in dimension (once per version). Database cannot create foreign key constraint on non-unique column.

Consequences

  • Can insert Supplier_Code that doesn't exist
  • Can insert valid code but wrong Supplier_Key
  • Orphaned facts possible
  • Data quality depends on ETL

Example

Fact has Supplier_Code = 'ABC' and Supplier_Key = 999. Supplier_Key 999 doesn't exist, but Supplier_Code 'ABC' does. Foreign key on Supplier_Key would catch this, but doesn't validate the natural key relationship.

Mitigation

  • ETL validation critical
  • Regular data quality checks
  • Application-level constraints
  • Cannot rely on database

When Type 7 Makes Sense

Type 7 suits specific scenarios:

✅ Good Type 7 Use Cases

Evolving Dimensions

  • Dimension structure changes frequently
  • New attributes added regularly
  • Retroactive changes common
  • Fact table updates expensive

Multiple Date Perspectives

  • Facts have multiple dates
  • Different dates need different dimension states
  • Example: Order date vs ship date
  • Flexibility worth complexity

Advanced Users

  • Team understands SCD concepts
  • Can write correct join logic
  • Data quality processes mature
  • BI tool supports complex joins

Type 7 Best Practices

Type 7 requires careful implementation:

💡 Type 7 Best Practices

Comprehensive Documentation

  • Explain dual key purpose
  • Provide query templates
  • Document join patterns
  • Include anti-patterns (what not to do)

ETL Validation

  • Verify both keys match
  • Check natural key exists
  • Validate surrogate key points to correct version
  • Automated data quality checks

Index Strategy

  • Index both keys in fact table
  • Index natural key in dimension
  • Index Start_Date and End_Date
  • Monitor query performance

Query Templates

  • Provide standard queries
  • Current state template
  • Historical state template
  • Point-in-time template
  • Prevent common errors

Consider Alternatives

  • Type 2 might be sufficient
  • Type 6 provides similar flexibility
  • Simpler approaches often better
  • Complexity must be justified

Type 7 vs Type 6

Both provide flexibility but through different mechanisms:

🔍 Type 7 vs Type 6 Comparison

Type 6

  • Stores current state in dimension
  • Updates all rows on change
  • Simple queries (no date logic)
  • Higher storage overhead
  • More complex ETL

Type 7

  • Stores natural key in facts
  • No dimension updates needed
  • Complex queries (date logic required)
  • Lower storage overhead
  • Simpler ETL, complex queries

Decision Factors

  • Query simplicity needed? → Type 6
  • Dimension changes frequently? → Type 7
  • Storage constrained? → Type 7
  • Non-technical users? → Type 6

Type 7 trades query simplicity for ETL simplicity and resilience to dimension changes. The right choice depends on which complexity is more manageable for your team.

Choosing the Right Type

Different dimensions and attributes require different SCD types.

Decision Framework

Selecting the appropriate SCD type requires evaluating multiple factors:

🎯 SCD Type Selection Criteria

Historical Accuracy Requirements

  • Regulatory compliance needs?
  • Audit trail required?
  • Historical reporting critical?
  • → Type 2, Type 4, or Type 6

Query Patterns

  • Mostly current state queries?
  • Mostly historical queries?
  • Both equally important?
  • → Influences Type 6 vs Type 2

Change Frequency

  • Changes rare?
  • Changes frequent?
  • Predictable or unpredictable?
  • → Affects storage and performance

Dimension Stability

  • Structure changes often?
  • New attributes added regularly?
  • Retroactive changes needed?
  • → Type 7 for unstable dimensions

Team Capabilities

  • ETL expertise level?
  • Query complexity tolerance?
  • BI tool limitations?
  • → Simpler types for less experienced teams

Type Selection Matrix

A practical guide for choosing SCD types:

Requirement Recommended Type Alternative Avoid
No history needed Type 1 Type 0 Type 2+
Complete history required Type 2 Type 4, Type 6 Type 1, Type 3
Before/after comparison only Type 3 Type 2 with view Type 1
Current + historical views Type 6 Type 7, Type 4 Type 1, Type 3
Frequent dimension changes Type 7 Type 2 Type 6
Simple queries critical Type 1, Type 6 Type 3 Type 7
Storage constrained Type 1, Type 3 Type 2 Type 6
Audit compliance Type 2, Type 4 Type 6 Type 1

Hybrid Approaches

Most data warehouses use multiple SCD types:

✅ Mixing SCD Types

Different Types per Dimension

  • Customer: Type 2 (address changes matter)
  • Product: Type 1 (description updates don't)
  • Date: Type 0 (never changes)
  • Employee: Type 2 (department transfers tracked)

Different Types per Attribute

  • Customer.Address: Type 2
  • Customer.Phone: Type 1
  • Customer.Email: Type 1
  • Customer.CreditLimit: Type 2

Benefits

  • Optimize each dimension independently
  • Balance complexity and requirements
  • Avoid over-engineering
  • Practical approach

Common Patterns by Industry

Different industries have typical SCD patterns:

🏢 Industry Patterns

Financial Services

  • Account dimensions: Type 2
  • Customer dimensions: Type 2
  • Product dimensions: Type 2
  • Reason: Regulatory compliance, audit requirements

Retail

  • Product dimensions: Type 1 or Type 2
  • Store dimensions: Type 2
  • Customer dimensions: Type 1 or Type 2
  • Reason: Balance history with simplicity

Healthcare

  • Patient dimensions: Type 2
  • Provider dimensions: Type 2
  • Diagnosis dimensions: Type 0
  • Reason: Historical accuracy critical

Manufacturing

  • Supplier dimensions: Type 2
  • Product dimensions: Type 2
  • Plant dimensions: Type 1
  • Reason: Supply chain tracking

Performance Considerations

SCD type affects query and ETL performance:

⚠️ Performance Impact

Type 1

  • Fast queries (no date logic)
  • Fast updates (simple UPDATE)
  • Must recalculate aggregates
  • Best query performance

Type 2

  • Moderate query performance
  • Date range checks add overhead
  • Dimension grows over time
  • Index strategy critical

Type 6

  • Simple queries (no date logic)
  • Slow updates (multiple rows)
  • Larger dimension size
  • Trade update speed for query speed

Type 7

  • Complex queries (date logic)
  • Fast updates (no dimension changes)
  • Natural key joins may be slower
  • Depends on index quality

Storage Considerations

Different types have different storage profiles:

💾 Storage Impact

Type 1

  • Minimal storage
  • One row per entity
  • No growth over time
  • Most efficient

Type 2

  • Grows with changes
  • One row per version
  • Predictable growth rate
  • Monitor dimension size

Type 3

  • Fixed size
  • Additional columns
  • No row multiplication
  • Moderate storage

Type 6

  • Largest storage
  • Type 2 rows + redundant columns
  • Highest overhead
  • Storage cost for query simplicity

Migration Strategies

Changing SCD types requires planning:

💡 SCD Type Migration

Type 1 to Type 2

  • Add Start_Date, End_Date, Current_Flag
  • Set Start_Date to earliest known date
  • Set End_Date to NULL
  • Future changes use Type 2
  • Historical data lost (unavoidable)

Type 2 to Type 6

  • Add Current_State column
  • Populate from current row
  • Update ETL to maintain both
  • Backward compatible

Type 2 to Type 1

  • Delete historical rows
  • Keep only current rows
  • Remove temporal columns
  • Irreversible (backup first)

Anti-Patterns to Avoid

Common SCD mistakes:

🚫 SCD Anti-Patterns

Using Type 2 for Everything

  • Not all attributes need history
  • Unnecessary complexity
  • Wasted storage
  • Solution: Mix types appropriately

Using Type 1 for Everything

  • Loses valuable history
  • Cannot answer temporal questions
  • Compliance issues
  • Solution: Identify critical historical attributes

Inconsistent Implementation

  • Different date formats
  • Mixed NULL vs high date
  • Inconsistent flag values
  • Solution: Standardize across warehouse

Ignoring Natural Keys

  • Only using surrogate keys
  • Cannot query across versions
  • Difficult to trace entity history
  • Solution: Maintain natural keys

Over-Engineering

  • Type 6 or Type 7 when Type 2 suffices
  • Complexity without benefit
  • Maintenance burden
  • Solution: Start simple, add complexity only when needed

Practical Recommendations

Guidelines for real-world implementation:

💡 Practical SCD Advice

Start with Type 2

  • Default choice for most dimensions
  • Well understood
  • Good balance of features
  • Can evolve to other types

Use Type 1 Sparingly

  • Only when history truly doesn't matter
  • Document the decision
  • Get business sign-off
  • Consider audit table as backup

Avoid Type 6 and Type 7 Initially

  • Complex to implement
  • Complex to maintain
  • Add only when Type 2 proves insufficient
  • Requires mature team

Document Everything

  • Which type for each dimension
  • Why that type was chosen
  • Query patterns and examples
  • ETL logic and edge cases

Monitor and Adjust

  • Track dimension growth
  • Monitor query performance
  • Gather user feedback
  • Be willing to change types

Conclusion

Slowly Changing Dimensions represent one of data warehousing’s fundamental challenges: balancing historical accuracy with practical implementation constraints. The choice between SCD types isn’t about finding the “best” approach—it’s about matching technique to business requirements, team capabilities, and system constraints.

The journey through SCD types reveals important lessons:

Simplicity Has Costs: Type 1’s simplicity comes at the price of lost history. What seems like a minor trade-off during design becomes a major limitation when business needs evolve. The question “what were our sales in California in 2005?” becomes unanswerable when suppliers relocate and Type 1 overwrites their locations. Historical data, once lost, cannot be recovered.

History Requires Structure: Type 2’s popularity stems from its balanced approach. Complete history preservation, reasonable query complexity, and well-understood implementation make it the default choice. The pattern of creating new rows for changes, maintaining temporal columns, and using surrogate keys has proven itself across countless data warehouses.

Flexibility Demands Complexity: Type 6 and Type 7 provide maximum flexibility but require sophisticated ETL processes and careful query design. The complexity is justified only when simpler approaches prove insufficient. Starting with Type 2 and evolving to more complex types as needs emerge prevents premature optimization.

Context Matters: No single SCD type fits all scenarios. Customer addresses might need Type 2 for historical accuracy while phone numbers use Type 1 for simplicity. Product categories might require Type 2 while product descriptions use Type 1. The ability to mix types within a single data warehouse—even within a single dimension—enables practical solutions.

Trade-offs Are Inevitable: Every SCD type represents trade-offs between competing concerns:

  • Historical accuracy vs. query simplicity
  • Storage efficiency vs. temporal precision
  • ETL complexity vs. query flexibility
  • Current state access vs. historical state access

Understanding these trade-offs enables informed decisions rather than following patterns blindly.

Implementation Discipline: Successful SCD implementation requires discipline. Consistent date handling, clear naming conventions, comprehensive documentation, and robust ETL processes separate working implementations from problematic ones. The technical pattern matters less than the rigor of its application.

The SCD type classification system provides a vocabulary for discussing dimensional change strategies. Type 0 through Type 7 aren’t rigid rules but rather a toolkit. The art lies in selecting the right tool for each situation, combining techniques when appropriate, and maintaining the discipline to implement chosen approaches consistently.

Modern data warehouses often use Type 2 as the foundation, adding Type 1 for attributes where history doesn’t matter and Type 6 or Type 7 only when specific requirements justify the additional complexity. This pragmatic approach balances the competing demands of historical accuracy, query performance, storage efficiency, and maintainability.

The ultimate measure of success isn’t which SCD type you choose but whether your dimensional model enables the business to answer its questions accurately. When historical reports reflect actual historical state, when current reports show current state, and when the distinction between the two is clear and intentional, the SCD implementation succeeds regardless of which specific types it employs.

As data warehouses evolve and business requirements change, SCD strategies may need adjustment. The flexibility to migrate between types, the discipline to document decisions, and the wisdom to choose simplicity over complexity when possible determine long-term success more than any specific technical choice.

Share