- The Historical Data Problem
- SCD Type Classification
- Type 0: Retain Original
- Type 1: Overwrite
- Type 2: Add New Row
- Type 3: Add New Attribute
- Type 4: Add History Table
- Type 6: Combined Approach
- Type 7: Dual Key Strategy
- Choosing the Right Type
- Conclusion
Data warehouses promise historical analysis, but dimensional data refuses to stay still. A customer relocates. A product changes categories. A salesperson transfers to a new region. These changes seem simple until you ask: should historical reports reflect the old state or the new one?
Traditional databases handle changes through updates—overwrite the old value with the new. This works for transactional systems focused on current state, but destroys the historical context that makes data warehouses valuable. When a salesperson moves from California to Illinois, should their past sales appear under California or Illinois in regional reports?
Slowly Changing Dimensions (SCDs) address this challenge. Unlike rapidly changing transactional data—customer IDs, quantities, prices—dimensional attributes like locations, names, and categories change infrequently but unpredictably. The “slowly” in SCD doesn’t mean changes are rare; it means they’re not part of normal transaction flow.
Ralph Kimball’s Data Warehouse Toolkit popularized a classification system for handling these changes: Types 0 through 7. Each type represents a different trade-off between historical accuracy, data complexity, and query performance. Type 1 sacrifices history for simplicity. Type 2 preserves complete history at the cost of complexity. Type 6 combines multiple approaches for maximum flexibility.
The challenge extends beyond technical implementation. SCDs must maintain referential integrity between fact and dimension tables. A sales fact table links to a supplier dimension. When the supplier relocates, historical sales must still connect to the correct supplier information without breaking database relationships.
This article explores each SCD type, examines their trade-offs, and provides guidance on choosing the right approach for different scenarios.
The Historical Data Problem
Before exploring solutions, understanding the problem reveals why dimensional changes create complexity.
The Supplier Relocation Scenario
Consider a data warehouse tracking supplier relationships:
📊 Initial State
Supplier Dimension
- Supplier: Acme Supply Co
- Location: California
- Supplier Code: ABC
Fact Table
- 2005 Sales: $500,000
- 2006 Sales: $750,000
Business Question "What were our California supplier sales in 2005?"
The answer seems obvious: $500,000. But when Acme relocates to Illinois in 2007, the question becomes ambiguous. Should the report show:
- Current State: Zero California sales (Acme is now in Illinois)
- Historical State: $500,000 (Acme was in California in 2005)
- Both: Depends on the query
Different business needs require different answers. Tax reporting needs historical accuracy—sales occurred in California regardless of current location. Strategic planning might want current state—understanding current supplier distribution matters more than past locations.
The Referential Integrity Challenge
Database relationships complicate matters further:
🚫 The Update Problem
Scenario
- Fact table references Supplier_Key = 123
- Supplier 123 is "Acme Supply Co, CA"
- Supplier relocates to Illinois
Option 1: Update in Place
- Change Supplier 123 to "Acme Supply Co, IL"
- Historical facts now show Illinois
- Lost: Historical accuracy
Option 2: Create New Record
- Add Supplier 124 as "Acme Supply Co, IL"
- Keep Supplier 123 as "Acme Supply Co, CA"
- Problem: Which key for new transactions?
- Problem: How to query "all Acme transactions"?
Neither option satisfies all requirements. Updates destroy history. New records break the natural key relationship. SCDs provide structured approaches to this dilemma.
Why “Slowly Changing” Matters
The distinction between slowly and rapidly changing data is crucial:
🔄 Change Frequency Comparison
Rapidly Changing (Transactional)
- Customer ID in each order
- Product ID in each sale
- Quantity, price, date
- Changes: Millions per day
- Strategy: Store each transaction
Slowly Changing (Dimensional)
- Customer address
- Product category
- Supplier location
- Changes: Dozens per month
- Strategy: Depends on business needs
Transactional data changes are expected and designed for. Dimensional changes are exceptions that require special handling. The “slowly” qualifier indicates these changes happen outside normal transaction flow, making them harder to anticipate and manage.
SCD Type Classification
Ralph Kimball’s classification system provides a structured approach to handling dimensional changes.
The Type System Overview
📋 SCD Types at a Glance
Type 0: Retain Original
- Never changes
- Original values preserved
- Example: Date of birth
Type 1: Overwrite
- Replace old with new
- No history maintained
- Simplest approach
Type 2: Add New Row
- Create new record for each change
- Complete history preserved
- Most common approach
Type 3: Add New Attribute
- Store limited history in columns
- Tracks previous value only
- Simple queries
Type 4: Add History Table
- Separate table for history
- Current and historical data split
- Audit-style tracking
Type 6: Combined (1+2+3)
- Hybrid approach
- Multiple tracking methods
- Maximum flexibility
Type 7: Dual Key
- Both surrogate and natural keys
- Flexible query options
- Complex implementation
Each type represents different priorities. Type 1 prioritizes simplicity over history. Type 2 prioritizes history over simplicity. Type 6 attempts to provide both at the cost of complexity.
Choosing Factors
The right type depends on specific requirements:
🎯 Decision Factors
Historical Accuracy
- How important is past state?
- Regulatory requirements?
- Audit trail needed?
Query Patterns
- Current state queries?
- Historical state queries?
- Both types needed?
Data Volume
- How often do changes occur?
- Storage constraints?
- Performance requirements?
Complexity Tolerance
- Team expertise?
- Maintenance burden?
- ETL complexity acceptable?
No single type fits all scenarios. Many data warehouses use different types for different dimensions, or even different types for different attributes within the same dimension.
Type 0: Retain Original
Some attributes never change by definition.
Immutable Attributes
Type 0 applies to attributes with durable values:
✅ Type 0 Characteristics
Definition
- Attribute never changes
- Original value preserved forever
- No special handling needed
Common Examples
- Date of birth
- Original credit score
- Account creation date
- Initial classification
Implementation
- Standard column
- No update logic
- Simplest possible approach
Type 0 isn’t really a “slowly changing” dimension—it’s a “never changing” dimension. The classification exists to explicitly identify attributes that should never be updated, preventing accidental modifications.
Date Dimensions
Date dimensions typically use Type 0 for most attributes:
CREATE TABLE Date_Dimension (
Date_Key INT PRIMARY KEY,
Full_Date DATE,
Day_Of_Week VARCHAR(10),
Month_Name VARCHAR(10),
Quarter INT,
Year INT,
Is_Holiday BOOLEAN
);
📅 Date Dimension Stability
Why Type 0 Works
- January 1, 2006 is always January 1, 2006
- Day of week never changes for a given date
- Quarter and year are fixed
Exception: Is_Holiday
- Holiday designations can change
- New holidays declared
- Might need Type 1 or Type 2
Date dimensions demonstrate Type 0’s simplicity. Once populated, they rarely need updates. This stability makes them ideal for fact table joins and time-based analysis.
Business Value
Type 0 provides clarity and protection:
✅ Type 0 Benefits
Clarity
- Explicitly marks immutable data
- Documents business rules
- Prevents confusion
Protection
- No accidental updates
- Data integrity guaranteed
- Audit compliance
Performance
- No version checking
- Simple queries
- Minimal storage
Identifying Type 0 attributes during design prevents future problems. When a business rule states “this never changes,” Type 0 enforces that rule at the data model level.
Type 1: Overwrite
The simplest approach: replace old values with new ones.
How Type 1 Works
Type 1 updates records in place:
Initial State:
| Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State |
|---|---|---|---|
| 123 | ABC | Acme Supply Co | CA |
After Relocation:
| Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State |
|---|---|---|---|
| 123 | ABC | Acme Supply Co | IL |
UPDATE Supplier
SET Supplier_State = 'IL'
WHERE Supplier_Code = 'ABC';
The old value disappears. The new value takes its place. No history, no complexity.
When Type 1 Makes Sense
Type 1 works for specific scenarios:
✅ Good Type 1 Use Cases
Error Corrections
- Typos in data entry
- Incorrect initial values
- Data quality fixes
- No history needed for mistakes
Irrelevant Changes
- Changes that don't affect analysis
- Cosmetic updates
- Standardization changes
- Example: "St." to "Street"
Current State Only
- No historical reporting needed
- Current snapshot sufficient
- Storage constraints
- Simple reporting requirements
A supplier’s phone number might use Type 1. Historical phone numbers rarely matter for analysis. When the number changes, overwrite it. Reports always show current contact information.
The Cost of Simplicity
Type 1’s simplicity comes with significant limitations:
🚫 Type 1 Limitations
Lost History
- Cannot answer "what was the value in 2005?"
- Audit trail destroyed
- Compliance issues
- Irreversible data loss
Aggregate Impact
- Pre-calculated aggregates become incorrect
- Must recalculate summaries
- Example: "Sales by State" changes retroactively
- Performance implications
Reporting Confusion
- Historical reports show current values
- "Sales in California" shows zero after relocation
- Business users confused
- Trust in data erodes
Aggregate Recalculation Problem
Type 1 creates cascading updates:
⚠️ Aggregate Maintenance
Scenario
- Aggregate table: Sales by Supplier State
- California total: $5,000,000
- Illinois total: $2,000,000
After Type 1 Update
- Supplier moves from CA to IL
- Aggregate table now incorrect
- Must recalculate:
- Subtract from California
- Add to Illinois
Problem
- Expensive recalculation
- Multiple aggregate tables affected
- ETL complexity increases
- Defeats purpose of aggregates
Pre-calculated aggregates exist for performance. Type 1 changes force recalculation, negating the performance benefit. This makes Type 1 unsuitable for dimensions used in aggregate tables.
Implementation Considerations
Type 1 requires careful planning:
💡 Type 1 Best Practices
Document Decisions
- Explicitly mark Type 1 attributes
- Explain why history isn't needed
- Get business sign-off
Audit Trail Alternative
- Consider database audit logs
- Separate audit table
- Compromise: simple updates, external history
Hybrid Approach
- Type 1 for some attributes
- Type 2 for others
- Same dimension, different strategies
Type 1 works best when combined with other types. Use Type 1 for attributes where history truly doesn’t matter, and Type 2 for attributes where it does.
Type 2: Add New Row
The most common SCD approach: create a new record for each change.
How Type 2 Works
Type 2 preserves complete history through multiple records:
Initial State:
| Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State | Version |
|---|---|---|---|---|
| 123 | ABC | Acme Supply Co | CA | 0 |
After First Relocation:
| Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State | Version |
|---|---|---|---|---|
| 123 | ABC | Acme Supply Co | CA | 0 |
| 124 | ABC | Acme Supply Co | IL | 1 |
Each change creates a new row with a new surrogate key. The natural key (Supplier_Code) remains the same, but the surrogate key (Supplier_Key) changes.
Implementation Variations
Type 2 has several implementation patterns:
🔧 Type 2 Implementation Options
Version Numbers
- Sequential version column
- Simple to understand
- Easy to find latest
Effective Dates
- Start_Date and End_Date columns
- Precise temporal tracking
- Supports point-in-time queries
Current Flag
- Boolean indicating current record
- Fast current-state queries
- Often combined with dates
Effective Date Implementation:
| Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State | Start_Date | End_Date |
|---|---|---|---|---|---|
| 123 | ABC | Acme Supply Co | CA | 2000-01-01 | 2004-12-22 |
| 124 | ABC | Acme Supply Co | IL | 2004-12-22 | NULL |
The NULL End_Date indicates the current record. Some implementations use a high date value (9999-12-31) instead of NULL to avoid null handling in queries.
Current Flag Implementation:
| Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State | Effective_Date | Current_Flag |
|---|---|---|---|---|---|
| 123 | ABC | Acme Supply Co | CA | 2000-01-01 | N |
| 124 | ABC | Acme Supply Co | IL | 2004-12-22 | Y |
The Current_Flag enables fast filtering: WHERE Current_Flag = 'Y' returns only current records.
Fact Table Integration
Type 2’s power comes from fact table integration:
✅ Type 2 Benefits
Historical Accuracy
- Facts link to correct historical dimension
- Transaction from 2003 links to Supplier_Key 123 (CA)
- Transaction from 2005 links to Supplier_Key 124 (IL)
- Reports show accurate historical state
No Aggregate Updates
- Aggregates remain correct
- "Sales by State" doesn't change retroactively
- Pre-calculated summaries stay valid
- Performance maintained
Unlimited History
- Every change tracked
- Complete audit trail
- Supports compliance requirements
- Time-travel queries possible
When a fact table stores Supplier_Key, it captures the dimension state at transaction time. A sale in 2003 references Supplier_Key 123 (California). That relationship never changes, even when the supplier relocates. Historical reports automatically show the correct state.
Query Patterns
Type 2 supports multiple query patterns:
Current State Query:
SELECT
s.Supplier_Name,
s.Supplier_State,
SUM(f.Sales_Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Key = s.Supplier_Key
WHERE s.Current_Flag = 'Y'
GROUP BY s.Supplier_Name, s.Supplier_State;
Historical State Query:
SELECT
s.Supplier_Name,
s.Supplier_State,
SUM(f.Sales_Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Key = s.Supplier_Key
WHERE f.Sale_Date BETWEEN s.Start_Date AND s.End_Date
GROUP BY s.Supplier_Name, s.Supplier_State;
Point-in-Time Query:
SELECT
s.Supplier_Name,
s.Supplier_State
FROM Supplier s
WHERE s.Supplier_Code = 'ABC'
AND '2003-06-15' BETWEEN s.Start_Date AND COALESCE(s.End_Date, '9999-12-31');
Type 2 Challenges
Despite its popularity, Type 2 has drawbacks:
⚠️ Type 2 Challenges
Dimension Size Growth
- Each change adds a row
- Frequently changing dimensions grow large
- Storage implications
- Index maintenance overhead
ETL Complexity
- Must detect changes
- Close old record (set End_Date)
- Create new record
- Update Current_Flag
- More complex than Type 1
Natural Key Queries
- Multiple rows per natural key
- Must specify time period or current flag
- Joins become more complex
- Risk of duplicate results
Retroactive Changes
- Adding new attributes is difficult
- Different effective dates for different attributes
- May require reprocessing facts
- Expensive operation
Retroactive Change Problem
Type 2’s biggest challenge appears when the dimension model changes:
🚫 Retroactive Change Scenario
Situation
- Dimension tracks Supplier_State
- Business adds requirement: track Sales_Rep
- Sales_Rep has different effective dates than State
Problem
- Existing rows have State changes
- Need to add Sales_Rep changes
- Effective dates don't align
- Must create new time slices
Impact
- Existing fact records point to old keys
- Must update fact table
- Expensive operation
- Potential data inconsistency
This scenario makes Type 2 unsuitable for dimensions subject to frequent structural changes. When the model evolves, the cost of maintaining Type 2 can become prohibitive.
Best Practices
Type 2 works best with discipline:
💡 Type 2 Best Practices
Surrogate Keys Required
- Never use natural keys in facts
- Surrogate keys enable versioning
- Simplifies key management
Consistent Date Handling
- Standardize on NULL or high date
- Document convention
- Enforce in ETL
Index Strategy
- Index Current_Flag for current queries
- Index Start_Date and End_Date for temporal queries
- Index natural key for lookups
ETL Change Detection
- Compare source to current dimension
- Detect changes efficiently
- Batch updates for performance
Type 2 is the default choice for most slowly changing dimensions. Its combination of historical accuracy and query flexibility makes it suitable for a wide range of scenarios.
Type 3: Add New Attribute
Type 3 tracks limited history through additional columns.
How Type 3 Works
Instead of adding rows, Type 3 adds columns:
Initial State:
| Supplier_Key | Supplier_Code | Supplier_Name | Current_State |
|---|---|---|---|
| 123 | ABC | Acme Supply Co | CA |
After Adding Type 3 Tracking:
| Supplier_Key | Supplier_Code | Supplier_Name | Original_State | Effective_Date | Current_State |
|---|---|---|---|---|---|
| 123 | ABC | Acme Supply Co | CA | 2004-12-22 | IL |
The table structure changes to accommodate history. Original_State preserves the first value. Current_State holds the latest value.
Limited History Trade-off
Type 3 provides fixed history depth:
📊 Type 3 Characteristics
What It Tracks
- Original value
- Current value
- One transition date
- No intermediate changes
Limitations
- Only two states visible
- Second relocation overwrites first change
- Cannot track multiple transitions
- Fixed column structure
Benefits
- Simple queries
- No row multiplication
- Fixed table size
- Easy to understand
Query Simplicity
Type 3 enables straightforward queries:
-- Current state analysis
SELECT
Current_State,
SUM(Sales_Amount) as Total_Sales
FROM Supplier s
JOIN Fact_Sales f ON s.Supplier_Key = f.Supplier_Key
GROUP BY Current_State;
-- Original state analysis
SELECT
Original_State,
SUM(Sales_Amount) as Total_Sales
FROM Supplier s
JOIN Fact_Sales f ON s.Supplier_Key = f.Supplier_Key
GROUP BY Original_State;
-- Transition analysis
SELECT
Original_State,
Current_State,
COUNT(*) as Suppliers_Moved
FROM Supplier
WHERE Original_State != Current_State
GROUP BY Original_State, Current_State;
No date range checks. No current flags. Simple column references.
When Type 3 Makes Sense
Type 3 works for specific analytical needs:
✅ Good Type 3 Use Cases
Before/After Analysis
- Compare original vs current
- Migration tracking
- Example: "Customers who moved from urban to rural"
Fixed Transitions
- One expected change
- Example: Product launch status (Beta → Released)
- Example: Customer tier (Standard → Premium)
Simple Reporting
- Business wants two views only
- Current and original sufficient
- No intermediate states needed
Previous Value Variation
An alternative tracks the most recent change:
| Supplier_Key | Supplier_Code | Supplier_Name | Previous_State | Change_Date | Current_State |
|---|---|---|---|---|---|
| 123 | ABC | Acme Supply Co | CA | 2004-12-22 | IL |
After Second Relocation:
| Supplier_Key | Supplier_Code | Supplier_Name | Previous_State | Change_Date | Current_State |
|---|---|---|---|---|---|
| 123 | ABC | Acme Supply Co | IL | 2008-02-04 | NY |
Previous_State now shows IL (not CA). This variation tracks the last transition, losing earlier history.
Type 3 Limitations
The fixed structure creates problems:
🚫 Type 3 Problems
History Loss
- Third change overwrites second
- Cannot reconstruct full history
- Audit trail incomplete
Schema Changes
- Adding more history requires ALTER TABLE
- Application changes needed
- Disruptive to production
Temporal Queries Impossible
- Cannot ask "what was the value on date X?"
- Only two points in time available
- Limited analytical value
Fact Table Ambiguity
- Which state applies to a given transaction?
- Must compare transaction date to Effective_Date
- More complex than Type 2
Hybrid Column Approach
Type 3 often combines with other types:
💡 Type 3 Hybrid Strategy
Scenario
- Use Type 2 for complete history
- Add Type 3 columns for common queries
- Example: Current_State column in Type 2 dimension
Benefits
- Type 2 provides full history
- Type 3 columns simplify frequent queries
- Best of both approaches
Trade-off
- Redundant data
- ETL must maintain both
- Storage overhead
Type 3 rarely stands alone. Its limited history makes it unsuitable as the sole tracking mechanism. Combined with Type 2, it provides query convenience while maintaining complete history.
Type 4: Add History Table
Type 4 separates current and historical data into different tables.
How Type 4 Works
Type 4 uses two tables: one for current state, one for history:
Supplier Table (Current):
| Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State |
|---|---|---|---|
| 124 | ABC | Acme & Johnson Supply Co | IL |
Supplier_History Table:
| Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State | Create_Date |
|---|---|---|---|---|
| 123 | ABC | Acme Supply Co | CA | 2003-06-14 |
| 124 | ABC | Acme & Johnson Supply Co | IL | 2004-12-22 |
The current table contains only active records. The history table contains all versions, including the current one.
Database Audit Pattern
Type 4 resembles database audit tables:
🔍 Type 4 Characteristics
Structure
- Current table: one row per entity
- History table: all versions
- Both tables share key structure
- History includes timestamps
Similarities to Audit Tables
- Tracks all changes
- Immutable history
- Timestamp-based
- Separate storage
Differences from Audit Tables
- History table is queryable dimension
- Fact tables can reference both
- Part of dimensional model
- Not just for auditing
Fact Table Integration
Type 4’s unique feature: facts can reference both tables:
CREATE TABLE Fact_Sales (
Sale_ID INT PRIMARY KEY,
Sale_Date DATE,
Current_Supplier_Key INT, -- References Supplier
Historical_Supplier_Key INT, -- References Supplier_History
Amount DECIMAL(10,2)
);
✅ Dual Reference Benefits
Current State Queries
- Join to Supplier table
- Always shows current information
- Fast queries (smaller table)
- No date filtering needed
Historical State Queries
- Join to Supplier_History table
- Shows state at transaction time
- Complete audit trail
- Temporal accuracy
Flexibility
- Choose appropriate table per query
- No query rewriting needed
- Both views available
Query Patterns
Type 4 enables different analytical perspectives:
Current State Analysis:
SELECT
s.Supplier_State,
SUM(f.Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Current_Supplier_Key = s.Supplier_Key
GROUP BY s.Supplier_State;
This query shows all sales grouped by current supplier state, regardless of when the sale occurred.
Historical State Analysis:
SELECT
sh.Supplier_State,
SUM(f.Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier_History sh ON f.Historical_Supplier_Key = sh.Supplier_Key
GROUP BY sh.Supplier_State;
This query shows sales grouped by the supplier’s state at the time of sale.
ETL Considerations
Type 4 requires careful ETL design:
⚠️ Type 4 ETL Complexity
Change Detection
- Compare source to current table
- Detect changes
- Update current table
- Insert into history table
Key Management
- Generate new surrogate key for change
- Update current table with new key
- Insert new row in history table
- Update fact table with both keys
Fact Table Loading
- Must populate both key columns
- Current key: latest dimension key
- Historical key: dimension key at transaction time
- Requires temporal lookup
When Type 4 Makes Sense
Type 4 suits specific scenarios:
✅ Good Type 4 Use Cases
Dual Perspective Reporting
- Need both current and historical views
- Frequent switching between perspectives
- Example: Sales by current territory vs historical territory
Performance Optimization
- Current table stays small
- Fast current-state queries
- History table can be partitioned
- Separate indexing strategies
Change Data Capture
- Integrating with CDC systems
- Natural fit for CDC output
- Audit trail requirement
- Compliance needs
Type 4 Challenges
The dual-table approach creates complexity:
🚫 Type 4 Problems
Fact Table Overhead
- Two key columns per dimension
- Increased storage
- More complex ETL
- Potential for inconsistency
Synchronization Risk
- Current and history tables must stay aligned
- History table should contain current record
- ETL failures can cause divergence
- Validation overhead
Query Confusion
- Users must understand which table to use
- Wrong table choice gives wrong results
- Documentation critical
- Training required
Limited Tool Support
- BI tools expect single dimension table
- May not handle dual references well
- Custom query logic needed
- Reporting complexity
Maintenance Strategy
Type 4 requires ongoing maintenance:
💡 Type 4 Best Practices
Consistency Checks
- Verify history contains current record
- Validate key relationships
- Regular reconciliation
- Automated validation
Clear Naming
- Obvious table names (Current vs History)
- Descriptive key column names
- Documentation in data dictionary
ETL Atomicity
- Update both tables in single transaction
- Rollback on failure
- Prevent partial updates
Performance Tuning
- Partition history table by date
- Different indexes for each table
- Archive old history if needed
Type 4 provides flexibility at the cost of complexity. It works best when the dual perspective is genuinely needed and the team can manage the additional ETL and maintenance burden.
Type 6: Combined Approach
Type 6 combines Types 1, 2, and 3 (1 + 2 + 3 = 6) for maximum flexibility.
How Type 6 Works
Type 6 uses Type 2’s row versioning with Type 3’s current value columns and Type 1’s overwrite strategy:
Initial State:
| Supplier_Key | Row_Key | Supplier_Code | Supplier_Name | Current_State | Historical_State | Start_Date | End_Date | Current_Flag |
|---|---|---|---|---|---|---|---|---|
| 123 | 1 | ABC | Acme Supply Co | CA | CA | 2000-01-01 | 9999-12-31 | Y |
After First Relocation:
| Supplier_Key | Row_Key | Supplier_Code | Supplier_Name | Current_State | Historical_State | Start_Date | End_Date | Current_Flag |
|---|---|---|---|---|---|---|---|---|
| 123 | 1 | ABC | Acme Supply Co | IL | CA | 2000-01-01 | 2004-12-22 | N |
| 123 | 2 | ABC | Acme Supply Co | IL | IL | 2004-12-22 | 9999-12-31 | Y |
Notice:
- Type 2: New row created (Row_Key 2)
- Type 3: Current_State column added
- Type 1: Current_State in Row_Key 1 overwritten to IL
After Second Relocation:
| Supplier_Key | Row_Key | Supplier_Code | Supplier_Name | Current_State | Historical_State | Start_Date | End_Date | Current_Flag |
|---|---|---|---|---|---|---|---|---|
| 123 | 1 | ABC | Acme Supply Co | NY | CA | 2000-01-01 | 2004-12-22 | N |
| 123 | 2 | ABC | Acme Supply Co | NY | IL | 2004-12-22 | 2008-02-04 | N |
| 123 | 3 | ABC | Acme Supply Co | NY | NY | 2008-02-04 | 9999-12-31 | Y |
All rows now show Current_State = NY (Type 1 overwrite), while Historical_State preserves the value that was current during each time period (Type 2 history).
The Three Techniques Combined
Type 6 applies each technique to different aspects:
🔧 Type 6 Mechanics
Type 1 Component
- Current_State column
- Overwritten on every change
- All rows updated to show latest value
- Enables "current state" queries
Type 2 Component
- New row for each change
- Start_Date and End_Date
- Current_Flag
- Preserves complete history
Type 3 Component
- Historical_State column
- Stores value that was current during time period
- Never updated after row creation
- Enables "historical state" queries
Query Flexibility
Type 6 supports three query patterns:
Current State Query:
SELECT
s.Current_State,
SUM(f.Sales_Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Key = s.Row_Key
GROUP BY s.Current_State;
Shows all sales grouped by current supplier state. Simple join, no date filtering.
Historical State Query:
SELECT
s.Historical_State,
SUM(f.Sales_Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Key = s.Row_Key
GROUP BY s.Historical_State;
Shows sales grouped by the state that was current when the sale occurred. Also simple, no date filtering needed.
Point-in-Time Query:
SELECT
s.Supplier_Name,
s.Historical_State
FROM Supplier s
WHERE s.Supplier_Code = 'ABC'
AND '2005-06-15' BETWEEN s.Start_Date AND s.End_Date;
Retrieves the dimension state as it existed on a specific date.
Kimball’s “Unpredictable Changes with Single-Version Overlay”
Ralph Kimball calls Type 6 “Unpredictable Changes with Single-Version Overlay”:
📚 Kimball's Terminology
Unpredictable Changes
- Changes occur irregularly
- Cannot predict when or how often
- Typical of slowly changing dimensions
Single-Version Overlay
- Current value "overlaid" on all historical rows
- Creates single version of current state
- Accessible without date logic
- Simplifies common queries
The “overlay” refers to the Current_State column that gets updated across all rows, providing a single, consistent view of the current state regardless of which historical row you’re examining.
Implementation Complexity
Type 6 is the most complex SCD type:
⚠️ Type 6 Complexity
ETL Challenges
- Detect changes (Type 2)
- Create new row (Type 2)
- Update End_Date on old row (Type 2)
- Update Current_Flag (Type 2)
- Update Current_State on ALL rows for entity (Type 1)
- Most complex update logic
Storage Overhead
- Redundant Current_State column
- Stored in every historical row
- Increases dimension size
- More storage than Type 2 alone
Update Performance
- Every change updates multiple rows
- Must update all historical rows
- Can be slow for entities with many versions
- Index maintenance overhead
When Type 6 Makes Sense
Type 6 justifies its complexity in specific scenarios:
✅ Good Type 6 Use Cases
Frequent Current State Queries
- Most queries need current state
- Historical state needed occasionally
- Query simplicity worth storage cost
Business User Queries
- Non-technical users write queries
- Date logic too complex
- Simple column reference preferred
- Self-service BI tools
Regulatory Requirements
- Must maintain complete history (Type 2)
- Must report current state (Type 1)
- Both requirements mandatory
- Compliance justifies complexity
Type 6 Best Practices
Managing Type 6 requires discipline:
💡 Type 6 Best Practices
Clear Column Naming
- Current_State vs Historical_State
- Obvious which is which
- Prevents query errors
- Self-documenting
Efficient Updates
- Batch Current_State updates
- Use set-based operations
- Avoid row-by-row updates
- Optimize for performance
Documentation
- Explain the three techniques
- Provide query examples
- Document ETL logic
- Train users
Consider Type 2 + View
- Alternative: Type 2 dimension
- Add view with current state
- View joins to current row
- Simpler ETL, similar query experience
The view alternative provides similar query simplicity without the update complexity:
CREATE VIEW Supplier_With_Current AS
SELECT
h.*,
c.Supplier_State as Current_State
FROM Supplier h
JOIN Supplier c ON h.Supplier_Code = c.Supplier_Code
WHERE c.Current_Flag = 'Y';
This view gives each historical row access to the current state without storing it redundantly.
Type 7: Dual Key Strategy
Type 7 places both surrogate and natural keys in the fact table for maximum query flexibility.
How Type 7 Works
Type 7 uses Type 2 dimension structure but stores two keys in facts:
Supplier Dimension (Type 2):
| Supplier_Key | Supplier_Code | Supplier_Name | Supplier_State | Start_Date | End_Date | Current_Flag |
|---|---|---|---|---|---|---|
| 123 | ABC | Acme Supply Co | CA | 2000-01-01 | 2004-12-22 | N |
| 124 | ABC | Acme Supply Co | IL | 2004-12-22 | 2008-02-04 | N |
| 125 | ABC | Acme Supply Co | NY | 2008-02-04 | 9999-12-31 | Y |
Fact Table with Dual Keys:
CREATE TABLE Fact_Sales (
Sale_ID INT PRIMARY KEY,
Sale_Date DATE,
Supplier_Key INT, -- Surrogate key (historical)
Supplier_Code VARCHAR(10), -- Natural key
Amount DECIMAL(10,2)
);
The fact stores both the surrogate key (Supplier_Key) that was current at transaction time and the natural key (Supplier_Code) that identifies the entity across all versions.
Query Flexibility
Type 7 enables three query patterns without date logic:
Historical State Query:
SELECT
s.Supplier_State,
SUM(f.Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Key = s.Supplier_Key
GROUP BY s.Supplier_State;
Join on surrogate key shows historical state. A sale from 2003 joins to Supplier_Key 123 (CA), showing California sales.
Current State Query:
SELECT
s.Supplier_State,
SUM(f.Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Code = s.Supplier_Code
WHERE s.Current_Flag = 'Y'
GROUP BY s.Supplier_State;
Join on natural key with current flag shows current state. All sales for ABC join to Supplier_Key 125 (NY), showing New York sales.
Point-in-Time Query:
SELECT
s.Supplier_State,
SUM(f.Amount) as Total_Sales
FROM Fact_Sales f
JOIN Supplier s ON f.Supplier_Code = s.Supplier_Code
WHERE f.Sale_Date BETWEEN s.Start_Date AND s.End_Date
GROUP BY s.Supplier_State;
Join on natural key with date range shows state at any point in time.
Advantages Over Type 2
Type 7 adds flexibility to Type 2:
✅ Type 7 Benefits
Query Options
- Historical: join on surrogate key
- Current: join on natural key + current flag
- Point-in-time: join on natural key + date range
- No query rewriting needed
Referential Integrity
- Surrogate key can have foreign key constraint
- Natural key provides logical relationship
- Both perspectives supported
Retroactive Changes
- Adding attributes doesn't break facts
- New time slices don't require fact updates
- More resilient to dimension changes
Multiple Date Perspectives
- Fact has Order_Date, Ship_Date, Invoice_Date
- Can join on different dates
- Example: "Supplier state when ordered" vs "when shipped"
The Retroactive Change Advantage
Type 7’s biggest benefit appears when dimensions evolve:
✅ Handling Dimension Changes
Scenario
- Dimension tracks Supplier_State
- Business adds Sales_Rep attribute
- Sales_Rep has different change dates
Type 2 Problem
- Must create new time slices
- Existing facts point to old keys
- Must update fact table
- Expensive operation
Type 7 Solution
- Add Sales_Rep to dimension
- Create new time slices
- Facts still have natural key
- No fact table updates needed
- Join logic handles new structure
The natural key provides stability. Even when the dimension’s time slices change, facts can still join correctly using the natural key and date logic.
Type 7 Challenges
The dual key approach has significant drawbacks:
🚫 Type 7 Problems
No True Referential Integrity
- Natural key not unique in dimension
- Cannot create foreign key on natural key
- Surrogate key foreign key only ensures key exists
- Doesn't ensure correct version
- Data integrity relies on application logic
Query Complexity
- Users must understand which key to use
- Wrong choice gives wrong results
- Date logic required for some queries
- More complex than Type 2
Duplicate Row Risk
- Natural key join without date filter returns multiple rows
- Easy to get wrong results
- Must remember WHERE Current_Flag = 'Y'
- Or date range filter
Performance Concerns
- Date comparisons slower than key joins
- Natural key joins may be slower
- More complex execution plans
- Index strategy critical
BI Tool Limitations
- Tools expect single join key
- May not handle dual key logic well
- Custom SQL often required
- Reporting complexity
The Referential Integrity Problem
Type 7’s most serious issue is the inability to enforce referential integrity:
🚫 Referential Integrity Gap
The Problem
Supplier_Code appears multiple times in dimension (once per version). Database cannot create foreign key constraint on non-unique column.
Consequences
- Can insert Supplier_Code that doesn't exist
- Can insert valid code but wrong Supplier_Key
- Orphaned facts possible
- Data quality depends on ETL
Example
Fact has Supplier_Code = 'ABC' and Supplier_Key = 999. Supplier_Key 999 doesn't exist, but Supplier_Code 'ABC' does. Foreign key on Supplier_Key would catch this, but doesn't validate the natural key relationship.
Mitigation
- ETL validation critical
- Regular data quality checks
- Application-level constraints
- Cannot rely on database
When Type 7 Makes Sense
Type 7 suits specific scenarios:
✅ Good Type 7 Use Cases
Evolving Dimensions
- Dimension structure changes frequently
- New attributes added regularly
- Retroactive changes common
- Fact table updates expensive
Multiple Date Perspectives
- Facts have multiple dates
- Different dates need different dimension states
- Example: Order date vs ship date
- Flexibility worth complexity
Advanced Users
- Team understands SCD concepts
- Can write correct join logic
- Data quality processes mature
- BI tool supports complex joins
Type 7 Best Practices
Type 7 requires careful implementation:
💡 Type 7 Best Practices
Comprehensive Documentation
- Explain dual key purpose
- Provide query templates
- Document join patterns
- Include anti-patterns (what not to do)
ETL Validation
- Verify both keys match
- Check natural key exists
- Validate surrogate key points to correct version
- Automated data quality checks
Index Strategy
- Index both keys in fact table
- Index natural key in dimension
- Index Start_Date and End_Date
- Monitor query performance
Query Templates
- Provide standard queries
- Current state template
- Historical state template
- Point-in-time template
- Prevent common errors
Consider Alternatives
- Type 2 might be sufficient
- Type 6 provides similar flexibility
- Simpler approaches often better
- Complexity must be justified
Type 7 vs Type 6
Both provide flexibility but through different mechanisms:
🔍 Type 7 vs Type 6 Comparison
Type 6
- Stores current state in dimension
- Updates all rows on change
- Simple queries (no date logic)
- Higher storage overhead
- More complex ETL
Type 7
- Stores natural key in facts
- No dimension updates needed
- Complex queries (date logic required)
- Lower storage overhead
- Simpler ETL, complex queries
Decision Factors
- Query simplicity needed? → Type 6
- Dimension changes frequently? → Type 7
- Storage constrained? → Type 7
- Non-technical users? → Type 6
Type 7 trades query simplicity for ETL simplicity and resilience to dimension changes. The right choice depends on which complexity is more manageable for your team.
Choosing the Right Type
Different dimensions and attributes require different SCD types.
Decision Framework
Selecting the appropriate SCD type requires evaluating multiple factors:
🎯 SCD Type Selection Criteria
Historical Accuracy Requirements
- Regulatory compliance needs?
- Audit trail required?
- Historical reporting critical?
- → Type 2, Type 4, or Type 6
Query Patterns
- Mostly current state queries?
- Mostly historical queries?
- Both equally important?
- → Influences Type 6 vs Type 2
Change Frequency
- Changes rare?
- Changes frequent?
- Predictable or unpredictable?
- → Affects storage and performance
Dimension Stability
- Structure changes often?
- New attributes added regularly?
- Retroactive changes needed?
- → Type 7 for unstable dimensions
Team Capabilities
- ETL expertise level?
- Query complexity tolerance?
- BI tool limitations?
- → Simpler types for less experienced teams
Type Selection Matrix
A practical guide for choosing SCD types:
| Requirement | Recommended Type | Alternative | Avoid |
|---|---|---|---|
| No history needed | Type 1 | Type 0 | Type 2+ |
| Complete history required | Type 2 | Type 4, Type 6 | Type 1, Type 3 |
| Before/after comparison only | Type 3 | Type 2 with view | Type 1 |
| Current + historical views | Type 6 | Type 7, Type 4 | Type 1, Type 3 |
| Frequent dimension changes | Type 7 | Type 2 | Type 6 |
| Simple queries critical | Type 1, Type 6 | Type 3 | Type 7 |
| Storage constrained | Type 1, Type 3 | Type 2 | Type 6 |
| Audit compliance | Type 2, Type 4 | Type 6 | Type 1 |
Hybrid Approaches
Most data warehouses use multiple SCD types:
✅ Mixing SCD Types
Different Types per Dimension
- Customer: Type 2 (address changes matter)
- Product: Type 1 (description updates don't)
- Date: Type 0 (never changes)
- Employee: Type 2 (department transfers tracked)
Different Types per Attribute
- Customer.Address: Type 2
- Customer.Phone: Type 1
- Customer.Email: Type 1
- Customer.CreditLimit: Type 2
Benefits
- Optimize each dimension independently
- Balance complexity and requirements
- Avoid over-engineering
- Practical approach
Common Patterns by Industry
Different industries have typical SCD patterns:
🏢 Industry Patterns
Financial Services
- Account dimensions: Type 2
- Customer dimensions: Type 2
- Product dimensions: Type 2
- Reason: Regulatory compliance, audit requirements
Retail
- Product dimensions: Type 1 or Type 2
- Store dimensions: Type 2
- Customer dimensions: Type 1 or Type 2
- Reason: Balance history with simplicity
Healthcare
- Patient dimensions: Type 2
- Provider dimensions: Type 2
- Diagnosis dimensions: Type 0
- Reason: Historical accuracy critical
Manufacturing
- Supplier dimensions: Type 2
- Product dimensions: Type 2
- Plant dimensions: Type 1
- Reason: Supply chain tracking
Performance Considerations
SCD type affects query and ETL performance:
⚠️ Performance Impact
Type 1
- Fast queries (no date logic)
- Fast updates (simple UPDATE)
- Must recalculate aggregates
- Best query performance
Type 2
- Moderate query performance
- Date range checks add overhead
- Dimension grows over time
- Index strategy critical
Type 6
- Simple queries (no date logic)
- Slow updates (multiple rows)
- Larger dimension size
- Trade update speed for query speed
Type 7
- Complex queries (date logic)
- Fast updates (no dimension changes)
- Natural key joins may be slower
- Depends on index quality
Storage Considerations
Different types have different storage profiles:
💾 Storage Impact
Type 1
- Minimal storage
- One row per entity
- No growth over time
- Most efficient
Type 2
- Grows with changes
- One row per version
- Predictable growth rate
- Monitor dimension size
Type 3
- Fixed size
- Additional columns
- No row multiplication
- Moderate storage
Type 6
- Largest storage
- Type 2 rows + redundant columns
- Highest overhead
- Storage cost for query simplicity
Migration Strategies
Changing SCD types requires planning:
💡 SCD Type Migration
Type 1 to Type 2
- Add Start_Date, End_Date, Current_Flag
- Set Start_Date to earliest known date
- Set End_Date to NULL
- Future changes use Type 2
- Historical data lost (unavoidable)
Type 2 to Type 6
- Add Current_State column
- Populate from current row
- Update ETL to maintain both
- Backward compatible
Type 2 to Type 1
- Delete historical rows
- Keep only current rows
- Remove temporal columns
- Irreversible (backup first)
Anti-Patterns to Avoid
Common SCD mistakes:
🚫 SCD Anti-Patterns
Using Type 2 for Everything
- Not all attributes need history
- Unnecessary complexity
- Wasted storage
- Solution: Mix types appropriately
Using Type 1 for Everything
- Loses valuable history
- Cannot answer temporal questions
- Compliance issues
- Solution: Identify critical historical attributes
Inconsistent Implementation
- Different date formats
- Mixed NULL vs high date
- Inconsistent flag values
- Solution: Standardize across warehouse
Ignoring Natural Keys
- Only using surrogate keys
- Cannot query across versions
- Difficult to trace entity history
- Solution: Maintain natural keys
Over-Engineering
- Type 6 or Type 7 when Type 2 suffices
- Complexity without benefit
- Maintenance burden
- Solution: Start simple, add complexity only when needed
Practical Recommendations
Guidelines for real-world implementation:
💡 Practical SCD Advice
Start with Type 2
- Default choice for most dimensions
- Well understood
- Good balance of features
- Can evolve to other types
Use Type 1 Sparingly
- Only when history truly doesn't matter
- Document the decision
- Get business sign-off
- Consider audit table as backup
Avoid Type 6 and Type 7 Initially
- Complex to implement
- Complex to maintain
- Add only when Type 2 proves insufficient
- Requires mature team
Document Everything
- Which type for each dimension
- Why that type was chosen
- Query patterns and examples
- ETL logic and edge cases
Monitor and Adjust
- Track dimension growth
- Monitor query performance
- Gather user feedback
- Be willing to change types
Conclusion
Slowly Changing Dimensions represent one of data warehousing’s fundamental challenges: balancing historical accuracy with practical implementation constraints. The choice between SCD types isn’t about finding the “best” approach—it’s about matching technique to business requirements, team capabilities, and system constraints.
The journey through SCD types reveals important lessons:
Simplicity Has Costs: Type 1’s simplicity comes at the price of lost history. What seems like a minor trade-off during design becomes a major limitation when business needs evolve. The question “what were our sales in California in 2005?” becomes unanswerable when suppliers relocate and Type 1 overwrites their locations. Historical data, once lost, cannot be recovered.
History Requires Structure: Type 2’s popularity stems from its balanced approach. Complete history preservation, reasonable query complexity, and well-understood implementation make it the default choice. The pattern of creating new rows for changes, maintaining temporal columns, and using surrogate keys has proven itself across countless data warehouses.
Flexibility Demands Complexity: Type 6 and Type 7 provide maximum flexibility but require sophisticated ETL processes and careful query design. The complexity is justified only when simpler approaches prove insufficient. Starting with Type 2 and evolving to more complex types as needs emerge prevents premature optimization.
Context Matters: No single SCD type fits all scenarios. Customer addresses might need Type 2 for historical accuracy while phone numbers use Type 1 for simplicity. Product categories might require Type 2 while product descriptions use Type 1. The ability to mix types within a single data warehouse—even within a single dimension—enables practical solutions.
Trade-offs Are Inevitable: Every SCD type represents trade-offs between competing concerns:
- Historical accuracy vs. query simplicity
- Storage efficiency vs. temporal precision
- ETL complexity vs. query flexibility
- Current state access vs. historical state access
Understanding these trade-offs enables informed decisions rather than following patterns blindly.
Implementation Discipline: Successful SCD implementation requires discipline. Consistent date handling, clear naming conventions, comprehensive documentation, and robust ETL processes separate working implementations from problematic ones. The technical pattern matters less than the rigor of its application.
The SCD type classification system provides a vocabulary for discussing dimensional change strategies. Type 0 through Type 7 aren’t rigid rules but rather a toolkit. The art lies in selecting the right tool for each situation, combining techniques when appropriate, and maintaining the discipline to implement chosen approaches consistently.
Modern data warehouses often use Type 2 as the foundation, adding Type 1 for attributes where history doesn’t matter and Type 6 or Type 7 only when specific requirements justify the additional complexity. This pragmatic approach balances the competing demands of historical accuracy, query performance, storage efficiency, and maintainability.
The ultimate measure of success isn’t which SCD type you choose but whether your dimensional model enables the business to answer its questions accurately. When historical reports reflect actual historical state, when current reports show current state, and when the distinction between the two is clear and intentional, the SCD implementation succeeds regardless of which specific types it employs.
As data warehouses evolve and business requirements change, SCD strategies may need adjustment. The flexibility to migrate between types, the discipline to document decisions, and the wisdom to choose simplicity over complexity when possible determine long-term success more than any specific technical choice.