The Sharding Pattern: Scaling Data Stores Horizontally

  1. The Library Analogy
  2. The Problem: Single Server Limitations
  3. The Solution: Horizontal Partitioning (Sharding)
  4. Sharding Strategies
  5. Strategy Comparison
  6. Practical Implementation Example
  7. Key Considerations
  8. When to Use Sharding
  9. Benefits Summary
  10. Challenges Summary
  11. References

Imagine a library that has grown so large that a single building can no longer hold all the books. Instead of building one impossibly large structure, you create multiple library branches—each holding books organized by a specific category or range. Patrons know which branch to visit based on what they’re looking for. This is the essence of sharding: dividing data across multiple stores to overcome the limitations of a single server.

The Library Analogy

Just as a library system with multiple branches:

  • Distributes books across locations
  • Allows parallel access by many patrons
  • Reduces crowding at any single location
  • Enables geographic proximity to users

A sharded data store:

  • Distributes data across multiple servers
  • Allows parallel queries and writes
  • Reduces contention on any single database
  • Enables data locality for better performance
Z3JhcGggVEIKICAgIEFbQXBwbGljYXRpb25dIC0tPiBCW1NoYXJkaW5nIExvZ2ljXQogICAgQiAtLT4gQ1tTaGFyZCAxPGJyLz5Vc2VycyBBLUhdCiAgICBCIC0tPiBEW1NoYXJkIDI8YnIvPlVzZXJzIEktUF0KICAgIEIgLS0+IEVbU2hhcmQgMzxici8+VXNlcnMgUS1aXQogICAgCiAgICBzdHlsZSBBIGZpbGw6IzRkYWJmNyxzdHJva2U6IzE5NzFjMgogICAgc3R5bGUgQiBmaWxsOiNmZmQ0M2Isc3Ryb2tlOiNmYWIwMDUKICAgIHN0eWxlIEMgZmlsbDojNTFjZjY2LHN0cm9rZTojMmY5ZTQ0CiAgICBzdHlsZSBEIGZpbGw6IzUxY2Y2NixzdHJva2U6IzJmOWU0NAogICAgc3R5bGUgRSBmaWxsOiM1MWNmNjYsc3Ryb2tlOiMyZjllNDQ=

The Problem: Single Server Limitations

A data store hosted on a single server faces inevitable constraints:

Storage Space Limitations

// As data grows, a single server runs out of space
class UserDatabase {
  constructor() {
    this.storage = new DiskStorage('/data');
    // What happens when we reach 10TB? 100TB? 1PB?
  }
  
  async addUser(user) {
    try {
      await this.storage.write(user.id, user);
    } catch (error) {
      if (error.code === 'ENOSPC') {
        // Disk full - now what?
        throw new Error('Storage capacity exceeded');
      }
    }
  }
}

Computing Resource Constraints

// Single server handling millions of concurrent users
class OrderDatabase {
  async processQuery(query) {
    // CPU maxed out processing queries
    // Memory exhausted caching results
    // Queries start timing out
    const result = await this.executeQuery(query);
    return result;
  }
}

Network Bandwidth Bottlenecks

// All traffic flows through one network interface
class DataStore {
  async handleRequest(request) {
    // Network interface saturated at 10Gbps
    // Requests start getting dropped
    // Response times increase dramatically
    return await this.processRequest(request);
  }
}

Geographic Distribution Challenges

// Users worldwide accessing a single data center
class GlobalApplication {
  async getUserData(userId) {
    // User in Tokyo accessing data in Virginia
    // 200ms latency just for network round trip
    // Compliance issues storing EU data in US
    return await this.database.query({ userId });
  }
}

⚠️ Vertical Scaling Limitations

Temporary Solution: Adding more CPU, memory, or disk to a single server

Physical Limits: Eventually you can't add more resources

Cost Inefficiency: High-end servers become exponentially expensive

Single Point of Failure: One server failure affects all users

The Solution: Horizontal Partitioning (Sharding)

Divide the data store into horizontal partitions called shards. Each shard:

  • Has the same schema
  • Contains a distinct subset of data
  • Runs on a separate storage node
  • Operates independently
Z3JhcGggVEIKICAgIEFbQXBwbGljYXRpb24gTGF5ZXJdIC0tPiBCW1NoYXJkIE1hcC9Sb3V0ZXJdCiAgICBCIC0tPiBDW1NoYXJkIEE8YnIvPk9yZGVycyAwLTk5OV0KICAgIEIgLS0+IERbU2hhcmQgQjxici8+T3JkZXJzIDEwMDAtMTk5OV0KICAgIEIgLS0+IEVbU2hhcmQgQzxici8+T3JkZXJzIDIwMDAtMjk5OV0KICAgIEIgLS0+IEZbU2hhcmQgRDxici8+T3JkZXJzIDMwMDArXQogICAgCiAgICBDIC0tPiBDMVsoRGF0YWJhc2U8YnIvPlNlcnZlciAxKV0KICAgIEQgLS0+IEQxWyhEYXRhYmFzZTxici8+U2VydmVyIDIpXQogICAgRSAtLT4gRTFbKERhdGFiYXNlPGJyLz5TZXJ2ZXIgMyldCiAgICBGIC0tPiBGMVsoRGF0YWJhc2U8YnIvPlNlcnZlciA0KV0KICAgIAogICAgc3R5bGUgQSBmaWxsOiM0ZGFiZjcsc3Ryb2tlOiMxOTcxYzIKICAgIHN0eWxlIEIgZmlsbDojZmZkNDNiLHN0cm9rZTojZmFiMDA1CiAgICBzdHlsZSBDIGZpbGw6IzUxY2Y2NixzdHJva2U6IzJmOWU0NAogICAgc3R5bGUgRCBmaWxsOiM1MWNmNjYsc3Ryb2tlOiMyZjllNDQKICAgIHN0eWxlIEUgZmlsbDojNTFjZjY2LHN0cm9rZTojMmY5ZTQ0CiAgICBzdHlsZSBGIGZpbGw6IzUxY2Y2NixzdHJva2U6IzJmOWU0NA==

Sharding Strategies

1. Lookup Strategy

Use a mapping table to route requests to the appropriate shard:

class LookupShardRouter {
  constructor() {
    // Shard map stored in fast cache or database
    this.shardMap = new Map([
      ['tenant-1', 'shard-a'],
      ['tenant-2', 'shard-a'],
      ['tenant-3', 'shard-b'],
      ['tenant-4', 'shard-c']
    ]);
    
    this.shardConnections = {
      'shard-a': 'db1.neo01.com',
      'shard-b': 'db2.neo01.com',
      'shard-c': 'db3.neo01.com'
    };
  }
  
  getShardForTenant(tenantId) {
    const shardKey = this.shardMap.get(tenantId);
    return this.shardConnections[shardKey];
  }
  
  async queryTenantData(tenantId, query) {
    const shardUrl = this.getShardForTenant(tenantId);
    const connection = await this.connect(shardUrl);
    return await connection.query(query);
  }
}
Z3JhcGggTFIKICAgIEFbUmVxdWVzdDo8YnIvPlRlbmFudC0zXSAtLT4gQltMb29rdXA8YnIvPlNoYXJkIE1hcF0KICAgIEIgLS0+IEN7VGVuYW50LTM8YnIvPuKGkiBTaGFyZCBCfQogICAgQyAtLT4gRFsoU2hhcmQgQjxici8+RGF0YWJhc2UpXQogICAgCiAgICBzdHlsZSBBIGZpbGw6IzRkYWJmNyxzdHJva2U6IzE5NzFjMgogICAgc3R5bGUgQiBmaWxsOiNmZmQ0M2Isc3Ryb2tlOiNmYWIwMDUKICAgIHN0eWxlIEQgZmlsbDojNTFjZjY2LHN0cm9rZTojMmY5ZTQ0

💡 Lookup Strategy Benefits

Flexibility: Easy to rebalance by updating the map

Virtual Shards: Map logical shards to fewer physical servers

Control: Assign high-value tenants to dedicated shards

2. Range Strategy

Group related items together based on sequential shard keys:

class RangeShardRouter {
  constructor() {
    this.shardRanges = [
      { min: '2019-01-01', max: '2019-03-31', shard: 'db-q1-2019.neo01.com' },
      { min: '2019-04-01', max: '2019-06-30', shard: 'db-q2-2019.neo01.com' },
      { min: '2019-07-01', max: '2019-09-30', shard: 'db-q3-2019.neo01.com' },
      { min: '2019-10-01', max: '2019-12-31', shard: 'db-q4-2019.neo01.com' }
    ];
  }
  
  getShardForDate(date) {
    const range = this.shardRanges.find(r => 
      date >= r.min && date <= r.max
    );
    return range ? range.shard : null;
  }
  
  async queryOrdersByDateRange(startDate, endDate) {
    // Efficient: Query only relevant shards
    const relevantShards = this.shardRanges
      .filter(r => r.max >= startDate && r.min <= endDate)
      .map(r => r.shard);
    
    // Parallel queries to multiple shards
    const results = await Promise.all(
      relevantShards.map(shard => 
        this.queryShardByDateRange(shard, startDate, endDate)
      )
    );
    
    return results.flat();
  }
}
Z3JhcGggVEIKICAgIEFbUXVlcnk6PGJyLz5PcmRlcnMgaW4gUTIgMjAxOV0gLS0+IEJbUmFuZ2UgUm91dGVyXQogICAgQiAtLT4gQ1tTaGFyZCBRMjxici8+QXByLUp1biAyMDE5XQogICAgCiAgICBEW1F1ZXJ5Ojxici8+T3JkZXJzIEFwci1KdWwgMjAxOV0gLS0+IEIKICAgIEIgLS0+IEMKICAgIEIgLS0+IEVbU2hhcmQgUTM8YnIvPkp1bC1TZXAgMjAxOV0KICAgIAogICAgc3R5bGUgQSBmaWxsOiM0ZGFiZjcsc3Ryb2tlOiMxOTcxYzIKICAgIHN0eWxlIEQgZmlsbDojNGRhYmY3LHN0cm9rZTojMTk3MWMyCiAgICBzdHlsZSBCIGZpbGw6I2ZmZDQzYixzdHJva2U6I2ZhYjAwNQogICAgc3R5bGUgQyBmaWxsOiM1MWNmNjYsc3Ryb2tlOiMyZjllNDQKICAgIHN0eWxlIEUgZmlsbDojNTFjZjY2LHN0cm9rZTojMmY5ZTQ0

💡 Range Strategy Benefits

Range Queries: Efficiently retrieve sequential data

Natural Ordering: Data stored in logical order

Time-Based Archival: Easy to archive old shards

⚠️ Range Strategy Risks

Hotspots: Recent data often accessed more frequently

Uneven Distribution: Some ranges may grow larger than others

3. Hash Strategy

Distribute data evenly using a hash function:

class HashShardRouter {
  constructor() {
    this.shards = [
      'db-shard-0.neo01.com',
      'db-shard-1.neo01.com',
      'db-shard-2.neo01.com',
      'db-shard-3.neo01.com'
    ];
  }
  
  hashUserId(userId) {
    // Simple hash function (use better hash in production)
    let hash = 0;
    for (let i = 0; i < userId.length; i++) {
      hash = ((hash << 5) - hash) + userId.charCodeAt(i);
      hash = hash & hash; // Convert to 32-bit integer
    }
    return Math.abs(hash);
  }
  
  getShardForUser(userId) {
    const hash = this.hashUserId(userId);
    const shardIndex = hash % this.shards.length;
    return this.shards[shardIndex];
  }
  
  async getUserData(userId) {
    const shard = this.getShardForUser(userId);
    const connection = await this.connect(shard);
    return await connection.query({ userId });
  }
}

// Example distribution
const router = new HashShardRouter();
console.log(router.getShardForUser('user-123')); // db-shard-2
console.log(router.getShardForUser('user-124')); // db-shard-0
console.log(router.getShardForUser('user-125')); // db-shard-3
// Users distributed across shards
Z3JhcGggVEIKICAgIEFbVXNlciBJRHNdIC0tPiBCW0hhc2ggRnVuY3Rpb25dCiAgICBCIC0tPiBDW3VzZXItNTUg4oaSIEhhc2g6IDJdCiAgICBCIC0tPiBEW3VzZXItNTYg4oaSIEhhc2g6IDBdCiAgICBCIC0tPiBFW3VzZXItNTcg4oaSIEhhc2g6IDFdCiAgICAKICAgIEMgLS0+IEZbKFNoYXJkIDIpXQogICAgRCAtLT4gR1soU2hhcmQgMCldCiAgICBFIC0tPiBIWyhTaGFyZCAxKV0KICAgIAogICAgc3R5bGUgQSBmaWxsOiM0ZGFiZjcsc3Ryb2tlOiMxOTcxYzIKICAgIHN0eWxlIEIgZmlsbDojZmZkNDNiLHN0cm9rZTojZmFiMDA1CiAgICBzdHlsZSBGIGZpbGw6IzUxY2Y2NixzdHJva2U6IzJmOWU0NAogICAgc3R5bGUgRyBmaWxsOiM1MWNmNjYsc3Ryb2tlOiMyZjllNDQKICAgIHN0eWxlIEggZmlsbDojNTFjZjY2LHN0cm9rZTojMmY5ZTQ0

💡 Hash Strategy Benefits

Even Distribution: Prevents hotspots

No Lookup Table: Direct computation of shard location

Scalable: Works well with many shards

⚠️ Hash Strategy Challenges

Range Queries: Difficult to query ranges efficiently

Rebalancing: Adding shards requires rehashing data

Strategy Comparison

Practical Implementation Example

Here’s a complete sharding implementation for an e-commerce platform:

class ShardedOrderDatabase {
  constructor() {
    // Use hash strategy for even distribution
    this.shards = [
      { id: 0, connection: 'orders-db-0.neo01.com' },
      { id: 1, connection: 'orders-db-1.neo01.com' },
      { id: 2, connection: 'orders-db-2.neo01.com' },
      { id: 3, connection: 'orders-db-3.neo01.com' }
    ];
  }
  
  getShardForOrder(orderId) {
    // Extract numeric part from order ID
    const numericId = parseInt(orderId.replace(/\D/g, ''));
    const shardIndex = numericId % this.shards.length;
    return this.shards[shardIndex];
  }
  
  async createOrder(order) {
    const shard = this.getShardForOrder(order.id);
    const connection = await this.connectToShard(shard);
    
    try {
      await connection.query(
        'INSERT INTO orders (id, user_id, total, items) VALUES (?, ?, ?, ?)',
        [order.id, order.userId, order.total, JSON.stringify(order.items)]
      );
      return { success: true, shard: shard.id };
    } catch (error) {
      console.error(`Failed to create order on shard ${shard.id}:`, error);
      throw error;
    }
  }
  
  async getOrder(orderId) {
    const shard = this.getShardForOrder(orderId);
    const connection = await this.connectToShard(shard);
    
    const result = await connection.query(
      'SELECT * FROM orders WHERE id = ?',
      [orderId]
    );
    
    return result[0];
  }
  
  async getUserOrders(userId) {
    // User orders spread across shards - need fan-out query
    const results = await Promise.all(
      this.shards.map(async (shard) => {
        const connection = await this.connectToShard(shard);
        return await connection.query(
          'SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC',
          [userId]
        );
      })
    );
    
    // Merge and sort results from all shards
    return results
      .flat()
      .sort((a, b) => b.created_at - a.created_at);
  }
  
  async connectToShard(shard) {
    // Connection pooling per shard
    if (!this.connections) {
      this.connections = new Map();
    }
    
    if (!this.connections.has(shard.id)) {
      const connection = await createDatabaseConnection(shard.connection);
      this.connections.set(shard.id, connection);
    }
    
    return this.connections.get(shard.id);
  }
}

Key Considerations

1. Choosing the Shard Key

The shard key determines data distribution and query performance:

// Good: Static, evenly distributed
const shardKey = user.id; // UUID, never changes

// Bad: Can change over time
const shardKey = user.email; // User might change email

// Bad: Uneven distribution
const shardKey = user.country; // Some countries have many more users

📝 Shard Key Best Practices

Immutable: Choose keys that never change

High Cardinality: Many unique values for even distribution

Query Aligned: Support your most common query patterns

Avoid Hotspots: Prevent sequential keys if using hash strategy

2. Cross-Shard Queries

Minimize queries that span multiple shards:

class OptimizedShardedDatabase {
  // Good: Single shard query
  async getOrderById(orderId) {
    const shard = this.getShardForOrder(orderId);
    return await this.queryShardById(shard, orderId);
  }
  
  // Acceptable: Fan-out with caching
  async getUserOrderCount(userId) {
    // Cache the result to avoid repeated fan-out queries
    const cached = await this.cache.get(`order_count:${userId}`);
    if (cached) return cached;
    
    const counts = await Promise.all(
      this.shards.map(shard => this.countUserOrders(shard, userId))
    );
    
    const total = counts.reduce((sum, count) => sum + count, 0);
    await this.cache.set(`order_count:${userId}`, total, 300); // 5 min TTL
    return total;
  }
  
  // Better: Denormalize to avoid cross-shard queries
  async getUserOrderCountOptimized(userId) {
    // Store count in user shard
    const userShard = this.getShardForUser(userId);
    return await this.queryUserOrderCount(userShard, userId);
  }
}

3. Rebalancing Shards

Plan for growth and rebalancing:

class RebalancingShardManager {
  async addNewShard(newShardConnection) {
    // 1. Add new shard to configuration
    this.shards.push({
      id: this.shards.length,
      connection: newShardConnection
    });
    
    // 2. Gradually migrate data
    await this.migrateDataToNewShard();
    
    // 3. Update shard map
    await this.updateShardMap();
  }
  
  async migrateDataToNewShard() {
    // Use virtual shards for easier rebalancing
    const virtualShards = 1000; // Many virtual shards
    const physicalShards = this.shards.length;
    
    // Remap virtual shards to physical shards
    for (let i = 0; i < virtualShards; i++) {
      const newPhysicalShard = i % physicalShards;
      await this.remapVirtualShard(i, newPhysicalShard);
    }
  }
}

4. Handling Failures

Implement resilience strategies:

class ResilientShardedDatabase {
  async queryWithRetry(shard, query, maxRetries = 3) {
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        return await this.queryShard(shard, query);
      } catch (error) {
        if (attempt === maxRetries) {
          // Try replica if available
          if (shard.replica) {
            return await this.queryShard(shard.replica, query);
          }
          throw error;
        }
        
        // Exponential backoff
        await this.sleep(Math.pow(2, attempt) * 100);
      }
    }
  }
  
  async queryShard(shard, query) {
    const connection = await this.connectToShard(shard);
    return await connection.query(query);
  }
  
  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

When to Use Sharding

✅ Use Sharding When

Massive Scale: Data volume exceeds single server capacity

High Throughput: Need to handle millions of concurrent operations

Geographic Distribution: Users spread across multiple regions

Cost Optimization: Multiple commodity servers cheaper than one high-end server

⚠️ Avoid Sharding When

Small Scale: Data fits comfortably on one server

Complex Joins: Application relies heavily on cross-table joins

Limited Resources: Team lacks expertise to manage distributed systems

Premature Optimization: Vertical scaling still viable

Benefits Summary

  • Scalability: Add more shards as data grows
  • Performance: Parallel processing across shards
  • Cost Efficiency: Use commodity hardware instead of expensive servers
  • Geographic Proximity: Place data close to users
  • Fault Isolation: Failure in one shard doesn’t affect others

Challenges Summary

  • Complexity: More moving parts to manage
  • Cross-Shard Queries: Expensive fan-out operations
  • Rebalancing: Difficult to redistribute data
  • Referential Integrity: Hard to maintain across shards
  • Operational Overhead: Monitoring, backup, and maintenance multiply

References

Share