Key Takeaways
- Always validate your requirements. We initially assumed teams needed gap-free IDs and strict global ordering, but after some uncomfortable conversations realized they could live without both. That single shift collapsed a hard distributed coordination problem into something almost embarrassingly simple.
- The best network call is the one you never make. We embedded sequence generation directly into the application as a library, so for ninety-nine percent of requests, getting a sequence ID is just incrementing a number in local memory, without requiring a network hop, service call, or database.
- Design for failures, not just performance. With two tiers of cache, one in the client and one in the server, a DynamoDB outage or service hiccup became invisible to applications; we found that caching saved us from outages far more often than from slowness.
- Backward compatibility is what turns a migration into a one-line change. We matched every parameter the legacy database sequences supported, so teams could swap out the old system without touching application logic; the Orders team migrated twelve services in three weeks because of it.
- Prefer the design you can debug at 3 AM over the one you can admire on a whiteboard. We had consensus protocols and vector clocks available to us, but chose an architecture that fits on a whiteboard and behaves predictably under failure, because at scale, operational clarity is the whole point.
Introduction: The Sequence Problem Nobody Plans For
When you operate a large-scale platform with hundreds of services, database migrations don’t happen in isolation. They ripple across teams, codebases, and assumptions baked in for years, especially around sequences.
Sequences are one of those database features you rarely think about until they're gone. At their core, they are counters, database-managed objects that hand out unique, monotonically increasing numbers on demand.
Every time you insert a row and need a primary key, the database increments the counter and gives you the next value. No collisions, no coordination code in your application, no thinking required. Counters are so reliable and invisible that most engineers only discover how deeply embedded they are when something forces their replacement.
At Coupang, our migration from a relational database to NoSQL hit an unexpected wall right at this point of replacement.
Over one hundred teams depended on database-native sequences for primary keys. Some used them for ordering guarantees. Others relied on them for backward compatibility with downstream systems that expected monotonically increasing identifiers. The sequences themselves weren’t complex, but they were everywhere: Nearly ten thousand distinct counters were spread across the organization.
NoSQL stores like DynamoDB don't offer native sequence support. UUIDs were an option, but they broke ordering assumptions and required changes across multiple services. Snowflake-style IDs introduced operational complexity we didn't want to take on. We needed something simpler.
The goal was clear: Build a drop-in replacement that let teams migrate off the relational database without rewriting their applications.
As part of a company-wide push to deprecate a legacy database vendor in favor of cloud-native infrastructure, we needed to support everything the source database's sequences offered (starting points, custom increments, and ascending and descending order) while providing full backward compatibility so teams could migrate at their own pace without breaking existing systems.
As part of the migration, our Orders Team migrated twelve services in three weeks with zero downtime, changing fewer than fifty lines of code. Here's how we built the system that made that possible.
Why Simple Beats Clever for Sequences
Distributed sequence generation sounds like a problem that demands sophisticated coordination. Consensus protocols, vector clocks, and distributed locks – the literature is full of elegant solutions that look great on a whiteboard.
We resisted that path.
Complex systems fail in complex ways. Every layer of coordination adds latency, failure modes, and operational burden, things you really feel at 3 AM when the pager goes off. For sequences, the actual requirements were modest:
- Uniqueness, so that no two callers would ever receive the same value
- Monotonicity so values would increase (or decrease) over time within a sequence
- Availability so the system would tolerate failures gracefully
- Low latency so sequence generation would not be a bottleneck
- Zero network calls in the hot path, so that sequences could be generated locally without a network round-trip
Notice what’s not on the list: strict global ordering across all consumers, gap-free sequences, or real-time consistency. Most teams didn’t need these properties. The ones who thought they did usually discovered, after a few uncomfortable conversations, that they could live without them.
The network call constraint was critical. Traditional database sequences required a round-trip for every value. At high throughput, that round-trip dominated latency and created a central bottleneck. We wanted sequence generation to feel like incrementing a local variable, because for most requests, that’s exactly what it would be.
This realization shaped our design principles:
- Minimize coordination to avoid distributed locks and consensus where possible
- Tolerate gaps so unused sequences are acceptable
- Push caching to the edges, with aggressive caching on both server and client to reduce round-trips
- Keep the architecture legible so that anyone on-call would understand how it works at 3 AM
- Preserve backward compatibility, because existing schemas, APIs, and consumers should not need changes
Existing Approaches and Their Limitations
Before building anything, we evaluated existing solutions. Each had merit, but none fit our constraints.
The most obvious option was to use UUIDs, but dozens of services had BIGINT primary keys. Changing column types would cascade through schemas, APIs, and reporting systems, which would be a migration on top of the migration we were already doing. UUIDs also scatter inserts across B-tree indexes, degrading write performance in high-throughput tables. In addition, several teams relied on ID ordering for pagination, which UUIDs simply can’t provide.
We looked hard at Snowflake IDs. They solve ordering and fit in a BIGINT, which is appealing. But they require managing worker IDs, which is its own coordination problem in auto-scaling environments, and they depend on synchronized clocks. Clock skew causes ordering anomalies or outright collisions. Worse, they are not a drop-in replacement: a sequence that had been producing 1001, 1002 would suddenly jump to values like 1578323451234567890.
A single-coordinator database sequence was the simplest option on paper, but it would create exactly the bottleneck we were trying to escape: one point of failure, latency on every value, and lock contention that gets worse the more you scale.
Timestamp-based approaches had their own issues. Multiple requests in the same millisecond cause collisions, clock skew across distributed hosts makes ordering unreliable, and there is no way to support custom starting points or increments.
None of these options met all our constraints: full source-database parity, no schema changes, sub-millisecond latency, and operational simplicity. So we built a purpose-specific solution that was simple by design, not by accident.
Core Architecture of the Sequence Service
The sequence service became a tier-0 service (our internal term for critical-path infrastructure whose failure halts core business functions, such as orders or payments). Multiple tier-1 services depended on it for their primary keys. It has three layers: DynamoDB as the source of truth, a server-side caching layer, and thick clients that cache locally.

Figure 1: Requests flow down through two cache layers before reaching DynamoDB, which handles less than 0.1% of sequence demand.
The diagram below traces what happens at each layer when a client requests a sequence. Three scenarios play out depending on where the request is served: a cache hit in the client that never leaves the application process, a background refill from the sequence service when the client cache runs low, and a DynamoDB fetch that only triggers when both cache tiers are depleted.

Figure 2: Three scenarios in sequence order. Most requests never leave the application process. DynamoDB is only accessed when both cache tiers are running low.
DynamoDB as Source of Truth
Each sequence is a single DynamoDB item with the counter name as the key and the current position as a numeric value (stored as DynamoDB's Number type, which maps to Long in our application):
| Key (String) | Value (Number/Long) |
| --- | --- |
| orders_seq | 50000000 |
| users_seq | 12000000 |
| transactions_seq | 890000 |
When the service needs more sequences, it performs an atomic increment using DynamoDB's conditional update:
UpdateExpression: SET #val = #val + :blockSize
ConditionExpression: #val = :expectedValue
If the condition fails, another instance grabs that block first. The service retries with the new value. This gives us coordination-free uniqueness without distributed locks.
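The allocate-retry loop can be illustrated with an in-memory stand-in. This is a hedged sketch, not the production code: `AtomicLong.compareAndSet` plays the role of DynamoDB's conditional `UpdateItem`, and `BlockAllocator` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative stand-in for DynamoDB's conditional update: reserve a block
// with compare-and-set, retrying when another instance wins the race.
class BlockAllocator {
    private final AtomicLong counter; // plays the role of the DynamoDB item

    BlockAllocator(long initialValue) {
        this.counter = new AtomicLong(initialValue);
    }

    // Returns the first value of a freshly reserved block of blockSize values.
    long allocateBlock(long blockSize) {
        while (true) {
            long expected = counter.get(); // read the current position
            if (counter.compareAndSet(expected, expected + blockSize)) {
                return expected; // block [expected, expected + blockSize) is ours
            }
            // Condition failed: another instance grabbed that block first. Retry.
        }
    }
}
```

However many instances race on the same counter, each CAS (or conditional write) succeeds exactly once per block, so the ranges they receive never overlap.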
Why Bulk Fetch Matters
Fetching one sequence at a time would hammer DynamoDB with requests. Instead, we fetch blocks of five hundred to one thousand sequences per call. A single DynamoDB write provisions enough sequences to serve hundreds of subsequent requests from cache.
This bulk approach reduces DynamoDB costs (fewer write operations), improves latency (most requests hit the cache), and increases availability (the service survives brief DynamoDB blips).
The tradeoff is gaps. If a server crashes with four hundred unused sequences sitting in its cache, those values are gone forever. For our use cases, that was a price we were happy to pay.
Server-Side Caching Layer
The sequence service maintains an in-memory cache of pre-fetched sequences for each counter. Multiple instances of the service can run simultaneously; each instance holds a different non-overlapping block of sequences allocated from DynamoDB. As a result, values handed out across instances won't be strictly monotonically increasing globally, but they will be unique. Within a single instance, values are always increasing. If your use case requires strict global ordering across all consumers, this design is not the right fit.
We chose in-memory caching deliberately over a shared external cache like Redis or Valkey. An external cache would reintroduce a network hop and a new failure dependency, which is exactly what we were trying to avoid. Since each service instance allocates its own block from DynamoDB atomically, there's no need for instances to share cache state. The trade-off is that if an instance restarts, the unused sequences in its cache are lost as gaps. For our use cases, that was acceptable.
Like the client, the server has a configurable refill limit: the maximum number of sequences to hold in cache for each counter. When a client requests a sequence, the service:
- Checks if sequences are available in cache for that counter
- Atomically increments and returns the next value, if available
- Triggers a background refill from DynamoDB (up to the configured limit), if the cache is low or empty
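That serve-then-refill flow can be sketched as follows. This is an illustrative stand-in, not the production service: `ServerCounterCache` and the `LongSupplier` standing in for the DynamoDB block allocator are hypothetical names, and the refill here runs inline where the real service runs it in a background thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Sketch of the per-counter server cache: hand out cached values,
// refilling from the block allocator when the cache runs low.
class ServerCounterCache {
    private final Deque<Long> cache = new ArrayDeque<>();
    private final int blockSize;          // sequences fetched per DynamoDB call
    private final int refillThreshold;    // refill when fewer than this remain
    private final LongSupplier allocator; // returns the start of a fresh block

    ServerCounterCache(int blockSize, int refillThreshold, LongSupplier allocator) {
        this.blockSize = blockSize;
        this.refillThreshold = refillThreshold;
        this.allocator = allocator;
    }

    synchronized long next() {
        if (cache.isEmpty()) {
            refill(); // only blocks when the cache is fully drained
        }
        long value = cache.pollFirst();
        if (cache.size() < refillThreshold) {
            refill(); // the real service does this in a background thread
        }
        return value;
    }

    private void refill() {
        long start = allocator.getAsLong(); // atomic block allocation
        for (long v = start; v < start + blockSize; v++) {
            cache.addLast(v);
        }
    }
}
```

Because each refill reserves a whole block atomically, the only coordination point is the allocator; everything else is local state.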
Block Allocation Strategy
Each counter had configurable cache parameters based on expected traffic:
| Counter | Max Cache Size | Block Size (per DDB fetch) | Typical Traffic |
| --- | --- | --- | --- |
| orders | 2000 | 1000 | High volume |
| users | 1000 | 500 | Medium volume |
| audit-logs | 5000 | 2000 | Burst traffic |
The max cache size controlled how many sequences the server would hold in memory for that counter. The block size controlled how many sequences to fetch per DynamoDB call. The sliding window determined when to trigger a refill, and the refill would fetch enough to bring the cache back up to the max size.
Larger cache sizes resulted in fewer DynamoDB calls but more wasted sequences if either traffic was overestimated or the server restarted. We tuned these values based on observed patterns and cost tolerance. For clients with lower gap tolerance, we added an optional Berkeley DB (BDB) layer between server memory and DynamoDB, providing local durability without network round-trips.
Sliding Window Rate Calculation
The most operationally important piece of the system isn’t the sequence generation itself; it is knowing when to refill the cache. Get this wrong, and everything else falls apart. Refilling too early wastes capacity on unnecessary DynamoDB calls. Refilling too late causes cache misses, increased latency, and potential failures.
We used a sliding window algorithm to continuously estimate the consumption rate and trigger refills at the right time. Critically, refills happened asynchronously; the cache didn't wait until it was empty. The sliding window predicted when to start a background refill so that new sequences arrived before the existing cache ran out.
The sliding window runs inside the thick client, a library embedded directly in the application process. No separate process or service call is involved. The client tracks its own sequence consumption rate and uses that to decide when to request a refill from the sequence service, before the local cache runs dry.

Figure 3: Sliding window predicting refill before exhaustion – keeping user requests out of the critical path.
This async approach resulted in user requests almost never being blocked on either network calls or DynamoDB writes. The refill completed in the background while the cache continued serving requests from its remaining buffer.
For each counter, the service tracks sequence allocations over a rolling sixty second window, calculating the current consumption rate. The refill threshold is dynamic:
refill_threshold = current_rate × buffer_seconds
When cached sequences drop below this threshold, the service initiates a background refill.
Adaptive Behavior
The sliding window naturally adapts to traffic patterns:
- Traffic ramp-up: as the consumption rate increases, the refill threshold rises, so the service fetches new blocks earlier and maintains the buffer.
- Traffic decline: as consumption slows, the threshold drops, so the service stops pre-fetching, reducing unnecessary DynamoDB calls.
- Burst handling: a short burst temporarily spikes the rate, raising the threshold and triggering a refill; when the burst subsides, the threshold gradually returns to normal.
Tuning Parameters
| Parameter | Purpose | Our Default |
| --- | --- | --- |
| window_size | How much history to consider | 60 seconds |
| sample_interval | How often to record consumption | 1 second |
| buffer_seconds | How far ahead to maintain a buffer | 10 seconds |
| min_threshold | Floor for low-traffic counters | 50 sequences |
The min_threshold prevents low-traffic counters from setting refill thresholds too low. A counter serving one request per minute shouldn't wait until no sequences remain.
A pseudocode implementation is shown below.
// Every second: shift window, update total
// Refill when remaining < (rate × buffer_seconds)
public synchronized int calculateRefillThreshold() {
    double ratePerSecond = (double) totalInWindow / windowSizeSeconds;
    int dynamicThreshold = (int) (ratePerSecond * bufferSeconds);
    return Math.max(dynamicThreshold, minThreshold);
}
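Filling in the window bookkeeping the comments allude to, one possible shape is a ring of per-second buckets. This is a hypothetical fleshing-out under the parameters from the table above, not the production code.

```java
// Hypothetical sliding window: a fixed ring of per-second buckets holds
// recent allocation counts; a timer advances the ring once per second.
class SlidingWindowRateCalculator {
    private final int[] buckets;         // one bucket per 1 s sample interval
    private final int windowSizeSeconds;
    private final int bufferSeconds;
    private final int minThreshold;
    private int index = 0;               // bucket currently being filled
    private long totalInWindow = 0;      // running sum of all buckets

    SlidingWindowRateCalculator(int windowSizeSeconds, int bufferSeconds, int minThreshold) {
        this.buckets = new int[windowSizeSeconds];
        this.windowSizeSeconds = windowSizeSeconds;
        this.bufferSeconds = bufferSeconds;
        this.minThreshold = minThreshold;
    }

    // Called on the hot path for every allocation.
    synchronized void recordAllocations(int n) {
        buckets[index] += n;
        totalInWindow += n;
    }

    // Called by a timer every second: advance the ring, dropping the oldest bucket.
    synchronized void tick() {
        index = (index + 1) % buckets.length;
        totalInWindow -= buckets[index];
        buckets[index] = 0;
    }

    synchronized int calculateRefillThreshold() {
        double ratePerSecond = (double) totalInWindow / windowSizeSeconds;
        int dynamicThreshold = (int) (ratePerSecond * bufferSeconds);
        return Math.max(dynamicThreshold, minThreshold);
    }
}
```

For example, 1,200 allocations observed across a 60-second window is 20 sequences/second; with a 10-second buffer, the refill threshold is 200 cached sequences, and min_threshold keeps an idle counter's threshold from collapsing to zero.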
Thick Client Design
Caching on the server reduces DynamoDB load. Caching on the client eliminates network calls entirely for the vast majority of requests.
This reflected our primary design goal: a sequence generation call should almost never leave the application process. Network calls are expensive, not just in latency, but also in failure modes, connection pool management, and operational complexity. Database-native sequences forced a round trip for every single value. We wanted the opposite.
The thick client is an SDK that applications embed directly. It maintains its own local cache of sequences and communicates with the sequence service only when that cache needs refilling, which, with proper sliding window tuning, happens infrequently.
Why Push Caching to the Client
For high-throughput services, even fast network calls add up. A service processing ten thousand orders per second can't afford a network round-trip for each sequence. The thick client entirely eliminates this bottleneck.
To be clear, we had already implemented caching for the legacy database’s sequences, too. Teams weren’t making a database call for every single value: That approach would have been unusable. But the existing caching had limitations. It was inconsistent across teams, often homegrown, and tied to connection pool behavior. Different services implemented it differently, with varying block sizes, refill strategies, and failure handling.
The new system standardized and optimized this caching into a purpose-built, two-tier architecture.
Client SDK Responsibilities
The thick client handles several responsibilities:
- Local cache management: stores a block of sequences in memory
- Background refills: fetches new blocks before cache exhaustion
- Rate calculation: runs its own sliding window to predict refill timing
- Failure handling: degrades gracefully when the service is unavailable
- Monotonicity within the process: guarantees local ordering
Note what the client doesn't handle: configuration. The client knows nothing about increments, starting points, or direction. It simply requests blocks by sequence name and hands out values in the order received. All the complexity lives server-side.
The server suggests an initial fill rate to clients, but the client's sliding window continuously observes actual consumption and adjusts automatically. Within minutes of deployment, even a badly misconfigured client converges to an optimal fill rate for its actual workload.
Core Client Logic
public synchronized long next() {
    rateCalculator.recordAllocations(1);
    int remaining = cachedValues.length - cachePosition;
    int threshold = rateCalculator.calculateRefillThreshold();
    // Trigger async refill before cache exhaustion
    if (remaining <= threshold && !refillInProgress) {
        triggerBackgroundRefill();
    }
    if (cachePosition >= cachedValues.length) {
        // Cache exhausted, must block on refill
        blockingRefill();
    }
    // Values already have an increment applied by the server
    return cachedValues[cachePosition++];
}
Client vs. Server Parameters
The asymmetry was intentional: server blocks are five hundred to one thousand sequences per DynamoDB fetch, while the client cache limit is fifty to five hundred sequences, depending on the application. Client-side memory is more constrained than server memory, and wasted sequences from client crashes are more common because applications restart frequently. The server absorbs demand from multiple clients, so larger blocks make sense there. Smaller client caches create less waste when applications scale down or redeploy.
Fault Tolerance and Failure Modes
The two-tier caching architecture wasn't just about performance; it also provided layered protection against outages.
Client cache protects against service outages: If the sequence service became unavailable, clients continued serving sequences from their local cache. A client with five hundred sequences cached and consuming ten per second could survive a fifty second service outage without any impact to the application.
Server cache protects against DynamoDB outages: If DynamoDB experienced a partition outage or throttling, the sequence service continued serving from its in-memory cache. With thousands of sequences cached per counter, the server could sustain traffic for minutes.
Using both client and server caching combines protection: The two tiers multiply resilience. A DynamoDB outage didn't immediately affect clients, because the server cache absorbed the impact. A sequence service outage didn't immediately affect applications, because the client cache absorbed the impact.
This combination of caching increased the system's effective availability to be higher than it would have been with any individual component. Brief outages at any layer were invisible to end users.
Failure Impact Summary
| Failure | Impact |
| --- | --- |
| Client crash | Gaps in sequence (acceptable) |
| Server restart | Gaps (acceptable) |
| DynamoDB outage | Served from the two-tier cache until it drains (minutes at typical traffic) |
Guaranteeing Monotonicity and Uniqueness
Let's be precise about the guarantees the system provides.
Uniqueness: Strong Guarantee
No two callers will ever receive the same sequence value. This guarantee holds globally, across all clients and servers.
How it works:
- DynamoDB's conditional writes ensure each block is allocated exactly once.
- Server caches allocate from non-overlapping blocks.
- Client caches receive non-overlapping ranges from servers.
Even under failure scenarios (e.g., crashes, retries, and network partitions), uniqueness is preserved. The worst case is gaps, not duplicates.
Monotonicity: Per Client Guarantee
Within a single client instance, values are strictly increasing (or decreasing, for decreasing sequences). Value N+1 is always greater than value N from that client's perspective.
Across clients, values generally increase over time but may appear out of order. Client A's value 1050 might be used after Client B's value 1100.
Gaps: Expected Behavior
Gaps are inherent in the design. They occur when a server or client crashes with unused sequences in cache, or when blocks are allocated but traffic doesn't materialize.
For most use cases, gaps don’t matter. Primary keys don’t need to be contiguous. Audit logs don’t need sequential IDs. The only systems that truly need gap-free sequences are those with hard external dependencies on contiguity, which in our experience is far rarer than engineers initially assume.
Cache Efficiency
The combination of two-tier caching and sliding window rate calculation produced remarkable efficiency:
| Metric | Value |
| --- | --- |
| Sequences served from client cache | ~99% |
| Sequences served from server cache | ~0.9% |
| Sequences requiring a DynamoDB call | ~0.1% |
For a system serving fifty thousand sequences per second at peak, this approach resulted in approximately 49,500 served instantly from client memory, approximately four hundred fifty that required a server call (still fast, from server cache), and approximately fifty that triggered a DynamoDB write.
The sliding window's async refill was critical here. Because refills triggered before cache exhaustion, nearly no user requests waited on network calls. Users experienced consistent sub-millisecond latency regardless of what was happening behind the scenes.
Throughput
Throughput peaked at over fifty thousand sequences per second across all counters, with typical throughput of ten thousand to twenty thousand sequences per second. The busiest individual counters topped out around five thousand sequences per second. Most of this throughput never hit the sequence service; actual service traffic was roughly one percent of sequence consumption.
DynamoDB Patterns
| Metric | Value |
| --- | --- |
| Write TPS (sustained) | 10-20 |
| Write TPS (peak) | 50-100 |
| Read capacity | Near zero (metadata only) |
| Items | One per sequence (~10,000 sequences across 100+ teams) |
| Table size | A few MB |
Yes, really. A system serving fifty thousand sequences per second generated only ten to twenty DynamoDB writes per second under normal load. That thousand-to-one ratio still surprises people when I mention it; the ratio came from bulk fetching plus predictive refills working in concert.
Latency
| Path | p50 | p99 |
| --- | --- | --- |
| Client cache hit | <0.01ms | <0.05ms |
| Server cache hit | 1-2ms | 5ms |
| DynamoDB fetch | 5-10ms | 20ms |
Under normal operation, over ninety-nine percent of requests hit the client cache. Server calls were rare; DynamoDB calls were rarer.
Costs
The costs were reasonable. DynamoDB with minimal write capacity and tiny storage was about fifty dollars per month. Sequence service compute for a small fleet with low CPU use was around five hundred dollars. Network costs were negligible; thick clients reduce cross-service traffic. The total cost was less than one thousand dollars a month and served tens of thousands of sequences per second.
Migration: Helping Teams Adopt
Building the system was half the work. Getting over a hundred teams to actually adopt it – that was the other half of the effort, and honestly, the harder part. The Orders team migrated twelve services in three weeks, with a peak load of eight thousand sequences per second and zero downtime.
API Compatibility
The client API was simpler than the legacy database’s API:
// Legacy database (before)
long id = connection.getNextSequenceValue("orders_seq");
// New system (after)
long id = sequenceClient.next("orders_seq");
For most teams, migration was a one-line change plus a dependency update. No configuration, no setup, and no parameters were required, just the sequence name. All the complexity (increment, direction, and block sizes) was handled server-side during onboarding.
Counter Registration and Configuration
Sequence configuration was entirely server-side, stored in a dynamic config store. During onboarding, teams registered their sequences with all the source database-compatible parameters: sequence name, starting value, increment size, minimum value, maximum value, and direction (ascending or descending).
Using the dynamic config store allowed us to adjust parameters, such as block sizes, increments, and cache limits, without redeploying the service or touching client code. When a high-traffic sequence needed larger blocks during peak season, we updated the config, and the change took effect within seconds. This operational flexibility proved invaluable as we onboarded more teams and learned their actual traffic patterns.
This separation was deliberate. Configuration belonged to the platform team, not individual applications. When a sequence needed tuning (e.g., larger blocks for higher traffic or different increments for a new use case) we made server-side changes without touching client code.
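As an illustration, a registration entry in the config store might look something like the fragment below. The field names and shape are hypothetical, chosen only to show the source database-compatible parameters listed above; the actual config schema is internal.

```json
{
  "sequenceName": "orders_seq",
  "startValue": 50000000,
  "increment": 1,
  "minValue": 1,
  "maxValue": 9223372036854775807,
  "direction": "ASCENDING",
  "blockSize": 1000,
  "maxCacheSize": 2000
}
```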
Lessons Learned
What Worked Well
Two-Tier Caching
The client cache reduced server load by ninety-nine percent. The server cache reduced DynamoDB load by another ninety percent or more. Combined, we achieved a thousand-to-one ratio between sequences served and database writes.
Outage Isolation
The caching tiers provided layered fault tolerance. Client caches absorbed service outages. Server caches absorbed DynamoDB outages. Brief failures at any layer were invisible to applications; the system's effective availability exceeded any individual component.
Async Refills
Triggering refills before cache exhaustion resulted in user requests never waiting on network calls under normal operation. This was the difference between "fast most of the time" and "consistently fast", which, as our SRE team will tell you, are very different things.
Self-Correcting Fill Rates
The system auto-tuned within minutes; no traffic forecasting was needed. Misconfigured clients fixed themselves.
Simplicity
The design fits on a whiteboard. New engineers understood it in one meeting. On-call never struggled to debug it.
Gaps Acceptance
Convincing teams that gaps were acceptable unlocked the entire simple design. Those conversations were harder than the engineering; engineers have strong intuitions about "correctness" that sometimes require gentle challenging.
What We Would Reconsider
Block Size Tuning
Initially, we made block sizes too large, wasting sequences on low-traffic counters. Dynamic block sizing based on observed traffic would have been better.
Monitoring Granularity
We monitored aggregate metrics well, but per-counter visibility came later; we should have built detailed dashboards from the beginning. We eventually added real-time dashboards showing cache depth, refill rate, and gap frequency per counter, which proved critical for on-call triage.
What This Teaches Us About Distributed Systems
This system illustrates a broader pattern I keep running into: In distributed systems, caching isn’t just a performance hack, it’s a resilience and simplicity primitive. The key insight here wasn’t technical. Rather, it was recognizing that most teams didn’t actually need the guarantees they thought they needed. Once we helped them drop those constraints, the solution became almost embarrassingly simple.
Conclusion
Database-native sequences are a convenience that becomes a constraint at scale. Even with caching, which most teams had implemented in some form, the lack of standardization and predictive refills meant inconsistent performance and ongoing operational burden.
Our architecture (DynamoDB as source of truth, two-tier caching, and sliding window refills) achieved full parity with the legacy database’s sequence capabilities (starting points, custom increments, and ascending/descending order) while enabling more than one hundred teams to migrate without rewrites. Under normal load, ninety-nine percent of sequence generation happens entirely in application memory: no network, no remote service, and no database, just incrementing a number.
In the end, the best distributed system was the one that made distribution invisible.
If you’re planning a similar migration, start by asking what guarantees you actually need rather than what guarantees feel right. The answer might unlock a simpler design than you expected.