Algorithm Comparison

Table of Contents

Quick Matrix

AlgorithmAccuracyBurst BehaviorState CostGood Fit
sliding-windowHighestSmooth, strictHigherlogin, payment, sensitive APIs
fixed-windowLowerBoundary bursts possibleLowvery hot coarse limits
token-bucketMedium-highAllows burstsLowAPI plans, gateway quotas
leaky-bucketMedium-highSmooth outputLowbackend protection, shaping

Sliding Window

Sliding window counts events in the last windowMs milliseconds.

new RateLimiter({
  algorithm: 'sliding-window',
  windowMs: 15 * 60 * 1000,
  max: 5,
});

Use it when fairness matters more than raw throughput.

Good Fit

  • Login, password reset, MFA, payment, refund, coupon, and other abuse-sensitive operations.
  • APIs where a user should never exceed max requests in any rolling windowMs.
  • Multi-tenant systems where fairness is more important than squeezing out the highest possible per-process throughput.
  • Business lock scenarios such as "same user + same route" limits.

Characteristics

DimensionValue
AccuracyHighest among the four algorithms
Boundary burstNo fixed-window boundary burst
Memory costHigher, because each active key keeps recent timestamps
User experienceStrict and predictable
Default fitRecommended default for sensitive user actions

How It Works

For each key, the algorithm keeps request timestamps that are still inside the rolling time window.

Current time: 10:01:00
Window:       60 seconds
Valid range:  10:00:00 - 10:01:00

Stored timestamps:
10:00:05, 10:00:20, 10:00:45, 10:00:55

Count = 4
If max = 5, the next request is allowed.

Expired timestamps are ignored or removed before the new request is evaluated. This is why the algorithm avoids the classic fixed-window boundary problem.

Pros and Cons

ProsCons
Strictest rolling-window fairnessMore per-key state than the other algorithms
Good protection for sensitive operationsSlightly higher CPU and memory work under very high cardinality
Easy to explain to product and security teamsRequires careful storage choice for distributed deployments

Practical Notes

  • Use a route-aware key such as user:${userId}:${route} when different endpoints need independent budgets.
  • For Redis deployments, prefer a store path that supports atomic algorithm operations.
  • For extremely high-cardinality traffic, benchmark against fixed-window, token-bucket, and leaky-bucket before choosing a global default.

Fixed Window

Fixed window groups requests into calendar-like buckets.

new RateLimiter({
  algorithm: 'fixed-window',
  windowMs: 60 * 1000,
  max: 1000,
});

Use it for very hot endpoints where approximate limits are acceptable.

Good Fit

  • High-throughput endpoints where an approximate count is acceptable.
  • Coarse operational protection, such as "up to 10,000 requests per minute per API key".
  • Internal APIs where the cost of an occasional boundary burst is low.
  • Metrics-oriented throttling where compact state is more valuable than exact rolling fairness.

Characteristics

DimensionValue
AccuracyLowest, because the window has hard boundaries
Boundary burstPossible near the edge of two windows
Memory costLow, usually one counter per active key
User experienceCan feel permissive at window boundaries
Default fitGood for hot, coarse limits

How It Works

The algorithm maps a request time to a bucket:

bucket = floor(now / windowMs)
counter key = `${rateLimitKey}:${bucket}`

All requests in the same bucket share one counter. When time moves into the next bucket, the counter starts over.

Window A: 10:00:00 - 10:00:59
Window B: 10:01:00 - 10:01:59

User sends 100 requests at 10:00:59
User sends 100 requests at 10:01:00

In two real seconds, the user may complete 200 requests.

That boundary behavior is the main tradeoff.

Pros and Cons

ProsCons
Very compact stateCan allow up to nearly 2 * max around a boundary
Fast and easy to reason aboutLess fair for sensitive actions
Works well for coarse operational limitsReset timing can be visible to clients

Practical Notes

  • Use it when throughput and state cost matter more than strict fairness.
  • Avoid it as the only protection for login, MFA, payment, or similar sensitive actions.
  • If boundary bursts are unacceptable, use sliding-window instead.

Token Bucket

Token bucket refills quota over time and allows short bursts up to capacity.

new RateLimiter({
  algorithm: 'token-bucket',
  windowMs: 60 * 1000,
  max: 100,
});

Use it for API plans, user quotas, or endpoints where bursts are acceptable.

Good Fit

  • API gateway plans where users can make short bursts but must follow a long-term average.
  • Integrations that naturally send traffic in bursts, such as dashboards refreshing multiple widgets.
  • Product quotas such as "100 requests per minute, with short bursts allowed".
  • Mobile or edge clients that may retry or reconnect in small clusters.

Characteristics

DimensionValue
AccuracyMedium-high for average rate
Burst behaviorAllows bursts up to bucket capacity
Memory costLow, usually token count and last refill time
User experienceFlexible and forgiving
Default fitGood for plan/quota style APIs

How It Works

Each key owns a bucket. The bucket has a capacity and refills over time.

capacity = 100
refillRate = 100 tokens per windowMs

Request cost = 1 token
If tokens >= 1: allow and subtract 1
If tokens < 1: reject until enough tokens are refilled

In flex-rate-limit, max is the default capacity unless capacity is specified, and refillRate can be used to tune refill behavior.

Pros and Cons

ProsCons
Allows useful short burstsNot as strict as sliding window
Keeps compact stateBurst capacity must be chosen carefully
Good fit for API plansMay still overload a fragile backend during synchronized bursts

Practical Notes

  • Set capacity to the largest acceptable immediate burst.
  • Set refillRate to the sustainable rate over windowMs.
  • If a backend needs smooth arrival instead of burst tolerance, use leaky-bucket.

Leaky Bucket

Leaky bucket smooths traffic by draining at a steady rate.

new RateLimiter({
  algorithm: 'leaky-bucket',
  windowMs: 60 * 1000,
  max: 100,
});

Use it when the backend needs a steadier arrival rate.

Good Fit

  • Protecting backend services that degrade when traffic arrives in spikes.
  • Work queues, notification senders, webhook dispatchers, and background processing endpoints.
  • APIs where a client should experience gradual admission rather than burst-friendly admission.
  • Traffic shaping before a slower dependency.

Characteristics

DimensionValue
AccuracyMedium-high for smoothing
Burst behaviorRestricts bursts more than token bucket
Memory costLow, usually water level and last leak time
User experienceSmooth but less burst-friendly
Default fitGood for backend protection and shaping

How It Works

The bucket has a current water level. Requests add water. Time drains water at a stable rate.

capacity = 100
leakRate = 100 per windowMs

Before checking: drain elapsed capacity
If water + cost <= capacity: allow and add cost
If water + cost > capacity: reject

This makes the effective output smoother than token bucket, especially when many clients send traffic at the same time.

Pros and Cons

ProsCons
Smooths traffic to protect dependenciesLess friendly to legitimate bursts
Compact stateCapacity and leak rate need tuning
Good for queue-like workloadsNot a strict rolling count

Practical Notes

  • Use leakRate to match the backend's sustainable processing rate.
  • Keep capacity close to the largest queue depth you are comfortable admitting.
  • Combine with a route-aware key when only specific routes need shaping.

How to Choose

RequirementChoose
Login protectionsliding-window
Highest Memory throughputbenchmark fixed-window, token-bucket, and leaky-bucket
Burst-friendly API plantoken-bucket
Smooth backend protectionleaky-bucket
Shared countersRedisStore or CacheHubStore with Redis

Use this decision path when choosing a default:

Is the endpoint security-sensitive?
  yes -> sliding-window
  no  -> continue

Do users need short legitimate bursts?
  yes -> token-bucket
  no  -> continue

Does the backend require smooth arrival?
  yes -> leaky-bucket
  no  -> continue

Is the endpoint extremely hot and approximate limiting is acceptable?
  yes -> fixed-window
  no  -> sliding-window

Token Bucket vs Leaky Bucket

Both algorithms keep compact state and both model a rate over time, but they optimize for different behavior.

QuestionToken BucketLeaky Bucket
What is stored?Current token count and last refill timeCurrent water level and last leak time
What does it allow?Short bursts up to capacitySmoother request admission
What does it protect?Long-term quota fairnessBackend stability under spikes
Best user experienceFlexible API plan limitsPredictable backend shaping
Risk if misconfiguredBurst capacity too highLegitimate bursts rejected too early

Scenario Comparison

API plan:

new RateLimiter({
  algorithm: 'token-bucket',
  windowMs: 60 * 1000,
  max: 100,
  capacity: 150,
  refillRate: 100,
});

Backend smoothing:

new RateLimiter({
  algorithm: 'leaky-bucket',
  windowMs: 60 * 1000,
  max: 100,
  leakRate: 100,
});

If the product promise is "100 requests per minute with short bursts", choose token bucket. If the engineering goal is "do not let this dependency receive a spike", choose leaky bucket.

Login Protection

const loginLimiter = new RateLimiter({
  windowMs: 15 * 60 * 1000,
  max: 5,
  algorithm: 'sliding-window',
  keyGenerator: (req) => `login:${req.ip}:${req.body?.username || 'unknown'}`,
});

Why: login attempts need strict fairness and should not benefit from fixed-window boundaries.

API Gateway Plan

const apiPlanLimiter = new RateLimiter({
  windowMs: 60 * 1000,
  max: 1000,
  algorithm: 'token-bucket',
  capacity: 1200,
  refillRate: 1000,
  keyGenerator: (req) => `api-key:${req.headers['x-api-key'] || req.ip}`,
});

Why: plan users often send small bursts, but the average rate must remain bounded.

Queue or Webhook Protection

const webhookLimiter = new RateLimiter({
  windowMs: 60 * 1000,
  max: 300,
  algorithm: 'leaky-bucket',
  leakRate: 300,
  keyGenerator: (req) => `webhook:${req.user?.id || req.ip}`,
});

Why: downstream workers usually prefer smooth input over bursty input.

Very Hot Public Endpoint

const publicLimiter = new RateLimiter({
  windowMs: 60 * 1000,
  max: 10000,
  algorithm: 'fixed-window',
  keyGenerator: (req) => `public:${req.ip}`,
});

Why: the endpoint is hot, the limit is coarse, and a boundary burst is acceptable.

Performance Dimensions

Performance depends on the storage backend, algorithm, key cardinality, network latency, and the shape of the request stream. Treat the matrix below as a decision aid, not as a substitute for local benchmark data.

AlgorithmState per keyTypical operationAccuracyBurst control
sliding-windowUp to recent request timestampsprune + count + appendHighestStrict
fixed-windowOne counter per active bucketincrement + expireLowerBoundary burst possible
token-buckettoken count + timestamprefill + consumeMedium-highBurst-friendly
leaky-bucketwater level + timestampleak + admitMedium-highSmooth

For reproducible local numbers, run:

npm run benchmark:memory
npm run benchmark:redis
npm run benchmark:http

Then compare:

  • Median and p95 latency, not only QPS.
  • State growth after keys expire.
  • Behavior under high-cardinality keys.
  • Redis command count and network round trips.
  • Whether middleware rollback options are enabled.

FAQ

What is the default algorithm?

sliding-window is the default because it is the strictest and easiest to reason about for user-facing safety.

When should I use fixed window?

Use it for high-throughput, coarse limits where boundary bursts are acceptable. Do not use it as the only protection for login, MFA, payment, or refund operations.

How do I choose between token bucket and leaky bucket?

Choose token-bucket when legitimate bursts should be allowed. Choose leaky-bucket when the backend needs traffic smoothing.

Can I switch algorithms later?

Yes. The public result shape remains the same, but the stored state semantics differ. For production systems, roll out algorithm changes by route or tenant and watch rejection rate, p95 latency, and key cardinality.

Do all algorithms work with all stores?

Yes through the common store contract. Stores with algorithm-specific atomic methods can provide stronger distributed behavior and better performance for that algorithm.