
Rate Limiting & Circuit Breaker: The 'Traffic Light & Fuse Box' Mental Model

How do you stop one bad client from taking down your entire API? A mastery guide to rate limiting strategies, circuit breakers, and resilience patterns.

One client sends 10,000 requests per second to your API. Your server maxes out. Every other client gets “503 Service Unavailable.”

One upstream service goes down. Your app keeps calling it. Threads pile up waiting. Your entire app becomes slow for everyone.

These are not hypothetical. They happen in every production system. Rate Limiting and Circuit Breakers are your defense.


Part 1: Foundations (The Mental Model)

Rate Limiting = The Traffic Light

A Rate Limiter is the Traffic Light at a busy intersection.

  • It doesn’t stop traffic altogether. It regulates the flow.
  • Red light: “You’ve sent too many requests. Wait 60 seconds.” (HTTP 429 Too Many Requests).
  • Green light: “You’re within your limit. Pass through.”

Who it protects: Your service from being overwhelmed by any single client. Also protects against DDoS and scraping.

Circuit Breaker = The Fuse Box

A Circuit Breaker protects your app from a failing dependency (external API, database).

Think of an electrical fuse box. If a short circuit occurs (too much current), the fuse trips (breaks the circuit). Power stops flowing instantly. Your house doesn’t burn down.

States:

  • CLOSED (Normal): Requests flow through to the dependency.
  • OPEN (Tripped): Dependency failed too many times. Stop calling it. Return a fallback immediately.
  • HALF-OPEN (Testing): After a timeout, allow one test request through. If it succeeds → CLOSED. If it fails → OPEN again.
CLOSED → (too many failures) → OPEN → (timeout) → HALF-OPEN → (success) → CLOSED
                                                              → (failure) → OPEN

Part 2: The Investigation (Rate Limiting Algorithms)

1. Fixed Window

The simplest. Allow N requests per time window (e.g., 100/minute).

  • Problem: A client sends 100 requests at 12:59. Then 100 more at 13:00. They effectively hit you with 200 requests in 2 seconds right at the boundary.

2. Sliding Window Log

Track every request timestamp. Count how many are within the last 60 seconds.

  • Pro: Accurate. No boundary burst problem.
  • Con: Memory intensive. Need to store every timestamp.

3. Token Bucket (The Best for APIs)

A bucket fills with tokens at a steady rate (e.g., 10 tokens/second, max 100). Each request consumes 1 token. If the bucket is empty: 429.

  • Pro: Allows short bursts (up to bucket max). Smooth long-term rate.
  • Con: Slightly more complex.
# Token Bucket with Redis (shared state across multiple servers)
# NOTE: this read-modify-write is NOT atomic. Under concurrent requests
# for the same user, move this logic into a Lua script (EVAL) so Redis
# executes it as a single atomic operation.
import time

import redis

r = redis.Redis()

def is_allowed(user_id: str, rate: int = 10, burst: int = 100) -> bool:
    now = time.time()
    key = f"bucket:{user_id}"

    bucket = r.hgetall(key)
    tokens = float(bucket.get(b"tokens", burst))
    last_refill = float(bucket.get(b"last_refill", now))

    # Refill tokens accrued since the last request
    elapsed = now - last_refill
    tokens = min(burst, tokens + elapsed * rate)

    if tokens < 1:
        return False  # 429

    # Consume 1 token and persist the new state
    r.hset(key, mapping={"tokens": tokens - 1, "last_refill": now})
    r.expire(key, 60)  # idle buckets clean themselves up
    return True

Part 3: The Diagnosis (Circuit Breaker States)

# Using the 'circuitbreaker' library
import requests
from circuitbreaker import CircuitBreakerError, circuit

@circuit(
    failure_threshold=5,    # Open after 5 consecutive failures
    recovery_timeout=30,    # Wait 30s before trying HALF-OPEN
    expected_exception=Exception,
)
def call_payment_api(order_id: str) -> dict:
    response = requests.post(
        "https://payment-service/charge",
        json={"order": order_id},
        timeout=5,  # Rule: every external call gets a timeout
    )
    response.raise_for_status()
    return response.json()


def process_order(order_id: str):
    try:
        result = call_payment_api(order_id)
    except CircuitBreakerError:
        # Circuit is OPEN — don't even try. Fail fast with fallback.
        return {"status": "pending", "message": "Payment service is down. Will retry."}
    except Exception as e:
        return {"status": "error", "message": str(e)}
    return {"status": "charged", "result": result}

Monitoring Circuit Breaker State

State               | What’s happening      | Action
--------------------|-----------------------|--------------------------------
CLOSED (Normal)     | Everything works      | Monitor error rate
OPEN (Tripped)      | Dependency is failing | Alert on-call. Serve fallback.
HALF-OPEN (Testing) | Recovery in progress  | Watch the test request closely

Part 4: The Resolution (Practical Patterns)

1. Retry with Exponential Backoff

Don’t retry immediately. Wait, then wait longer.

import random
import time

def call_with_retry(fn, max_retries=3):
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # Last attempt: re-raise
            wait = (2 ** attempt) + random.uniform(0, 1)  # Exponential + jitter
            time.sleep(wait)

Why Jitter? Without jitter, all failed clients retry at the exact same moment → thundering herd → the recovering service immediately fails again.

2. Bulkhead Pattern

Isolate different parts of your system so one failure doesn’t bleed into another.

# Without a bulkhead: all requests share one thread pool,
# so one slow endpoint starves every other endpoint of threads.

# With a bulkhead: a dedicated thread pool per critical service.
from concurrent.futures import ThreadPoolExecutor

payment_executor = ThreadPoolExecutor(max_workers=5, thread_name_prefix="payment")
notification_executor = ThreadPoolExecutor(max_workers=10, thread_name_prefix="notif")

# Payment can be slow without affecting notifications
future = payment_executor.submit(call_payment_api, order_id)

Final Mental Model

Rate Limiter    -> The Traffic Light. Controls inbound flow. Protects YOU from clients.
Circuit Breaker -> The Fuse Box. Controls outbound flow. Protects YOU from dependencies.

Token Bucket    -> Steady flow, allows short bursts.
CLOSED State    -> Everything normal. Let traffic through.
OPEN State      -> Dependency failed. Stop calling. Return cached/fallback.
Exponential Backoff -> "Wait 1s, 2s, 4s, 8s before retrying." Avoids thundering herd.

Resilience Rules:

  1. Every external call should have a timeout.
  2. Every retry should have exponential backoff + jitter.
  3. If a dependency fails repeatedly, open the circuit — fail fast.
  4. Rate limit by user/IP, not just globally.
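Rule 1 can be enforced even for calls whose client library has no native timeout, by running the call on a worker thread and bounding how long you wait for the result. The wrapper below is a sketch with illustrative names; prefer a native timeout (e.g. the `timeout=` parameter in `requests`) when one exists, because the worker thread here keeps running past the deadline.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=8)

def call_with_timeout(fn, timeout_s: float, *args, **kwargs):
    """Run fn on a worker thread; raise FutureTimeout if it exceeds the deadline."""
    future = _executor.submit(fn, *args, **kwargs)
    return future.result(timeout=timeout_s)
```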