One client sends 10,000 requests per second to your API. Your server maxes out. Every other client gets “503 Service Unavailable.”
One upstream service goes down. Your app keeps calling it. Threads pile up waiting. Your entire app becomes slow for everyone.
These are not hypothetical. They happen in every production system. Rate Limiting and Circuit Breakers are your defense.
Part 1: Foundations (The Mental Model)
Rate Limiting = The Traffic Light
A Rate Limiter is the Traffic Light at a busy intersection.
- It doesn’t stop traffic altogether. It regulates the flow.
- Red light: “You’ve sent too many requests. Wait 60 seconds.” (HTTP 429 Too Many Requests).
- Green light: “You’re within your limit. Pass through.”
Who it protects: Your service from being overwhelmed by any single client. Also protects against DDoS and scraping.
Circuit Breaker = The Fuse Box
A Circuit Breaker protects your app from a failing dependency (external API, database).
Think of an electrical fuse box. If a short circuit occurs (too much current), the fuse trips (breaks the circuit). Power stops flowing instantly. Your house doesn’t burn down.
States:
- CLOSED (Normal): Requests flow through to the dependency.
- OPEN (Tripped): Dependency failed too many times. Stop calling it. Return a fallback immediately.
- HALF-OPEN (Testing): After a timeout, allow one test request through. If it succeeds → CLOSED. If it fails → OPEN again.
Part 2: The Investigation (Rate Limiting Algorithms)
1. Fixed Window
The simplest. Allow N requests per time window (e.g., 100/minute).
- Problem: A client sends 100 requests at 12:59. Then 100 more at 13:00. They effectively hit you with 200 requests in 2 seconds right at the boundary.
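Here's fixed window in code — a minimal Python sketch (class and parameter names are mine, not from any library):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds, per client key."""
    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)  # (key, window index) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))  # which window are we in?
        if self.counts[bucket] >= self.limit:
            return False  # over the limit: respond 429
        self.counts[bucket] += 1
        return True
```

Note the boundary problem baked into `int(now // self.window)`: the counter resets to zero the instant a new window starts, so a client can burn a full quota at the end of one window and another full quota at the start of the next.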
2. Sliding Window Log
Track every request timestamp. Count how many are within the last 60 seconds.
- Pro: Accurate. No boundary burst problem.
- Con: Memory intensive. Need to store every timestamp.
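A sliding window log sketch in Python (illustrative names, assuming one timestamp deque per client key):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLogLimiter:
    """Allow at most `limit` requests in any trailing `window`-second span."""
    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window
        self.logs = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        log = self.logs[key]
        while log and log[0] <= now - self.window:
            log.popleft()  # evict timestamps that fell out of the window
        if len(log) >= self.limit:
            return False  # 429
        log.append(now)
        return True
```

The cost is visible in the code: one stored timestamp per allowed request, for every active client.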
3. Token Bucket (The Best for APIs)
A bucket fills with tokens at a steady rate (e.g., 10 tokens/second, max 100). Each request consumes 1 token. If the bucket is empty: 429.
- Pro: Allows short bursts (up to bucket max). Smooth long-term rate.
- Con: Slightly more complex.
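A token bucket sketch in Python. Instead of a background thread refilling tokens, this lazily refills on each call based on elapsed time — a common implementation trick (names and defaults are illustrative):

```python
import time

class TokenBucket:
    """Refill `rate` tokens/second up to `capacity`; each request costs 1 token."""
    def __init__(self, rate, capacity, now=None):
        self.rate = rate          # steady-state requests per second
        self.capacity = capacity  # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Lazy refill: add tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # bucket empty: 429
```

With `rate=10, capacity=100`, a client can burst 100 requests at once, but sustained traffic is capped at 10/second.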
Part 3: The Diagnosis (Circuit Breaker States)
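The three states translate directly into a small state machine. A minimal Python sketch (thresholds, names, and the single-test-request policy are illustrative, not from a specific library):

```python
import time

class CircuitBreaker:
    """CLOSED -> OPEN after repeated failures; OPEN -> HALF_OPEN after a timeout."""
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, fn, fallback, now=None):
        now = time.monotonic() if now is None else now
        if self.state == "OPEN":
            if now - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # let one test request through
            else:
                return fallback()  # fail fast, don't touch the dependency
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"  # trip (or re-trip) the breaker
                self.opened_at = now
            return fallback()
        self.failures = 0
        self.state = "CLOSED"  # success: reset
        return result
```

The key property: once OPEN, callers get the fallback immediately instead of stacking up threads waiting on a dead dependency.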
Monitoring Circuit Breaker State
| State | What’s happening | Action |
|---|---|---|
| CLOSED (Normal) | Everything works | Monitor error rate |
| OPEN (Tripped) | Dependency is failing | Alert on-call. Serve fallback. |
| HALF-OPEN | Recovery in progress | Watch the test request closely |
Part 4: The Resolution (Practical Patterns)
1. Retry with Exponential Backoff
Don’t retry immediately. Wait, then wait longer.
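A sketch of exponential backoff with full jitter in Python (function name and defaults are illustrative):

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base=0.5, cap=30.0):
    """Retry fn, doubling the delay each attempt, randomized with full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay = min(cap, base * 2 ** attempt)   # 0.5s, 1s, 2s, 4s, ...
            time.sleep(random.uniform(0, delay))    # full jitter: sleep in [0, delay)
```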
Why Jitter? Without jitter, all failed clients retry at the exact same moment → thundering herd → the recovering service immediately fails again.
2. Bulkhead Pattern
Isolate different parts of your system so one failure doesn’t bleed into another.
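One common way to build a bulkhead is a dedicated thread pool per dependency — a sketch in Python (pool names and sizes are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# Separate, bounded pools per dependency. If the payments API hangs and
# exhausts its 10 workers, search calls still have their own workers free.
payments_pool = ThreadPoolExecutor(max_workers=10, thread_name_prefix="payments")
search_pool = ThreadPoolExecutor(max_workers=10, thread_name_prefix="search")

def call_payments(fn, *args):
    return payments_pool.submit(fn, *args)  # returns a Future

def call_search(fn, *args):
    return search_pool.submit(fn, *args)
```

The design choice is the isolation, not the thread pool: a semaphore per dependency, or separate connection pools, achieves the same "one leak can't sink the whole ship" property.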
Final Mental Model
Rate Limiter = the traffic light at your front door, protecting you from your clients. Circuit Breaker = the fuse box on every outbound call, protecting you from your dependencies.
Resilience Rules:
- Every external call should have a timeout.
- Every retry should have exponential backoff + jitter.
- If a dependency fails repeatedly, open the circuit — fail fast.
- Rate limit by user/IP, not just globally.
