
How to Fix "Error: 429 Too Many Requests" in Claude (Anthropic)

Error Output
$ Error: 429 Too Many Requests

Claude (Anthropic) · Intermediate · 10-30 minutes · March 2026 · RapidDev Engineering Team
TL;DR

The '429 Too Many Requests' error from Claude's API means you have exceeded one of Anthropic's rate limits: requests per minute, input tokens per minute, or output tokens per minute. Implement exponential backoff with jitter, respect the retry-after header, and ramp traffic gradually. All limits are per-organization, not per API key.

What does "Error: 429 Too Many Requests" mean in Claude?

When the Claude API returns HTTP 429, you have exceeded one of three rate limits: requests per minute (RPM), input tokens per minute (ITPM), or output tokens per minute (OTPM). The response body reads: {"type":"error","error":{"type":"rate_limit_error","message":"Rate limited. Please try again later."}} Anthropic uses a token bucket algorithm, meaning a rate of 60 RPM may be enforced as 1 request per second — short bursts can trigger 429s even when you are under the nominal limit.

A critical detail: all rate limits are per-organization, not per API key. Creating multiple API keys does not increase your limits. Only uncached input tokens count toward the ITPM limit. The 429 error is different from the 529 overloaded error — 429 means you are sending too much traffic, while 529 means Anthropic's servers are at capacity.
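Because a token bucket enforces the limit continuously rather than as a per-minute counter, spacing requests evenly prevents burst-triggered 429s even at the same average rate. A minimal pacing sketch; the 60 RPM figure is illustrative, so substitute your organization's actual tier limit:

```python
import time

class RequestPacer:
    """Spaces calls evenly so a nominal per-minute limit is never hit in bursts."""

    def __init__(self, requests_per_minute: int):
        self.interval = 60.0 / requests_per_minute  # seconds between requests
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next request slot opens, then reserve the slot."""
        now = time.monotonic()
        if now < self._next_allowed:
            time.sleep(self._next_allowed - now)
        self._next_allowed = max(now, self._next_allowed) + self.interval

pacer = RequestPacer(requests_per_minute=60)  # call pacer.wait() before each API request
```

Calling `pacer.wait()` before every request converts a bursty workload into a steady one request per second, which is exactly how a 60 RPM token bucket wants to see traffic arrive.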

A billing-tier variant returns: "Extra usage is required for long context requests." This means your account's tier does not support the context window size you are requesting. Some users on higher plans report seeing "Rate limit reached" while the usage dashboard shows only 0-16% consumption, suggesting enforcement can be inconsistent.

Common causes

  • Your application is sending more requests per minute than your organization's RPM limit allows
  • The total input tokens across concurrent requests exceeds your input tokens per minute (ITPM) quota
  • A burst of requests triggered the token bucket algorithm even though your average rate is within limits
  • Your account's billing tier does not support the context window size or model you are requesting
  • Multiple applications or team members are sharing the same organization, collectively exceeding the rate limit
  • Retry logic without proper backoff is creating a cascade of repeated requests that compounds the rate limiting

How to fix "Error: 429 Too Many Requests" in Claude

The primary fix is implementing exponential backoff with jitter. When you receive a 429, wait before retrying and increase the wait time with each subsequent failure. Add random jitter to prevent thundering herd problems where multiple clients retry at the same time.
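The schedule described above can be computed with a small helper. This is a sketch of the standard "full jitter" variant; the base delay and cap are illustrative values, not limits specified by Anthropic:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: the ceiling doubles with each
    failed attempt (capped), and the actual wait is drawn uniformly from
    [0, ceiling] so concurrent clients do not retry in lockstep."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)
```

A first retry waits up to 1 second, a fourth up to 8 seconds, and the wait never exceeds the 60-second cap no matter how many attempts have failed.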

Check the retry-after response header, which tells you exactly how many seconds to wait before your next request will be accepted. Respect this value — sending requests before the retry-after window expires will just generate more 429 errors.

The Anthropic SDK handles this automatically with its built-in retry logic (2 retries by default). Increase this with max_retries=5 for better resilience. For high-throughput applications, implement a client-side rate limiter that tracks your RPM and ITPM usage and throttles requests before hitting the API limit.
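One way to sketch such a client-side limiter is a sliding 60-second window that tracks both request count and input tokens; the limits passed in are placeholders for your organization's actual tier values:

```python
import time
from collections import deque

class ClientRateLimiter:
    """Throttles locally before the API would return 429, using a sliding
    60-second window over both request count (RPM) and input tokens (ITPM)."""

    def __init__(self, rpm: int, input_tpm: int):
        self.rpm = rpm
        self.input_tpm = input_tpm
        self.events = deque()  # (timestamp, input_tokens) per request sent

    def acquire(self, input_tokens: int) -> None:
        """Block until sending a request of this size fits both limits."""
        while True:
            now = time.monotonic()
            # Drop events older than the 60-second window
            while self.events and now - self.events[0][0] > 60:
                self.events.popleft()
            used_tokens = sum(tokens for _, tokens in self.events)
            if len(self.events) < self.rpm and used_tokens + input_tokens <= self.input_tpm:
                self.events.append((now, input_tokens))
                return
            time.sleep(0.25)  # wait for the window to roll forward
```

Call `limiter.acquire(estimated_input_tokens)` before each `messages.create` call; estimating tokens as roughly characters divided by four is a common approximation when an exact count is unavailable.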

Ramp traffic gradually when starting a new batch process. Do not send 100 concurrent requests immediately — start with a few and increase over 60 seconds. This avoids triggering the acceleration component of Anthropic's rate limiting.
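A gradual ramp can be as simple as a generator of geometrically growing batch sizes, with a short pause between batches; the starting size and growth factor here are illustrative:

```python
def ramp_schedule(total: int, start: int = 2):
    """Yield geometrically growing batch sizes (2, 4, 8, ...) so a large
    workload ramps up over tens of seconds instead of arriving all at once."""
    size, sent = start, 0
    while sent < total:
        batch = min(size, total - sent)
        yield batch
        sent += batch
        size *= 2
```

For 100 items this yields batches of 2, 4, 8, 16, 32, and 38; sleeping a few seconds between batches gives the token bucket time to refill and lets you notice 429s while traffic is still low.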

For workloads that consistently hit rate limits, consider the Message Batches API for non-real-time processing, or contact Anthropic to discuss higher tier limits for your organization.

Before
python
import anthropic

client = anthropic.Anthropic()

# No rate limiting, no backoff
for item in large_batch:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": item}],
    )
After
python
import anthropic
import random
import time

client = anthropic.Anthropic(max_retries=5)

MIN_DELAY = 1.0  # seconds between requests (client-side throttle)

for item in large_batch:
    try:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": item}],
        )
    except anthropic.RateLimitError as e:
        # Honor the retry-after header, plus jitter to avoid synchronized retries
        wait = float(e.response.headers.get("retry-after", 30))
        jitter = random.uniform(0, wait * 0.1)
        print(f"Rate limited. Waiting {wait + jitter:.1f}s")
        time.sleep(wait + jitter)
        # Retry this item once after the wait
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": item}],
        )
    time.sleep(MIN_DELAY)  # Throttle between requests

Prevention tips

  • Always check and respect the retry-after response header — it tells you the exact wait time before your next request will be accepted
  • Add random jitter to your backoff delays to prevent multiple clients from retrying at the same instant and triggering another rate limit
  • Use the Message Batches API for non-real-time workloads — batch requests have separate, more generous rate limits
  • Remember that rate limits are per-organization, not per API key — creating more keys does not increase your quota
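For reference, a Message Batches submission built with the Python SDK looks roughly like this; the prompts, the `custom_id` scheme, and the model name are placeholders, and the exact call shape may vary by SDK version:

```python
# Build the request payload for the Message Batches API. Each entry pairs a
# custom_id (used to match results after the batch completes) with ordinary
# messages.create parameters.
prompts = ["first prompt", "second prompt"]

batch_requests = [
    {
        "custom_id": f"item-{i}",
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": text}],
        },
    }
    for i, text in enumerate(prompts)
]

# Submitting (requires the anthropic SDK and an API key):
#   client = anthropic.Anthropic()
#   batch = client.messages.batches.create(requests=batch_requests)
# Then poll the batch status and fetch results by custom_id once it completes.
```

Because batches are processed asynchronously with their own quota, this path sidesteps the RPM and ITPM limits that interactive requests compete for.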

Still stuck?

Copy one of these prompts to get a personalized, step-by-step explanation.

ChatGPT Prompt

I'm getting 'Error: 429 Too Many Requests' from the Claude/Anthropic API when processing a batch of 500 items. How do I implement proper rate limiting with exponential backoff and jitter to stay within Anthropic's limits?

Claude (Anthropic) Prompt

My Claude API integration hits 429 rate limits when processing batches. Here is my current code: [paste code]. Add exponential backoff with jitter, retry-after header support, and client-side rate limiting to stay within Anthropic's RPM and ITPM limits.

Frequently asked questions

What does "Error: 429 Too Many Requests" mean for Claude API?

It means you have exceeded one of three rate limits: requests per minute (RPM), input tokens per minute (ITPM), or output tokens per minute (OTPM). All limits are per-organization. The error response includes a retry-after header indicating when you can send the next request.

Is the 429 error the same as the 529 overloaded error in Claude?

No. A 429 means your organization is sending too many requests (your fault). A 529 means Anthropic's servers are at capacity (their fault). The 429 can be fixed with rate limiting and backoff. The 529 requires waiting for Anthropic's capacity to recover.

Am I charged for requests that receive a 429 error?

No. Rate-limited requests (429) are not billed because no tokens are processed. However, aggressive retry logic without backoff wastes time and can extend the rate limit window, so implement proper backoff to recover faster.

Will creating multiple API keys increase my rate limits?

No. Rate limits in Claude's API are enforced per-organization, not per API key. All keys under the same organization share the same quota. To get higher limits, you need to upgrade your billing tier or contact Anthropic.

How do I handle rate limits in a production application?

Implement a three-layer strategy: (1) client-side rate limiting to stay below your known limits, (2) exponential backoff with jitter for when 429s occur, and (3) a queue system that buffers requests during rate limit windows. Use the Message Batches API for non-real-time workloads.

Can RapidDev help optimize my Claude API integration for high throughput?

Yes. RapidDev can architect a production-grade integration with client-side rate limiting, queue-based request management, and automatic failover strategies. This is especially important for applications processing large volumes of requests against Claude's rate limits.

Talk to an Expert

Our team has built 600+ apps. Get personalized help with your issue.

Book a free consultation

