Making asynchronous HTTP requests is very common in Python applications. Whether fetching data from an API or submitting forms, we often need to send multiple concurrent requests to maximize performance.
However, many web services enforce rate limits to prevent abuse and protect availability. Exceeding these limits can lead to errors or even get your API access revoked entirely!
In this guide, I'll share some effective strategies for respectfully rate limiting your asynchronous requests in Python. We'll cover common types of rate limiting, using queues to control concurrency, retrying failed requests with exponential backoff, and monitoring usage to stay under limits.
Common Types of Rate Limiting
There are a few popular patterns for how services rate limit API access:

- Requests per time window (e.g., 60 requests per minute)
- Caps on the number of concurrent requests in flight
- Token or leaky bucket algorithms that permit short bursts

Understanding how an API rate limits allows us to shape traffic appropriately on the client side.
Using Queues to Control Concurrency
Since Python's asyncio ships with queue primitives built in, queues make an excellent tool for controlling concurrency. Here is an example using asyncio.Queue with a small pool of worker tasks:
```python
import asyncio

import httpx

async def fetch(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

async def worker(queue):
    while True:
        url = await queue.get()
        try:
            await fetch(url)
        finally:
            queue.task_done()  # mark done even on failure so join() completes

async def main():
    queue = asyncio.Queue(maxsize=10)

    # Start the workers first, so the producer below doesn't block forever
    # once the queue fills up.
    tasks = [asyncio.create_task(worker(queue)) for _ in range(3)]

    # put() blocks when the queue is full, applying backpressure.
    for i in range(100):
        await queue.put(f'https://api.example.com/data?id={i}')

    await queue.join()
    for task in tasks:
        task.cancel()

asyncio.run(main())
```
Here the three worker tasks cap us at three requests in flight at a time, while the queue's max size of 10 applies backpressure so we never buffer more than 10 pending URLs.
We could also use an asyncio.Semaphore to cap concurrency directly, without managing a worker pool.
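A minimal sketch of the semaphore approach, with a short `asyncio.sleep` standing in for the real HTTP call (and some instrumentation, `active`/`peak`, added purely to demonstrate the cap):

```python
import asyncio

# Instrumentation to demonstrate the concurrency cap (not needed in real code).
active = 0
peak = 0

async def fetch(sem, url):
    global active, peak
    async with sem:                  # wait for a free slot
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)    # stand-in for the real HTTP request
        active -= 1
        return url

async def main():
    sem = asyncio.Semaphore(3)       # at most 3 requests in flight
    urls = [f'https://api.example.com/data?id={i}' for i in range(20)]
    return await asyncio.gather(*(fetch(sem, u) for u in urls))

results = asyncio.run(main())
```

Compared to a worker pool, the semaphore lets us schedule all tasks up front with `asyncio.gather` while still bounding how many run at once.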
Retrying Failed Requests with Exponential Backoff
APIs often use rate limiting to maintain availability under high load. So if we start hitting limits, it's best to back off and retry requests later.
We can use an exponential backoff algorithm that progressively waits longer between retries, reducing pressure on the API:
```python
import asyncio
import random

import httpx

async def fetch(url):
    for tries in range(5):
        try:
            async with httpx.AsyncClient() as client:
                response = await client.get(url)
                response.raise_for_status()  # turn 4xx/5xx into exceptions
                return response
        except httpx.HTTPError:
            # Exponential backoff with randomized jitter.
            seconds = random.expovariate(1) * 2 ** tries
            print(f'Error fetching {url}, retrying in {seconds:.2f}s')
            await asyncio.sleep(seconds)
    raise RuntimeError(f'Failed to fetch {url} after 5 tries')
```
This retries on any HTTP error, waiting a randomized delay whose average doubles with each attempt (roughly 1, 2, 4, 8, then 16 seconds): exponential backoff with jitter.
Monitoring Usage to Stay Under Limits
To stay under rate limits and avoid failures, we should monitor how close our application is trending toward request thresholds.
Most APIs provide usage metadata in responses we can track. For example, GitHub's API includes remaining rate limit details in headers:
```
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 56
X-RateLimit-Reset: 1602132167
```
We can capture and log this data with middleware and raise alerts if we cross certain thresholds, like remaining requests falling under 20% of the limit.
Advanced options like token buckets also allow pre-emptively modeling expected usage to predict limit breaches before they trigger failures.
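To make that concrete, here is a minimal token bucket sketch: tokens refill continuously at a fixed rate, each request spends one token, and a short burst is allowed up to the bucket's capacity (the rate and capacity values are illustrative):

```python
import asyncio
import time

class TokenBucket:
    """Allow `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    async def acquire(self):
        while True:
            now = time.monotonic()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for one token to accrue.
            await asyncio.sleep((1 - self.tokens) / self.rate)

async def main():
    bucket = TokenBucket(rate=50, capacity=5)
    start = time.monotonic()
    for _ in range(10):
        await bucket.acquire()  # first 5 pass instantly, the rest are paced
    return time.monotonic() - start

elapsed = asyncio.run(main())
```

Calling `await bucket.acquire()` before each request then throttles the client to the modeled rate instead of waiting for the server to reject us.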
Carefully rate limiting asynchronous requests helps avoid disruptions from exceeding limits, while still allowing reasonable use. Following these patterns, we can build robust applications that use APIs responsibly.