Overview
Requests is an HTTP library for Python that makes it easy to send HTTP requests. The sections below walk through its key features with short examples.
Making Requests
Import requests:
import requests
Make a GET request:
response = requests.get('https://api.example.com/data')
Make a POST request:
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('https://api.example.com/data', data=payload)
Response Content
Get the response content as a string:
html = response.text
Get JSON response content:
data = response.json()
Get binary content:
image = response.content
Status Codes
Check the status code:
if response.status_code == 200:
    print('Success!')
elif response.status_code == 404:
    print('Not Found.')
Request Headers
View request headers:
headers = response.request.headers
Add custom headers:
headers = {'User-Agent': 'My Script'}
response = requests.get(url, headers=headers)
Query Parameters
Add parameters to URL:
params = {'key1': 'value1', 'key2': 'value2'}
response = requests.get(url, params=params)
POST Data
Send data in request body:
data = {'key': 'value'}
response = requests.post(url, data=data)
Send form-encoded data:
data = {'key1': 'value1', 'key2': 'value2'}
response = requests.post(url, data=data)
File Uploads
Upload file:
files = {'file': open('report.xls', 'rb')}
response = requests.post(url, files=files)
Upload multiple files:
files = {'file1': open('report.xls', 'rb'),
         'file2': open('data.json', 'rb')}
response = requests.post(url, files=files)
Timeouts
Set a request timeout (in seconds):
requests.get(url, timeout=3.05)
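The timeout can also be given as a (connect, read) tuple, and a slow server raises requests.exceptions.Timeout. A brief sketch, using the same url placeholder as elsewhere in this guide:
try:
    response = requests.get(url, timeout=(3.05, 27))  # (connect, read) seconds
except requests.exceptions.Timeout:
    print('The request timed out')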
Authentication
Pass HTTP Basic Auth credentials:
response = requests.get(url, auth=('user', 'pass'))
Use OAuth1:
import requests_oauthlib
oauth = requests_oauthlib.OAuth1('client_key', client_secret='secret')
response = requests.get(url, auth=oauth)
Sessions
Create a session to persist parameters across requests:
session = requests.Session()
session.params = {'key': 'value'}
response = session.get('http://httpbin.org/get')
Error Handling
Check if a response was successful:
if response.status_code == 200:
    pass  # successful request
elif response.status_code == 404:
    pass  # handle 404 error
Catch connection errors:
try:
    response = requests.get(url, timeout=3)
except requests.exceptions.ConnectionError:
    pass  # handle connection error
SSL Verification
SSL certificates are verified by default; pass verify=True to make it explicit:
response = requests.get(url, verify=True)
Disable SSL verification (not recommended; urllib3 will warn about the insecure request):
response = requests.get(url, verify=False)
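With verify=False, urllib3 emits an InsecureRequestWarning on every request. A minimal sketch of silencing that warning explicitly, only where the risk is understood:
import urllib3
import requests

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get(url, verify=False)  # no warning is printed now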
Proxy Servers
Make requests over a proxy server:
proxies = {
    'http': 'http://10.10.1.1:3128',
    'https': 'http://10.10.1.1:1080'
}
requests.get(url, proxies=proxies)
Advanced Section
More Examples of Different HTTP Request Types
PUT Request:
data = {'key':'value'}
response = requests.put('https://api.example.com/data', data=data)
DELETE Request:
response = requests.delete('https://api.example.com/data/1')
HEAD Request:
response = requests.head('http://example.com')
print(response.headers)
OPTIONS Request:
response = requests.options('https://api.example.com/data')
print(response.headers['Allow']) # allowed HTTP methods
Using Sessions for Efficiency
session = requests.Session()
session.auth = ('username', 'password')
response = session.get('https://api.example.com/user')
# subsequent requests will use authentication
Handling Cookies
url = 'http://example.com'
cookies = {'my_cookie': 'cookie_value'}
response = requests.get(url, cookies=cookies)
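Cookies set by the server can be read from response.cookies, and a Session stores and resends them automatically. A short sketch; the cookie name is hypothetical:
response = requests.get(url)
print(response.cookies.get('session_id'))  # hypothetical cookie name

session = requests.Session()
session.get(url)  # cookies set by the server are stored on the session
session.get(url)  # ...and sent back automatically on later requests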
Streaming Response Content
with requests.get(url, stream=True) as response:
    for chunk in response.iter_content(8192):
        print(chunk)
Setting Timeouts & Retries
from requests.exceptions import ConnectionError, Timeout
try:
    response = requests.get(url, timeout=3.05)
except (ConnectionError, Timeout):
    response = requests.get(url, timeout=5)  # retry once with a longer timeout
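For automatic retries with backoff, a common approach is to mount an HTTPAdapter configured with urllib3's Retry. A sketch; the retry counts and status codes are illustrative:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))
session.mount('http://', HTTPAdapter(max_retries=retries))
response = session.get(url, timeout=3.05)  # retried automatically on failure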
Custom SSL Certificate Verification
response = requests.get(url, verify='path/to/cert.pem')
Authentication to APIs
from requests_oauthlib import OAuth1
url = 'https://api.example.com/data'
oauth = OAuth1('client_key', client_secret='secret')
response = requests.get(url, auth=oauth)
Using Proxies
proxies = {'http': 'http://10.10.1.1:3128'}
response = requests.get(url, proxies=proxies)
Optimizing Performance with Keepalive & Connection Pools
session = requests.Session()
adapter = requests.adapters.HTTPAdapter(pool_connections=100, pool_maxsize=100)
session.mount('http://', adapter)
response = session.get(url) # reused connection
Mocking Out Requests for Testing
import requests_mock
with requests_mock.mock() as m:
    m.get('http://test.com', text='data')
    response = requests.get('http://test.com')
    print(response.text)  # prints 'data'
Exceptions & Error Handling
try:
    response = requests.get(url, timeout=3)
except requests.exceptions.Timeout:
    pass  # handle timeout
except requests.exceptions.SSLError:
    pass  # handle SSL error
Debugging Requests with Hooks & Logging
import logging
import http.client as http_client
http_client.HTTPConnection.debuglevel = 1
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
response = requests.get(url)
Multipart File & Data Uploads
files = {'file': open('report.pdf', 'rb')}
data = {'key':'value'}
response = requests.post(url, files=files, data=data)
JSON Techniques
# Serialize data to JSON
import json
data = {'key': 'value'}
json_data = json.dumps(data)
# Decode JSON response
response = requests.get(url)
data = response.json()
# Send a JSON body directly; requests serializes it and sets the Content-Type header
params = {'key': 'value'}
response = requests.post(url, json=params)
Custom User-Agents and Headers
headers = {
    'User-Agent': 'My Bot 1.0',
    'Authorization': 'Bearer <token>'
}
response = requests.get(url, headers=headers)
Response Metadata - Access Headers, Encoding, History
response = requests.get(url)
print(response.headers['Content-Type']) # headers
print(response.encoding) # 'utf-8'
print(response.history) # response history
print(response.url) # final URL
Handling Compression and Encodings
# requests decodes gzip/deflate automatically; decode manually only when
# reading the undecoded stream via response.raw
import gzip
import zlib
response = requests.get(url, stream=True)
content = response.raw.read()
if response.headers.get('Content-Encoding') == 'gzip':
    content = gzip.decompress(content)
elif response.headers.get('Content-Encoding') == 'deflate':
    content = zlib.decompress(content)
Streaming Uploads/Downloads
# Streamed upload: pass a file object as the request body
with open('file.bin', 'rb') as f:
    response = requests.post(url, data=f)
# Streamed download
with requests.get(url, stream=True) as response:
    with open('filename', 'wb') as fd:
        for chunk in response.iter_content(2048):
            fd.write(chunk)
Additional HTTP Request Types
PUT Request
Update a resource:
data = {'key': 'new value'}
response = requests.put(url, data=data)
PATCH Request
Partial update of resource:
data = {'name': 'new name'}
response = requests.patch(url, data=data)
HEAD Request
Get headers for resource:
response = requests.head(url)
print(response.headers)
OPTIONS Request
Get allowed HTTP methods:
response = requests.options(url)
print(response.headers['Allow'])
Authentication Methods
Basic Auth
requests.get(url, auth=('user', 'pass'))
Digest Auth
from requests.auth import HTTPDigestAuth
requests.get(url, auth=HTTPDigestAuth('user', 'pass'))
OAuth 1
import requests_oauthlib
oauth = requests_oauthlib.OAuth1(client_key, client_secret)
requests.get(url, auth=oauth)
API Keys
headers = {'X-API-Key': 'abc123'}
requests.get(url, headers=headers)
JSON Web Tokens
headers = {'Authorization': f'Bearer {token}'}
requests.get(url, headers=headers)
Handling Pagination
Extract next page URL
response = requests.get(url)
next_page = response.links['next']['url']
Iterate pages manually
while next_page:
    data = response.json()
    # do something with data
    next_page = response.links.get('next', {}).get('url')
    if next_page:
        response = requests.get(next_page)
Automate paging
import requests

def get_pages(url):
    response = requests.get(url)
    yield response.json()
    next_page = response.links.get('next')
    while next_page:
        response = requests.get(next_page['url'])
        yield response.json()
        next_page = response.links.get('next')

for page in get_pages(url):
    print(page)
Tips and Tricks
Global Timeouts
requests has no global default timeout; pass timeout on each call, or build a small wrapper once, for example with functools.partial:
import functools
import requests
get_with_timeout = functools.partial(requests.get, timeout=3)
response = get_with_timeout(url)
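Alternatively, a Session can apply a default timeout through a custom transport adapter. A minimal sketch, assuming a 3-second default:
import requests
from requests.adapters import HTTPAdapter

class TimeoutAdapter(HTTPAdapter):
    def send(self, request, **kwargs):
        # apply a default timeout when the caller did not pass one
        if kwargs.get('timeout') is None:
            kwargs['timeout'] = 3
        return super().send(request, **kwargs)

session = requests.Session()
session.mount('http://', TimeoutAdapter())
session.mount('https://', TimeoutAdapter())
response = session.get(url)  # uses the 3-second default timeout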
Session Objects
session = requests.Session()
session.headers.update({'User-Agent': 'my-app/0.0.1'})
response = session.get(url)
Extract Links
import re
import requests
response = requests.get(url)
for link in re.findall('<a href="(.*?)">', response.text):
print(link)
Custom User-Agent
headers = {'User-Agent': 'My Bot 1.0'}
requests.get(url, headers=headers)
Exceptions & Troubleshooting
Common Exceptions
try:
    response = requests.get(url, timeout=3)
except requests.exceptions.Timeout:
    pass  # could not connect in time
except requests.exceptions.TooManyRedirects:
    pass  # exceeded max redirects
except requests.exceptions.SSLError:
    pass  # SSL certificate issue
Get Failure Reason
try:
    response = requests.get(url)
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    print(e.response.text)  # the failure reason
Debug Failed Requests
import logging
import http.client
http.client.HTTPConnection.debuglevel = 2
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
try:
    response = requests.get(url)
except Exception as e:
    print(e)
Sending Specialized Data
Binary Data
files = {'file': ('report.docx', open('report.docx', 'rb'), 'application/vnd.openxmlformats-officedocument.wordprocessingml.document')}
response = requests.post(url, files=files)
Custom Encoding
import json
payload = {'key': 'value'}
data = json.dumps(payload).encode('utf-8')
response = requests.post(url, data=data, headers={'Content-Type': 'application/json'})
GZip Compressed Data
import gzip
data = gzip.compress(b'input data')
response = requests.post(url, data=data, headers={'Content-Encoding': 'gzip'})
Efficiency Techniques
Keepalive Connections
session = requests.Session()
# a Session keeps the underlying connection alive and reuses it by default
session.get(url1)
session.get(url2)
Connection Pooling
import requests
session = requests.Session()
adapter = requests.adapters.HTTPAdapter(pool_connections=100, pool_maxsize=100)
session.mount('https://', adapter)
session.get(url) # Reuses connection
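Connection pooling pays off when one Session is shared across many requests, for example from a thread pool. A sketch; the paths are placeholders, and note that requests does not formally guarantee Session thread safety:
import concurrent.futures
import requests

session = requests.Session()

def fetch(path):
    # all workers share the session, so pooled connections are reused
    return session.get(url + path).status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    print(list(pool.map(fetch, ['/a', '/b', '/c'])))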
Mocks and Testing
Mock Request
import requests_mock
with requests_mock.mock() as m:
    m.get(url, text='data')
    response = requests.get(url)
    print(response.text)  # data
Response Simulation
m = requests_mock.Mocker()
m.get(url, text='Success')
with m:
    response = requests.get(url)
    print(response.text)
Integration Testing
import responses  # patches requests at the transport level
import mymodule

def api_callback(request):
    return 200, {}, 'OK'

@responses.activate
def test_get_data():
    responses.add_callback(
        responses.GET,
        'https://api.example.com/data',
        callback=api_callback
    )
    # tests mymodule, which internally calls requests
    result = mymodule.get_data()
3rd Party Libraries
BeautifulSoup Parsing
import requests
from bs4 import BeautifulSoup
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
print(soup.find('h1').text)
Scrapy Integration
import scrapy
import requests

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        url = 'http://example.com'
        request = requests.Request('GET', url)
        prepared = request.prepare()
        yield scrapy.Request(url=prepared.url, callback=self.parse)

    def parse(self, response):
        pass  # handle the downloaded page here
Here is a comparison of some popular alternative Python HTTP libraries:
requests - The most popular library. Simple, intuitive API, powerful features, works for most cases. Lacks async support.
urllib - Python's built-in HTTP module (urllib.request). Lower-level, less intuitive, fewer helper methods. Useful for basic HTTP needs.
httpx - A newer library with a requests-compatible API. Adds async support, HTTP/2, connection pooling, and timeouts; a modern alternative (see the sketch after this list).
aiohttp - Async HTTP library for use with asyncio. Great for concurrency and parallel requests.
httpie - User-friendly command-line HTTP client. Great for testing/debugging APIs. Fewer features than requests.
scrapy - Specialized web crawling and scraping framework. Great for large scraping projects. Lots of customization.
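To illustrate the async support mentioned for httpx, here is a minimal sketch that fetches two URLs concurrently; the URLs are placeholders:
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        # both requests run concurrently on one client
        responses = await asyncio.gather(
            client.get('https://api.example.com/a'),
            client.get('https://api.example.com/b'),
        )
        for r in responses:
            print(r.status_code)

asyncio.run(main())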
Comparison
The best choice depends on your specific requirements: requests is the easiest general-purpose library, httpx adds async support, aiohttp suits heavy concurrency, and scrapy is built for large web scraping projects.