When fetching data from a URL using Python's urllib module, the response body is returned as bytes. Often, we want to work with this data as text strings instead. Converting between the two takes only two methods: bytes.decode() and str.encode().
import urllib.request
response = urllib.request.urlopen("http://example.com")
html_bytes = response.read() # Read response body as bytes
To decode bytes to a string, we need to know the character encoding that was used to encode the bytes. Common encodings are UTF-8, ASCII, and Latin-1.
We can usually find the encoding in the response headers:
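encoding = response.headers.get_content_charset() or "utf-8" # Get encoding from headers, defaulting to UTF-8
If the Content-Type header does not specify a charset, get_content_charset() returns None, and passing None to decode() raises a TypeError. UTF-8 is a reasonable default in that case, since it is by far the most common encoding on the web.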
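To see how the charset lookup behaves without making a network request, the sketch below builds header objects by hand. It assumes that response.headers behaves like an email.message.Message (http.client.HTTPMessage is a subclass of it). The helper name charset_of and the sample header values are made up for illustration.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset declared in a Content-Type value, or None."""
    headers = Message()
    if content_type is not None:
        headers["Content-Type"] = content_type
    # Same method that response.headers exposes after urlopen().
    return headers.get_content_charset()


declared = charset_of("text/html; charset=ISO-8859-1")  # charset present
missing = charset_of("text/html")                        # no charset parameter
encoding = missing or "utf-8"                            # fall back when None

print(declared, missing, encoding)
```

Note that the charset name comes back lowercased, which is fine because Python's codec lookup is case-insensitive.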
Once we have the encoding, we can decode the bytes:
html_string = html_bytes.decode(encoding) # Decode bytes
print(html_string)
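Servers sometimes omit the charset or declare the wrong one, so decoding can fail. One possible way to guard against that is sketched below. The function name decode_body and its fallback policy are assumptions for illustration, not part of urllib.

```python
def decode_body(body, charset=None, default="utf-8"):
    """Decode bytes with the declared charset, falling back to a default."""
    try:
        return body.decode(charset or default)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that do not match it:
        # decode with the default and replace undecodable bytes with U+FFFD.
        return body.decode(default, errors="replace")


print(decode_body(b"caf\xc3\xa9"))                 # UTF-8 bytes, no declared charset
print(decode_body(b"caf\xe9", charset="latin-1"))  # Latin-1 bytes, declared charset
print(decode_body(b"caf\xe9"))                     # Latin-1 bytes treated as UTF-8
```

The last call does not raise. The byte that is invalid in UTF-8 is replaced with the Unicode replacement character, which keeps the rest of the text readable at the cost of losing that character.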
We may also encode strings into bytes:
data = "hello world"
data_bytes = data.encode(encoding) # Encode string to bytes
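The choice of encoding changes the bytes that come out. As a small illustration, with a sample string chosen here rather than taken from any response, the same text encodes to different byte sequences under UTF-8 and Latin-1, and cannot be encoded as ASCII at all:

```python
text = "caf\u00e9"  # "café", written with an escape to avoid source-encoding issues

utf8_bytes = text.encode("utf-8")      # é becomes two bytes: b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # é becomes one byte:  b'caf\xe9'
print(len(text), len(utf8_bytes), len(latin1_bytes))

# Decoding with the encoding that produced the bytes restores the text.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text

# ASCII cannot represent é, so a strict encode raises UnicodeEncodeError.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)
```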
When posting data to a URL, the data usually needs to be URL-encoded first and then converted to bytes before sending:
from urllib.parse import quote_plus
data = "hello world"
url_encoded_data = quote_plus(data) # URL encode string
data_bytes = url_encoded_data.encode(encoding) # Encode to bytes
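For form data with several fields, urllib.parse.urlencode handles the quoting of every key and value in one step. The sketch below builds a POST request without sending it. The URL and the form fields are placeholders invented for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {"q": "hello world", "lang": "en"}  # hypothetical form fields

# urlencode() percent-encodes non-ASCII characters, so the result is pure
# ASCII and can be converted to bytes safely.
body = urlencode(form).encode("ascii")

request = Request(
    "http://example.com/search",  # placeholder URL
    data=body,                     # request bodies must be bytes
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)
print(request.get_method(), request.data)

# urllib.request.urlopen(request) would send it; omitted here so the
# example runs without network access.
```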
So in Python's urllib, we can easily convert between bytes and strings for request/response bodies using the decode() and encode() methods.