Making HTTP Requests With Python

Edited and approved by: Stefan Bradstreet

Language of the internet

‘Manually adding query strings’ or ‘form-encoding POST data’ may sound like gibberish at first, but these are some of the first chores we spare ourselves when we use the Python programming language to issue HTTP requests. In fact, Requests, one of the most useful Python modules, is an entire library built to automate such processes and make the programmer’s work easier. But what are HTTP requests and, most importantly, how can you issue them?


An Introduction to HTTP and SSL

HTTP, or Hypertext Transfer Protocol, defines a set of rules that allow communication between clients and servers. For a moment, think of your web browser as a client and an application on another computer hosting a website as a server. HTTP works in request-response cycles to facilitate communication between the browser and the application, which is how your Google searches run every day. During one search, your web browser issues a request, and the application’s host machine issues a response consisting of a status code, headers, and (usually) the requested content. This is convenient both for you and for a malicious eavesdropper who may want to steal your data without having to directly interact with either the client or the server. So how can data transfers using the HTTP protocol be made secure for sensitive data? Enter SSL.
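To make the request-response cycle concrete, here is a small, self-contained sketch that starts a throwaway HTTP server on localhost and issues a single GET against it using only the standard library. The handler class, URL, and body text are purely illustrative:

```python
import http.server
import threading
import urllib.request

# A minimal local server, so the request-response cycle can be seen end to end
# without touching the public internet.
class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from the server"
        self.send_response(200)                    # status line of the response
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                     # the response payload

    def log_message(self, *args):                  # silence request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side of the cycle: issue the request, read the response.
url = "http://127.0.0.1:{}/".format(server.server_port)
with urllib.request.urlopen(url) as response:
    status = response.status                       # the server's status code
    payload = response.read()                      # the server's response body

server.shutdown()
print(status, payload)
```

Every real request your browser makes follows this same shape; only the transport details and payload sizes change.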

SSL, or Secure Sockets Layer (succeeded in practice by its modern descendant, TLS), is the standard technology used to keep internet connections secure and to safeguard sensitive data transferred between two systems. In the server-client relationship we are going to discuss, SSL uses encryption algorithms to make data in transit unreadable to eavesdroppers, hiding sensitive data like credit card numbers, account passwords, and personal information.

Using Python, one can request data directly from a server with the ‘GET’ method. This will be discussed in greater detail below, but first it is worth taking a quick look at the Python libraries that are useful for making HTTP requests.

Python Libraries for Making HTTP Requests

It’s important to note that there are quite a few libraries available for making HTTP requests. One could easily name http.client (httplib in Python 2), urllib, and Requests as ‘household names’ in the area, with Requests arguably being the best documented of them all. For our tutorial, though, the libraries of choice will be urllib and Requests. To download and install the latter, enter the following command:

pip install requests

Note that urllib ships with Python’s standard library, so it requires no installation at all.

HTTP Request & Getting an HTTP Response

Now, depending on the library you use for making HTTP requests in Python, import statements for the required modules slightly differ in your development environment. For the Requests Library, it would be:

import requests

While for the urllib library, your first import is as follows:

import urllib.request

To get data from a specific resource, and thus issue a GET request, one would have to create a response object, using the following syntax:

>>> obj = requests.get('https://example.com/specific_path/')
# it is this response object that you will query for status codes in the next step

Here, we have named our object ‘obj’, and the response to the above request can be further analyzed. The response object is the programmer’s way of gaining insight into the data received from the server. Before the analysis, though, now is a good time to come back to SSL. By default, Requests verifies the validity of the website’s SSL certificate on every HTTPS request, which corresponds to passing ‘verify=True’. If the server’s certificate was signed by a non-standard certificate authority, you can instead point ‘verify’ at your own CA bundle, as follows:

>>> obj = requests.get(
  'https://example.com/path',
  verify='/path_to_my/ca.crt'  # path to a custom CA bundle
)

For the urllib library, as with Requests, issuing a GET request involves creating a request object, which is then turned into a response object for the requested URL. This can be implemented as follows:

objURL = urllib.request.Request('http://www.example.com')

Responses are file-like objects, so, just like with a file, you can call the .read() method on a response:

response = urllib.request.urlopen(objURL)
this_page = response.read()

# execute some code
response.close()  # good practice to close responses, just like files

URLs using the ‘ftp’ or ‘file’ schemes are also eligible. The information contained in the response of a GET request is called the payload, and it can be viewed in many formats. The payload can be analyzed in one of three ways:
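As a quick, hedged illustration of the ‘file’ scheme, the following sketch writes a temporary HTML file and reads it back through urlopen() exactly as one would read an HTTP response; the file name and contents are made up for the example:

```python
import pathlib
import tempfile
import urllib.request

# Create a throwaway HTML file so we can exercise the same read pattern
# without any network access at all.
tmp = pathlib.Path(tempfile.mkdtemp()) / "page.html"
tmp.write_text("<html><body>hello</body></html>")

file_url = tmp.as_uri()                  # e.g. file:///tmp/.../page.html
with urllib.request.urlopen(file_url) as response:
    contents = response.read()           # bytes, just like an HTTP body

print(contents)
```

The `with` statement here also closes the response automatically, which is the idiomatic alternative to calling .close() by hand.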

1. Getting the status code

The status code is the first bit of information that you can gather from your response object. It informs you about the status of the last request. The syntax for seeing the status code is as follows:

>>> obj.status_code

For urllib, you would use the ‘.getcode()’ method as follows:

>>> status_code = response.getcode()
>>> print(status_code)

A successful request gives you a status code of 200, which translates to ‘OK’ and tells us that the server responded with the requested data. An unsuccessful request may give you the infamous ‘404 Not Found’ status code, typical when the content of a website has been removed or moved to a different URL.

Depending on the nature of your request, connectivity, and access rights, the status codes you get back will differ. While it is convenient for debugging and troubleshooting to know at least the most frequently occurring status codes, memorizing them all is not necessary. Beyond its thorough documentation, Requests helps you evaluate success and failure directly: when a response object is used in a conditional expression, any status code below 400 evaluates to True, because such responses indicate the server produced at least some workable answer. The following piece of code therefore makes sense, even though 204 means ‘No Content’ and 304 means ‘Not Modified’:

if obj:
    print('Success!')
else:
    print('An error has occurred.')

The principle here is that Requests classifies success or failure according to whether the server produced a usable response; reacting to a specific code is left to the programmer. Where behavior based on specific codes is required, you can call the ‘.raise_for_status()’ method, which raises an HTTPError exception only for responses whose status codes indicate failure (4xx and 5xx).
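To see both behaviors without touching the network, one can construct Response objects by hand, a test-style trick rather than something you would do in production code (normally Requests builds these objects for you):

```python
import requests

# Offline sketch: hand-built Response objects show how status codes map
# to truthiness and to raise_for_status() exceptions.
ok = requests.models.Response()
ok.status_code = 204             # 'No Content': still below 400, so a success

missing = requests.models.Response()
missing.status_code = 404        # 'Not Found': a client error

print(bool(ok), bool(missing))   # truthiness follows status_code < 400

ok.raise_for_status()            # success codes: does nothing
caught = None
try:
    missing.raise_for_status()   # 4xx/5xx: raises HTTPError
except requests.exceptions.HTTPError as exc:
    caught = exc
print(type(caught).__name__)
```

This makes the rule explicit: the conditional test and raise_for_status() draw the same line, at status code 400.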

2. Viewing response headers.

Response headers are returned as a Python dictionary-like object that lets you access header values by key. They provide a lot of additional information, such as the server name and version, a time limit for caching the response, and the content type of the resource. They can be viewed using the ‘.headers’ attribute, again on the response object created earlier, as follows:

>>> obj.headers

To peek into the type of content housed under each header, you can use ‘content-type’ as follows:

>>> obj.headers[ 'Content-Type' ]

Interestingly, the headers object is case insensitive, such that ‘content-type’ and ‘Content-Type’ return the same thing. Sample output for ‘content-type’ queries could look as follows:

' text/html; charset=UTF-8 '
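That case insensitivity comes from the CaseInsensitiveDict class Requests uses to store headers; this tiny offline sketch demonstrates the behavior with a hand-built headers object (the sample header value is made up):

```python
from requests.structures import CaseInsensitiveDict

# Requests stores response headers in a CaseInsensitiveDict, which is why
# 'content-type' and 'Content-Type' look up the same value.
headers = CaseInsensitiveDict({'Content-Type': 'text/html; charset=UTF-8'})
print(headers['content-type'])
print(headers['CONTENT-TYPE'])
```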

Using urllib, the full set of headers can be printed with:

>>> print(response.info())

while a single header is retrieved with:

>>> print(response.getheader('Content-Type'))

3. Getting Response Content

Content retrieved from the server can be decoded and presented as Unicode text. The HTML text of a page can be obtained using the ‘.text’ attribute:

>>> obj.text

This can return an entire page of HTML text!

The response’s content can also be seen in bytes, using the ‘.content’ syntax. The syntax can be applied as follows:

>>> obj.content

While ‘.content’ presents the raw bytes of the payload, it is often necessary to turn those bytes into a string using an appropriate character encoding, such as the aforementioned UTF-8. If the encoding is not specified, Requests tries to infer it from the response headers. To force a specific encoding, set it before using ‘.text’, as follows:

>>> obj.encoding = 'utf-8'
>>> obj.text
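The difference between ‘.content’ and ‘.text’ is exactly the difference between bytes and decoded strings in Python. This small standalone sketch (the sample string is arbitrary) shows why picking the right encoding matters:

```python
# What .content hands you: raw bytes, as they arrived on the wire.
raw = 'café'.encode('utf-8')
print(raw)                    # the bytes b'caf\xc3\xa9'

# What .text hands you: those bytes decoded with some character encoding.
print(raw.decode('utf-8'))    # café -- correct
print(raw.decode('latin-1'))  # cafÃ© -- wrong encoding, garbled text
```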

Passing Arguments to the Request

The results of a GET request can be customized by passing values as query string parameters in the URL. This is done by passing a dictionary to the ‘params’ argument. Consider the examples below:

obj = requests.get(
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
)

or

params = '?q=wordpress&type=title'
url = 'https://api.github.com/search/repositories{}'.format(params)

with the urllib library. This can, of course, be generalized into a more generic function.
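One way to write that more generic function is with urllib.parse.urlencode, which handles the escaping of parameter values for you; the helper name build_url is our own invention for this sketch:

```python
import urllib.parse

def build_url(base, params):
    """Append URL-encoded query-string parameters to a base URL."""
    return '{}?{}'.format(base, urllib.parse.urlencode(params))

url = build_url('https://api.github.com/search/repositories',
                {'q': 'wordpress', 'type': 'title'})
print(url)
```

This avoids hand-assembling query strings, which becomes error-prone as soon as a value contains spaces or special characters.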

Passing the dictionary as a parameter allows you to modify the results returned from the Search API. Customizing the headers you send can also serve as a way of customizing your requests. To do this, you have to pass a dictionary of HTTP headers into your get method using ‘headers’ as a parameter:

obj = requests.get(
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
    headers={'Accept': 'application/vnd.github.v3.text-match+json'}
)

Above, the ‘Accept’ header tells the server what content types your application can handle.
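If you want to double-check what would actually be sent without making a network call, Requests lets you build and prepare a request first; the prepared object exposes the final URL and headers:

```python
import requests

# Prepare (but do not send) a request to inspect what would go over the wire.
req = requests.Request(
    'GET',
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
    headers={'Accept': 'application/vnd.github.v3.text-match+json'},
)
prepared = req.prepare()
print(prepared.url)                # base URL plus the encoded query string
print(prepared.headers['Accept'])
```

Note how the ‘+’ and ‘:’ in the query value get percent-encoded in the prepared URL; Requests does this for you on every .get() call as well.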

In a Nutshell…

The information we’ve considered helps you understand HTTP and the associated SSL protocol, as well as their importance. We now know that Python can be used to make HTTP requests and that this matters for automating otherwise tedious tasks. Using the requests and urllib libraries, we have seen simple examples of the GET request, how to analyze the payload, and how to modify the results of a search or query with parameters. Is it simple? It gets simpler with practice. So type out the pieces of code contained here for yourself and continue to pore over the documentation. Happy coding!

About Stefan Bradstreet

Stefan is a software development engineer II at Amazon with 5+ years of experience in tech. He is passionate about helping people become better coders and climbing the ranks in their careers as well as his own through continued learning of leadership techniques and software best practices.


