request
An extensible library for opening URLs using a variety of protocols
The simplest way to use this module is to call the urlopen function, which accepts a string containing a URL or a Request object (described below). It opens the URL and returns the result as a file-like object; the returned object has some extra methods described below.
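A minimal sketch of that call, assuming a data: URL so the example runs without network access; an http or https URL is opened the same way. The URLs and variable names here are illustrative and not part of the original text.

```python
import urllib.error
import urllib.request

# A data: URL keeps the example offline (illustrative choice).
url = "data:text/plain;charset=utf-8,hello%20world"

with urllib.request.urlopen(url) as response:
    body = response.read()                           # bytes, read like a file
    content_type = response.headers["Content-Type"]  # response headers
    final_url = response.geturl()                    # one of the extra methods

print(body)  # b'hello world'

# A URL whose scheme has no handler raises URLError (a subclass of OSError).
error = None
try:
    urllib.request.urlopen("nosuchscheme://example.invalid/")
except urllib.error.URLError as exc:
    error = exc
    print("could not open:", exc.reason)
```

Catching OSError would also work here, since URLError derives from it.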
The OpenerDirector manages a collection of Handler objects that do all the actual work. Each Handler implements a particular protocol or option. The OpenerDirector is a composite object that invokes the Handlers needed to open the requested URL. For example, the HTTPHandler performs HTTP GET and POST requests and deals with non-error returns. The HTTPRedirectHandler automatically deals with HTTP 301, 302, 303 and 307 redirect errors, and the HTTPDigestAuthHandler deals with digest authentication.
urlopen(url, data=None)
– Basic usage is the same as in the original urllib: pass the URL and, optionally, data to POST to an HTTP URL, and get a file-like object back. One difference is that you can also pass a Request instance instead of a URL. Raises URLError (a subclass of OSError) on failure; for HTTP errors it raises HTTPError, which can also be treated as a valid response.
build_opener
– Function that creates a new OpenerDirector instance and installs the default handlers. Accepts zero or more handlers as arguments, either instances or Handler classes that it will instantiate. If one of the arguments is a subclass of a default handler, the argument is installed instead of the default.
install_opener
– installs a new opener as the default opener.
objects of interest:
OpenerDirector
– sets up the User Agent as the Python-urllib client and manages the Handler classes, while dealing with requests and responses.
Request
– an object that encapsulates the state of a request. The state can be as simple as the URL. It can also include extra HTTP headers, e.g. a User-Agent.
BaseHandler
– internals: BaseHandler and parent _call_chain conventions.
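As an illustration of the handler conventions, the sketch below defines a hypothetical RecordingHandler. It relies on the naming scheme used by OpenerDirector, in which a method called <protocol>_request pre-processes requests for that protocol. The data: URL is an assumption made so the example runs offline.

```python
import urllib.request


class RecordingHandler(urllib.request.BaseHandler):
    """Hypothetical handler that records each data: URL before it is opened."""

    def __init__(self):
        self.seen = []

    # "<protocol>_request" methods are called before the request is opened.
    def data_request(self, request):
        self.seen.append(request.full_url)
        return request  # a pre-processor must return a Request object


recorder = RecordingHandler()
opener = urllib.request.build_opener(recorder)  # instances are accepted too

with opener.open("data:text/plain,hi") as response:
    body = response.read()

print(recorder.seen)  # ['data:text/plain,hi']
```

Methods named <protocol>_open and <protocol>_response follow the same pattern for opening a URL and post-processing its response.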
Example:
import urllib.request
# set up authentication info
authinfo = urllib.request.HTTPBasicAuthHandler()
authinfo.add_password(realm='PDQ Application',
                      uri='https://mahler:8092/site-updates.py',
                      user='klem',
                      passwd='geheim$parole')
# route plain-HTTP requests through a proxy
proxy_support = urllib.request.ProxyHandler({"http": "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)
# install it as the default opener used by urlopen()
urllib.request.install_opener(opener)
f = urllib.request.urlopen('https://www.python.org/')
request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
This class is an abstraction of a URL request.
Methods:
get_method - return the HTTP method of the request
get_full_url - return the request URL
set_proxy - set the proxy host and scheme for the request
has_proxy - true if the request is proxied
add_header - add a header to the header dict
add_unredirected_header - add a header that will not be added to a redirected request
has_header - true if the request contains a header named header_name
get_header - return the value of a header
remove_header - remove a header from the dict
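A short sketch of these methods in use. No request is sent; the host names and header values are placeholders. Note one CPython detail: add_header stores names via str.capitalize(), so lookups in this example use the stored spelling "User-agent".

```python
import urllib.request

req = urllib.request.Request("http://www.example.com/")

method = req.get_method()      # "GET", because no data was given
full_url = req.get_full_url()  # "http://www.example.com/"

# Names are stored capitalized: "User-Agent" becomes "User-agent".
req.add_header("User-Agent", "example-client/0.1")
req.add_unredirected_header("X-Trace", "abc123")

has_agent = req.has_header("User-agent")
agent = req.get_header("User-agent")
trace = req.get_header("X-trace")
missing = req.get_header("Accept", "not set")  # default for absent headers

req.remove_header("User-agent")
agent_after_removal = req.has_header("User-agent")

# Proxy state: set_proxy takes the proxy host and the scheme to use.
proxied_before = req.has_proxy()
req.set_proxy("proxy.example.com:3128", "http")
proxied_after = req.has_proxy()
```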
Parameters:
url – should be a string containing a valid URL.
data – must be an object specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data. The supported object types include bytes, file-like objects, and iterables of bytes-like objects. If neither a Content-Length nor a Transfer-Encoding header field has been provided, HTTPHandler will set these headers according to the type of data. Content-Length will be used to send bytes objects, while Transfer-Encoding: chunked, as specified in RFC 7230, Section 3.3.1, will be used to send files and other iterables.
headers – should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments. This is often used to “spoof” the User-Agent header value, which is used by a browser to identify itself - some HTTP servers only allow requests coming from common browsers as opposed to scripts.
origin_req_host – should be the request-host of the origin transaction, as defined by RFC 2965. It defaults to http.cookiejar.request_host(self). This is the host name or IP address of the original request that was initiated by the user. For example, if the request is for an image in an HTML document, this should be the request-host of the request for the page containing the image.
unverifiable – should indicate whether the request is unverifiable, as defined by RFC 2965. It defaults to False. An unverifiable request is one whose URL the user did not have the option to approve. For example, if the request is for an image in an HTML document, and the user had no option to approve the automatic fetching of the image, this should be true.
method – should be a string that indicates the HTTP request method that will be used (e.g. 'HEAD'). If provided, its value is stored in the method attribute and is used by get_method(). The default is 'GET' if data is None or 'POST' otherwise. Subclasses may indicate a different default method by setting the method attribute in the class itself.
Returns: Request object.
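The parameter defaults described above can be checked without sending anything. In this sketch the URLs, the form data, and the PutRequest subclass are illustrative assumptions.

```python
import urllib.parse
import urllib.request

# data must be bytes (or a file-like object or iterable), not str.
form = urllib.parse.urlencode({"q": "python"}).encode("ascii")

get_req = urllib.request.Request("http://www.example.com/search")
post_req = urllib.request.Request("http://www.example.com/search", data=form)
head_req = urllib.request.Request("http://www.example.com/", method="HEAD")


class PutRequest(urllib.request.Request):
    """Hypothetical subclass that changes the default method."""
    method = "PUT"


put_req = PutRequest("http://www.example.com/item", data=b"payload")

# headers is treated as if add_header() were called for each pair.
ua_req = urllib.request.Request(
    "http://www.example.com/",
    headers={"User-Agent": "example-client/0.1"},
)

methods = [r.get_method() for r in (get_req, post_req, head_req, put_req)]
print(methods)  # ['GET', 'POST', 'HEAD', 'PUT']

origin = get_req.origin_req_host   # defaults to the request host
unverifiable = get_req.unverifiable
```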