Urllib Module
urllib is a package in Python's standard library for working with URLs (Uniform Resource Locators). It lets you fetch web pages, read the data they return, and perform related tasks such as encoding and parsing URLs. The package collects several modules:
- urllib.request for opening and reading URLs
- urllib.parse for parsing URLs
- urllib.error for the exceptions raised by urllib.request
- urllib.robotparser for parsing robots.txt files
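As a minimal sketch of the parsing and encoding tasks mentioned above, the snippet below uses urllib.parse to split a URL into its parts, decode and build query strings, and resolve a relative link. The URL and the variable names are made up for illustration; nothing here touches the network.

```python
from urllib.parse import parse_qs, quote, urlencode, urljoin, urlparse

# Hypothetical URL, used only for illustration
url = "https://www.example.com/search/results?q=web+scraping&page=2#top"

# Split the URL into its components
parts = urlparse(url)
print(parts.scheme)    # https
print(parts.netloc)    # www.example.com
print(parts.path)      # /search/results
print(parts.query)     # q=web+scraping&page=2
print(parts.fragment)  # top

# Decode the query string into a dict of lists
params = parse_qs(parts.query)
print(params)          # {'q': ['web scraping'], 'page': ['2']}

# Build a query string from a dict (spaces become '+')
query = urlencode({"q": "python urllib", "page": 3})
print(query)           # q=python+urllib&page=3

# Percent-encode a single path segment or value
encoded = quote("a b&c")
print(encoded)         # a%20b%26c

# Resolve a relative link against the page it was found on
absolute = urljoin(url, "about.html")
print(absolute)        # https://www.example.com/search/about.html
```

When scraping, urljoin is typically the piece that turns relative href values found in a page into absolute URLs that can be fetched.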
urllib ships with Python's standard library, so it needs no installation. Note that the command pip install urllib3 installs urllib3, a separate third-party HTTP library, not urllib.
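Since urllib.robotparser is listed among the modules, here is a small sketch of how it can check whether a scraper is allowed to fetch a page. The robots.txt rules are hypothetical and supplied inline so the snippet runs offline, and the user agent name my-scraper is likewise an assumption.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, given inline instead of downloaded
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 5",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# Ask whether a given user agent may fetch a given URL
allowed_home = parser.can_fetch("my-scraper", "https://www.example.com/index.html")
allowed_private = parser.can_fetch("my-scraper", "https://www.example.com/private/data.html")

# Crawl-delay value for this user agent, or None if not specified
delay = parser.crawl_delay("my-scraper")

print(allowed_home)     # True
print(allowed_private)  # False
print(delay)            # 5
```

Against a live site, one would instead call parser.set_url() with the address of the site's robots.txt file and then parser.read() to download and parse it.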
Example
Here's a simple example that uses the urllib module to fetch the content of a web page:
- We define the URL of the web page we want to fetch.
- We call the urllib.request.urlopen() function to open the URL and obtain a response object.
- We read the content of the response object using the read() method.
- Since read() returns bytes, we decode the content to a string using the decode() method with 'utf-8' encoding.
- Finally, we print the HTML content of the web page.
import urllib.error
import urllib.request

# URL of the web page to fetch
url = 'https://www.example.com'

try:
    # Open the URL; the with block closes the connection when done
    with urllib.request.urlopen(url) as response:
        # Read the content of the response (returned as bytes)
        data = response.read()
    # Decode the bytes to a string
    html_content = data.decode('utf-8')
    # Print the HTML content of the web page
    print(html_content)
except urllib.error.URLError as e:
    # Raised when the URL cannot be reached or the server returns an error status
    print("Error fetching URL:", e)
Output
The script prints the HTML source of the page at https://www.example.com.
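The example above assumes the page is UTF-8 and treats every failure the same way. A slightly more defensive variant might look like the sketch below. The function name fetch_text and its parameters are hypothetical, chosen only for this illustration; the urllib calls themselves are standard library.

```python
import urllib.error
import urllib.request


def fetch_text(url, timeout=10.0, fallback_encoding="utf-8"):
    """Return the body of url as text, or None if the request fails.

    fetch_text and its parameters are hypothetical names for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            raw = response.read()
            # Prefer the charset declared in the Content-Type header, if any
            charset = response.headers.get_content_charset() or fallback_encoding
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500
        print(f"HTTP error {err.code} for {url}: {err.reason}")
        return None
    except urllib.error.URLError as err:
        # No usable response: DNS failure, refused connection, unknown scheme
        print(f"Could not reach {url}: {err.reason}")
        return None
    # Replace undecodable bytes instead of raising UnicodeDecodeError
    return raw.decode(charset, errors="replace")
```

Calling fetch_text('https://www.example.com') would return the page's HTML as a string, or None after printing a short message if the request fails. HTTPError is caught first because it is a subclass of URLError.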
Python Web Scraping Tutorial
Web scraping, the process of extracting data from websites, has emerged as a powerful technique to gather information from the vast expanse of the internet. In this tutorial, we’ll explore various Python libraries and modules commonly used for web scraping and delve into why Python 3 is the preferred choice for this task.