Extract Tables With BeautifulSoup in Python
The following steps show how to extract tables with Beautiful Soup in Python:
Step 1: Import the Library and Define Target URL
First, import the required modules and assign the URL of the page to scrape.
Python3
# import required modules
import bs4 as bs
import requests

# assign URL
URL = ""  # replace with the address of the page to scrape
Step 2: Create Object for Parsing
In this step, we fetch the page and create a BeautifulSoup object, which parses the HTML so that the tables can be extracted.
Python3
# parsing
url_link = requests.get(URL)
file = bs.BeautifulSoup(url_link.text, "lxml")
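The parsing step can also be tried without a network request. The sketch below is only an illustration: it feeds a small, made-up HTML string (the `sample_html` name and its contents are assumptions, not part of the original page) to `BeautifulSoup`, and it uses Python's built-in `html.parser` so that it runs even when `lxml` is not installed.

```python
import bs4 as bs

# Hypothetical sample markup standing in for url_link.text
sample_html = """
<html><body>
  <table class="numpy-table">
    <tr><th>Name</th><th>Value</th></tr>
    <tr><td>a</td><td>1</td></tr>
  </table>
</body></html>
"""

# "html.parser" ships with Python; "lxml" needs a separate install
file = bs.BeautifulSoup(sample_html, "html.parser")

# The parsed object can now be searched by tag name
first_table = file.find("table")
print(first_table["class"])
```

Because `class` can hold several values, Beautiful Soup returns it as a list rather than a single string.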
Step 3: Locating and Extracting Table Data
In this step, we find the table with the class numpy-table and collect its rows.
Python3
# find the table and its rows
find_table = file.find('table', class_='numpy-table')
rows = find_table.find_all('tr')
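One thing to watch for in this step: `find()` returns `None` when no element matches, so calling `find_all()` on the result would raise an `AttributeError`. A minimal sketch of a guard, assuming a made-up HTML string (`sample_html` is hypothetical) whose table has a different class than the one searched for:

```python
import bs4 as bs

# Hypothetical markup: the table's class does not match the search below
sample_html = '<table class="other-table"><tr><td>x</td></tr></table>'
file = bs.BeautifulSoup(sample_html, "html.parser")

find_table = file.find("table", class_="numpy-table")

if find_table is None:
    # find() found nothing, so fall back to an empty row list
    rows = []
    print("No table with class 'numpy-table' found")
else:
    rows = find_table.find_all("tr")

print(len(rows))
```

Falling back to an empty list lets the later loop run without changes. Raising an error instead may be a better fit when a missing table should stop the script.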
Step 4: Extracting Text from Table Cell
Now loop over the rows, find all the td tags in each row, and print the text of each cell.
Python3
# display tables
for i in rows:
    table_data = i.find_all('td')
    data = [j.text for j in table_data]
    print(data)
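A loop that searches only for `td` tags produces an empty list for any row made up of `th` header cells. The sketch below is one possible variation, not the method used in this article: it matches both tag names and trims whitespace with `get_text(strip=True)`. The `sample_html` string and the `table` list are assumptions introduced for illustration.

```python
import bs4 as bs

# Hypothetical markup with one header row and one data row
sample_html = """
<table class="numpy-table">
  <tr><th>Method</th><th>Description</th></tr>
  <tr><td> append() </td><td>Adds an element</td></tr>
</table>
"""
file = bs.BeautifulSoup(sample_html, "html.parser")
rows = file.find("table", class_="numpy-table").find_all("tr")

table = []
for i in rows:
    # Passing a list of names matches header (th) and data (td) cells alike
    cells = i.find_all(["th", "td"])
    # get_text(strip=True) removes surrounding whitespace from each cell
    table.append([j.get_text(strip=True) for j in cells])

for data in table:
    print(data)
```

Collecting the rows in a list before printing keeps the extracted data available for later use, for example writing it to a CSV file.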
Complete Code
Below is the complete implementation of the above steps. In this code, we scrape a specific table (the one with the numpy-table class) from a w3wiki page about Python lists. After locating the table rows, we iterate through each row to extract and print the cell data.
Python3
# import required modules
import bs4 as bs
import requests

# assign URL
URL = ""  # replace with the address of the page to scrape

# parsing
url_link = requests.get(URL)
file = bs.BeautifulSoup(url_link.text, "lxml")

# find the table and its rows
find_table = file.find('table', class_='numpy-table')
rows = find_table.find_all('tr')

# display tables
for i in rows:
    table_data = i.find_all('td')
    data = [j.text for j in table_data]
    print(data)
Output:
Parsing tables and XML with BeautifulSoup
Scraping is a useful skill to learn. It lets us pull data out of a website or a file so that the programmer can reuse it in another form. In this article, we will learn how to extract tables and XML from a file with Beautiful Soup. Here, we will scrape data using the Beautiful Soup Python module.
Prerequisites:
Modules Required
pip install bs4
pip install lxml
pip install requests