Extract Tables With BeautifulSoup in Python

Below are the steps to extract tables with Beautiful Soup in Python:

Step 1: Import the Library and Define Target URL

First, we import the required modules and assign the URL of the page we want to scrape.

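# import required modules
import bs4 as bs
import requests

# assign URL (placeholder: the article's original URL is not preserved here)
URL = "<URL of the page containing the table>"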

Step 2: Create Object for Parsing

In this step, we download the page with requests and create a BeautifulSoup object that parses its HTML, so that we can search it for tables.

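# download the page and parse its HTML
url_link = requests.get(URL, timeout=10)
url_link.raise_for_status()  # stop early on HTTP errors such as 404
file = bs.BeautifulSoup(url_link.text, "lxml")  # the "lxml" parser requires the lxml package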
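To experiment with parsing without making a network request, you can pass an HTML string straight to BeautifulSoup. The sketch below assumes a small hypothetical page (SAMPLE_HTML) standing in for the downloaded one, and it uses Python's built-in "html.parser" instead of "lxml" so that it runs without extra installs.

```python
import bs4 as bs

# Hypothetical sample page standing in for the downloaded HTML
SAMPLE_HTML = """
<html><body>
  <table class="numpy-table">
    <tr><th>Method</th><th>Description</th></tr>
    <tr><td>append()</td><td>Adds an element at the end</td></tr>
    <tr><td>clear()</td><td>Removes all elements</td></tr>
  </table>
</body></html>
"""

# "html.parser" ships with Python; "lxml" is faster but must be installed separately
file = bs.BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute-style access returns the first matching tag
print(file.table["class"])
print(file.td.text)
```

Attribute access such as file.table is shorthand for file.find('table'). The class attribute comes back as a list because HTML allows several classes on one element.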


Step 3: Locating and Extracting Table Data

In this step, we locate the table whose class is numpy-table and collect all of its rows (tr tags).

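# find the first table with class "numpy-table"
find_table = file.find('table', class_='numpy-table')
if find_table is None:
    raise ValueError("no table with class 'numpy-table' found")
# collect every row of that table
rows = find_table.find_all('tr')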
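The difference between find() and find_all() matters when a page holds several tables. The sketch below uses a hypothetical two-table page and the built-in "html.parser"; it also shows select(), which reaches the same rows with a CSS selector.

```python
import bs4 as bs

# Hypothetical page with two tables; only the second has the "numpy-table" class
html = """
<table class="other"><tr><td>skip me</td></tr></table>
<table class="numpy-table">
  <tr><th>Method</th><th>Description</th></tr>
  <tr><td>append()</td><td>Adds an element at the end</td></tr>
</table>
"""
file = bs.BeautifulSoup(html, "html.parser")

# find_all() returns every match; find() returns only the first match, or None
all_tables = file.find_all("table")
find_table = file.find("table", class_="numpy-table")
if find_table is None:
    raise ValueError("no table with class 'numpy-table' found")
rows = find_table.find_all("tr")

# The same rows selected with a CSS selector
rows_css = file.select("table.numpy-table tr")

print(len(all_tables), len(rows), len(rows_css))
```

The class_ keyword has a trailing underscore because class is a reserved word in Python.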


Step 4: Extracting Text from Table Cells

Now loop over the rows, find all td tags in each row, and print the text of every table data cell. Header cells use th tags, so a header row made only of th cells prints as an empty list.

# print the text of each cell, row by row
for i in rows:
    table_data = i.find_all('td')
    data = [j.text for j in table_data]
    print(data)
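If you also want the header row, you can match th and td cells in one call by passing a list of tag names to find_all(). The sketch below assumes a small hypothetical table and uses get_text(strip=True) to trim whitespace around each cell's text.

```python
import bs4 as bs

# Hypothetical table with a header row and padded cell text
html = """
<table class="numpy-table">
  <tr><th>Method</th><th>Description</th></tr>
  <tr><td> append() </td><td>Adds an element at the end</td></tr>
  <tr><td>clear()</td><td>Removes all elements</td></tr>
</table>
"""
file = bs.BeautifulSoup(html, "html.parser")
rows = file.find("table", class_="numpy-table").find_all("tr")

table = []
for row in rows:
    # A list of names matches both header (<th>) and data (<td>) cells
    cells = row.find_all(["th", "td"])
    # strip=True removes the whitespace surrounding each cell's text
    table.append([cell.get_text(strip=True) for cell in cells])

for record in table:
    print(record)
```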


Complete Code

Below is the complete implementation of the above steps. In this code, we’re scraping a specific table (numpy-table class) from a w3wiki page about Python lists. After locating the table rows, we iterate through each row to extract and print the cell data.

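# import required modules
import bs4 as bs
import requests

# assign URL (placeholder: the article's original URL is not preserved here)
URL = "<URL of the page containing the table>"

# download the page and parse its HTML
url_link = requests.get(URL, timeout=10)
url_link.raise_for_status()  # stop early on HTTP errors such as 404
file = bs.BeautifulSoup(url_link.text, "lxml")  # the "lxml" parser requires the lxml package

# find the first table with class "numpy-table"
find_table = file.find('table', class_='numpy-table')
if find_table is None:
    raise ValueError("no table with class 'numpy-table' found")
rows = find_table.find_all('tr')

# print the text of each cell, row by row
for i in rows:
    table_data = i.find_all('td')
    data = [j.text for j in table_data]
    print(data)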
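Printing lists is fine for a quick look, but the rows are often more useful as records. The sketch below wraps the same steps in a hypothetical helper, table_to_dicts(), that keys each data row by the table's header cells, and then writes the records as CSV with the standard csv module. It assumes the first row holds th header cells, and it parses an inline sample with "html.parser" rather than fetching a live page.

```python
import csv
import io

import bs4 as bs


def table_to_dicts(html, class_name):
    """Return the rows of the first <table class=class_name> as dicts keyed by its header cells."""
    soup = bs.BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=class_name)
    if table is None:
        return []
    rows = table.find_all("tr")
    if not rows:
        return []
    # Assumes the first row contains the <th> header cells
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        values = [td.get_text(strip=True) for td in row.find_all("td")]
        records.append(dict(zip(headers, values)))
    return records


# Hypothetical sample standing in for the downloaded page
SAMPLE_HTML = """
<table class="numpy-table">
  <tr><th>Method</th><th>Description</th></tr>
  <tr><td>append()</td><td>Adds an element at the end</td></tr>
  <tr><td>clear()</td><td>Removes all elements</td></tr>
</table>
"""

records = table_to_dicts(SAMPLE_HTML, "numpy-table")

# Write the records as CSV into an in-memory buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
writer.writeheader()
writer.writerows(records)
print(buffer.getvalue())
```

Swapping io.StringIO for a file opened with open(path, "w", newline="") writes the same CSV to disk.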



Parsing tables and XML with BeautifulSoup

Scraping is an essential skill to learn: it lets us pull data out of a website or a file so that a program can reuse it elsewhere. In this article, we will learn how to extract tables and XML data from a file with Beautiful Soup. Here, we will scrape data using the Beautiful Soup Python module.
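Because the article mentions extracting data from a file, here is a minimal sketch of that case: BeautifulSoup accepts an open file object as well as a string. The file name lists.html and its contents are hypothetical, created in a temporary directory so the example is self-contained, and the built-in "html.parser" is used.

```python
import tempfile
from pathlib import Path

import bs4 as bs

# Create a small hypothetical HTML file so the example is self-contained
path = Path(tempfile.mkdtemp()) / "lists.html"
path.write_text(
    '<table class="numpy-table"><tr><td>append()</td><td>Adds an element at the end</td></tr></table>',
    encoding="utf-8",
)

# BeautifulSoup reads directly from an open file object
with open(path, encoding="utf-8") as fh:
    file = bs.BeautifulSoup(fh, "html.parser")

cells = [td.get_text(strip=True) for td in file.find("table", class_="numpy-table").find_all("td")]
print(cells)
```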

Prerequisites:

Modules Required

  • bs4: Beautiful Soup is a Python library for pulling data out of HTML and XML files.
  • lxml: It is a Python library that allows us to handle XML and HTML files.
  • requests: It allows you to send HTTP/1.1 requests extremely easily.
pip install beautifulsoup4
pip install lxml
pip install requests
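Since lxml is a separate install, code that asks for the "lxml" parser raises bs4.FeatureNotFound when it is missing. A minimal sketch of a fallback, using a hypothetical helper name make_soup(), tries lxml first and otherwise uses Python's built-in "html.parser":

```python
import bs4 as bs


def make_soup(markup):
    """Parse with lxml when it is installed, otherwise fall back to the built-in html.parser."""
    try:
        return bs.BeautifulSoup(markup, "lxml")
    except bs.FeatureNotFound:
        return bs.BeautifulSoup(markup, "html.parser")


soup = make_soup("<table><tr><td>42</td></tr></table>")
print(soup.td.text)
```

The two parsers can build slightly different trees from malformed HTML, so results may differ on messy pages.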
