Extract Tables With BeautifulSoup in Python

Below are the steps to extract tables with Beautiful Soup in Python:

Step 1: Import the Library and Define Target URL

First, import the required modules and assign the URL of the target page.

```python
# import required modules
import bs4 as bs
import requests

# assign the URL of the page that contains the table
URL = 'https://www.w3wiki.org/python-list/'
```

Step 2: Create Object for Parsing

In this step, we download the page with requests and create a BeautifulSoup object from its HTML. This object is what we search to locate and extract the tables.

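```python
# download the page and parse its HTML with the lxml parser
url_link = requests.get(URL, timeout=10)
url_link.raise_for_status()  # stop early on HTTP errors such as 404
file = bs.BeautifulSoup(url_link.text, "lxml")
```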
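To experiment with parsing without making a network request, a BeautifulSoup object can also be built from an HTML string. The sketch below uses a small hypothetical table (SAMPLE_HTML is made up for illustration, not taken from the w3wiki page) and Python's built-in "html.parser" instead of "lxml", so it runs even if lxml is not installed.

```python
import bs4 as bs

# Hypothetical sample page, used instead of downloading a real URL
SAMPLE_HTML = """
<html><body>
  <table class="numpy-table">
    <tr><th>Method</th><th>Description</th></tr>
    <tr><td>append()</td><td>Adds an element at the end</td></tr>
    <tr><td>pop()</td><td>Removes and returns an element</td></tr>
  </table>
</body></html>
"""

# "html.parser" ships with Python; "lxml" is faster but must be installed separately
soup = bs.BeautifulSoup(SAMPLE_HTML, "html.parser")

# The soup object can now be searched just like the parsed web page
print(soup.table["class"])       # class is multi-valued, so this is a list
print(len(soup.find_all("tr")))  # number of rows in the sample table
```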

Step 3: Locating and Extracting Table Data

In this step, we locate the first table with the class numpy-table and collect all of its rows (tr tags).

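```python
# find the first table with class "numpy-table" (find() returns None if absent)
find_table = file.find('table', class_='numpy-table')
if find_table is None:
    raise ValueError("no <table class='numpy-table'> found on the page")
rows = find_table.find_all('tr')  # every row (<tr>) of that table
```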
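Note that find() returns only the first matching tag (or None when nothing matches), while find_all() returns a list of every match. The sketch below, built on a hypothetical three-table HTML string, shows the difference and also shows that class_ still matches a table that carries several CSS classes.

```python
import bs4 as bs

# Hypothetical HTML: two "numpy-table" tables (one with an extra class) and one other table
html = """
<table class="numpy-table"><tr><td>a</td></tr></table>
<table class="numpy-table striped"><tr><td>b</td></tr><tr><td>c</td></tr></table>
<table class="other"><tr><td>d</td></tr></table>
"""
soup = bs.BeautifulSoup(html, "html.parser")

first = soup.find("table", class_="numpy-table")      # first match only
every = soup.find_all("table", class_="numpy-table")  # all matches, as a list
missing = soup.find("table", class_="no-such-class")  # None when nothing matches

print(len(first.find_all("tr")))  # rows in the first matching table
print(len(every))                 # class_ matches any one of a tag's classes
print(missing)
```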

Step 4: Extracting Text from Table Cells

Next, loop over the rows, collect the td (table data) cells in each one, and print their text as a list.

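```python
# print the text of the <td> cells in each row
# (header cells written as <th> are not matched, so header rows print [])
for row in rows:
    table_data = row.find_all('td')
    data = [cell.get_text(strip=True) for cell in table_data]
    print(data)
```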
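The loop above collects only td cells, so a header row made of th cells prints as an empty list. A minimal sketch, assuming the first row holds the column headers in th tags (the HTML below is hypothetical sample data), pairs each data row with those headers to build one dictionary per row:

```python
import bs4 as bs

# Hypothetical table whose first row contains <th> header cells
html = """
<table class="numpy-table">
  <tr><th>Method</th><th>Description</th></tr>
  <tr><td>append()</td><td>Adds an element at the end</td></tr>
  <tr><td>clear()</td><td>Removes all elements</td></tr>
</table>
"""
soup = bs.BeautifulSoup(html, "html.parser")
rows = soup.find("table", class_="numpy-table").find_all("tr")

# Assumption: the first row contains the column headers as <th> cells
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]

records = []
for row in rows[1:]:
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if cells:  # skip rows that have no <td> cells
        records.append(dict(zip(headers, cells)))

print(records)
```

Using zip() keeps the sketch short; if a row has fewer cells than there are headers, the missing columns are simply left out of that row's dictionary.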

Complete Code

Below is the complete implementation of the above steps. The code downloads the w3wiki page about Python lists, locates the table with the numpy-table class, and then iterates through its rows to extract and print the cell data.

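```python
# import required modules
import bs4 as bs
import requests

# assign URL
URL = 'https://www.w3wiki.org/python-list/'

# download the page and parse its HTML with the lxml parser
url_link = requests.get(URL, timeout=10)
url_link.raise_for_status()  # stop early on HTTP errors such as 404
file = bs.BeautifulSoup(url_link.text, "lxml")

# find the first table with class "numpy-table" (find() returns None if absent)
find_table = file.find('table', class_='numpy-table')
if find_table is None:
    raise ValueError("no <table class='numpy-table'> found on the page")
rows = find_table.find_all('tr')

# print the text of the <td> cells in each row
for row in rows:
    table_data = row.find_all('td')
    data = [cell.get_text(strip=True) for cell in table_data]
    print(data)
```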
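For reuse, the complete code can be wrapped in a function. The sketch below is hypothetical (the name table_rows and its parameters are not from the original article): it accepts HTML text rather than a URL, so it can be tested offline, and it collects both th and td cells so header rows are kept.

```python
import bs4 as bs


def table_rows(html, css_class, parser="html.parser"):
    """Hypothetical helper: return the cell text of every row in the first
    <table> with the given CSS class; raise ValueError if there is none."""
    soup = bs.BeautifulSoup(html, parser)
    table = soup.find("table", class_=css_class)
    if table is None:
        raise ValueError(f"no <table class='{css_class}'> found")
    # A list of tag names matches either header (<th>) or data (<td>) cells
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]


# Hypothetical sample HTML for an offline run
sample = """
<table class="numpy-table">
  <tr><th>Method</th><th>Description</th></tr>
  <tr><td>copy()</td><td>Returns a shallow copy</td></tr>
</table>
"""
for row in table_rows(sample, "numpy-table"):
    print(row)
```

With the live page, the same helper could be called as table_rows(requests.get(URL, timeout=10).text, "numpy-table", parser="lxml"); whether the page still contains such a table at any given time is not guaranteed.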

Parsing tables and XML with BeautifulSoup

Scraping is an essential skill: it lets us extract data from a website or a file so that the programmer can reuse it elsewhere. In this article, we will learn how to extract tables and XML data from a file using Beautiful Soup. Here, we will scrape data using the Beautiful Soup Python module.

Prerequisites:

Modules Required

bs4: Beautiful Soup is a Python library for pulling data out of HTML and XML files.
lxml: It is a Python library that allows us to handle XML and HTML files.
requests: It allows you to send HTTP/1.1 requests extremely easily.

pip install bs4
pip install lxml
pip install requests
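After installing, a quick way to confirm the modules are available is to look up their installed versions. This is a minimal sketch, assuming Python 3.8+ (for importlib.metadata); the helper name installed_versions is hypothetical. It checks the distribution name "beautifulsoup4" because the bs4 package on PyPI is a placeholder that installs beautifulsoup4.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as known to pip
PACKAGES = ["beautifulsoup4", "lxml", "requests"]


def installed_versions(names):
    """Hypothetical helper: map each distribution name to its installed
    version string, or None if it is not installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```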
