How to Use Requests and BeautifulSoup in Python

The requests module fetches the raw HTML data from a website, and Beautiful Soup parses that HTML so we can extract exactly the data we need. Unlike Selenium, there is no browser installation involved, and the approach is lighter because it accesses the web directly rather than driving a browser.

Stepwise implementation:

Step 1: Import the modules.

Python3
import requests
from bs4 import BeautifulSoup
import time


Step 2: Next, fetch the URL data and parse the HTML code.

Python3
url = 'https://finance.yahoo.com/cryptocurrencies/'
response = requests.get(url)
text = response.text
data = BeautifulSoup(text, 'html.parser')
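
When the page layout is unfamiliar, it helps to experiment on a small inline document before hitting the live site. The `sample_html` string below is a made-up stand-in for a real response body, not the actual Yahoo Finance markup, but it shows what `BeautifulSoup` gives you back:

```python
from bs4 import BeautifulSoup

# hypothetical stand-in for the HTML a real response would contain
sample_html = """
<table>
  <tr><th>Symbol</th><th>Name</th></tr>
  <tr><td>BTC-USD</td><td>Bitcoin</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
rows = soup.find_all('tr')

print(len(rows))                # 2 (one heading row, one data row)
print(rows[0].find('th').text)  # Symbol
```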


Step 3: First, we shall get all the headings from the table.

Python3
# the headings are in the first row of the table
headings = data.find_all('tr')[0]
headings_list = []  # list to store all headings

for heading in headings.find_all('th'):
    headings_list.append(heading.text)

# we only require the first ten columns
headings_list = headings_list[:10]

print('Headings are: ')
for column in headings_list:
    print(column)
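
The heading loop above can also be written as a single list comprehension. The inline table here is a hypothetical stand-in for the scraped page, used only to make the sketch self-contained:

```python
from bs4 import BeautifulSoup

# hypothetical stand-in for the scraped page
sample_html = """
<table>
  <tr><th>Symbol</th><th>Name</th><th>Price</th></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# pull every heading cell's text from the first row in one pass
headings_list = [th.text for th in soup.find_all('tr')[0].find_all('th')][:10]
print(headings_list)  # ['Symbol', 'Name', 'Price']
```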


Output:

Step 4: In the same way, the values in each row can be obtained.

Python3
# we only need the first five coins
rows = data.find_all('tr')
for i in range(1, 6):
    cells = rows[i].find_all('td')

    for cell in cells:
        print(cell.text, end=' ')
    print('')
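
Pairing each cell with its heading via `zip` avoids index arithmetic when turning rows into dictionaries. Again, the inline HTML is a made-up stand-in for the real page:

```python
from bs4 import BeautifulSoup

# hypothetical stand-in for the scraped page
sample_html = """
<table>
  <tr><th>Symbol</th><th>Price</th></tr>
  <tr><td>BTC-USD</td><td>60000</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
rows = soup.find_all('tr')
headings = [th.text for th in rows[0].find_all('th')]

# zip pairs each heading with the matching cell in order
for row in rows[1:]:
    values = [td.text for td in row.find_all('td')]
    print(dict(zip(headings, values)))  # {'Symbol': 'BTC-USD', 'Price': '60000'}
```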


Output:

Below is the full implementation:

Python3
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import time

while True:
    # record the time at which this scrape runs
    now = datetime.now()
    current_time = now.strftime("%H:%M:%S")
    print(f'At time : {current_time} IST')

    response = requests.get('https://finance.yahoo.com/cryptocurrencies/')
    html_data = BeautifulSoup(response.text, 'html.parser')
    rows = html_data.find_all('tr')

    # the headings are in the first row; keep only the first ten columns
    headings_list = [heading.text for heading in rows[0].find_all('th')][:10]

    data = []

    # the next five rows hold the first five coins
    for row in rows[1:6]:
        column_value = row.find_all('td')
        coin = {}

        for i in range(10):
            coin[headings_list[i]] = column_value[i].text
        data.append(coin)

    for coin in data:
        print(coin)
        print('')

    # scrape again after ten minutes
    time.sleep(600)
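
Once each coin is a dictionary, the rows can be written out with the standard library's `csv.DictWriter` instead of only printing them. This sketch uses made-up sample rows and an in-memory buffer; swap in `open('coins.csv', 'w', newline='')` (a hypothetical filename) to persist to disk:

```python
import csv
import io

# made-up sample rows shaped like the dictionaries the scraper builds
data = [
    {'Symbol': 'BTC-USD', 'Price': '60000'},
    {'Symbol': 'ETH-USD', 'Price': '3000'},
]

# write to an in-memory buffer so the sketch needs no file system
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['Symbol', 'Price'])
writer.writeheader()
writer.writerows(data)

print(buffer.getvalue())
# Symbol,Price
# BTC-USD,60000
# ETH-USD,3000
```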


Output:

How to Build a Web Scraping Bot in Python

In this article, we are going to see how to build a web scraping bot in Python.

Web scraping is the process of extracting data from websites. A bot is a piece of code that automates a task. A web scraping bot, therefore, is a program that automatically scrapes a website for data based on our requirements.
