How to Use Requests and BeautifulSoup in Python
The requests module fetches the raw HTML of a web page, and BeautifulSoup parses that markup so we can extract exactly the data we need. Unlike Selenium, no browser installation is involved, and this approach is lighter because it talks to the web server directly instead of driving a browser.
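The division of labor can be seen in a minimal sketch. Note that requests needs a live URL, so the fetch is shown only as a comment with a placeholder address; BeautifulSoup, by contrast, parses any HTML string it is handed, so the markup below is a made-up snippet for illustration:

```python
from bs4 import BeautifulSoup

# requests would fetch raw HTML over HTTP, no browser needed:
# import requests
# html = requests.get('https://example.com').text  # placeholder URL

# BeautifulSoup then parses whatever HTML it is given:
html = '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')
print(soup.title.text)  # prints 'Demo'
```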
Stepwise implementation:
Step 1: Import the required modules.
```python
import requests
from bs4 import BeautifulSoup
import time
```
Step 2: Get the page at the URL and parse its HTML.
```python
url = 'https://finance.yahoo.com/cryptocurrencies/'
response = requests.get(url)
text = response.text
data = BeautifulSoup(text, 'html.parser')
```
Step 3: First, we shall get all the headings from the table.
```python
# the headings are in the first row of the table
headings = data.find_all('tr')[0]

headings_list = []  # list to store all headings

for x in headings:
    headings_list.append(x.text)

# we require only the first ten columns
headings_list = headings_list[:10]

print('Headings are: ')
for column in headings_list:
    print(column)
```
Output:
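If the live page's layout is unclear, the same heading extraction can be tried first on a small stand-in table. The HTML below is an invented miniature of the cryptocurrency table, assumed purely for illustration:

```python
from bs4 import BeautifulSoup

# made-up two-row stand-in for the real cryptocurrency table
sample = """
<table>
  <tr><th>Symbol</th><th>Name</th><th>Price</th></tr>
  <tr><td>BTC-USD</td><td>Bitcoin</td><td>60000</td></tr>
</table>
"""
soup = BeautifulSoup(sample, 'html.parser')

# the first <tr> holds the headings, exactly as in the article
headings = [cell.text for cell in soup.find_all('tr')[0].find_all('th')]
print(headings)  # → ['Symbol', 'Name', 'Price']
```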
Step 4: In the same way, the values in each row can be obtained.
```python
# we need only the first five coins
for x in range(1, 6):
    table = data.find_all('tr')[x]
    cells = table.find_all('td')
    for cell in cells:
        print(cell.text, end=' ')
    print('')
```
Output:
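The row-by-row logic can again be exercised offline on a stand-in table; the HTML here is invented for illustration, not taken from the real page:

```python
from bs4 import BeautifulSoup

# invented two-coin table, mirroring the structure the loop expects
sample = """
<table>
  <tr><th>Symbol</th><th>Price</th></tr>
  <tr><td>BTC-USD</td><td>60000</td></tr>
  <tr><td>ETH-USD</td><td>3000</td></tr>
</table>
"""
soup = BeautifulSoup(sample, 'html.parser')

rows = []
# skip row 0 (the headings) and read each data row's <td> cells
for tr in soup.find_all('tr')[1:]:
    rows.append([td.text for td in tr.find_all('td')])

print(rows)  # → [['BTC-USD', '60000'], ['ETH-USD', '3000']]
```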
Below is the full implementation:
```python
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import time

while True:
    now = datetime.now()
    # record the time at the moment of scraping
    current_time = now.strftime("%H:%M:%S")
    print(f'At time : {current_time} IST')

    response = requests.get('https://finance.yahoo.com/cryptocurrencies/')
    text = response.text
    html_data = BeautifulSoup(text, 'html.parser')

    # headings are in the first row of the table
    headings = html_data.find_all('tr')[0]
    headings_list = []
    for x in headings:
        headings_list.append(x.text)
    headings_list = headings_list[:10]

    data = []
    # we need only the first five coins
    for x in range(1, 6):
        row = html_data.find_all('tr')[x]
        column_value = row.find_all('td')
        row_dict = {}
        for i in range(10):
            row_dict[headings_list[i]] = column_value[i].text
        data.append(row_dict)

    for coin in data:
        print(coin)
        print('')

    # wait ten minutes before scraping again
    time.sleep(600)
```
Output:
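The core pattern in the full script, pairing each table heading with the matching cell of a row to build a dictionary per coin, can be sketched offline with `dict(zip(...))`. The table below is a made-up miniature; the real script targets Yahoo Finance instead:

```python
from bs4 import BeautifulSoup

# invented one-coin table standing in for the live page
sample = """
<table>
  <tr><th>Symbol</th><th>Price</th></tr>
  <tr><td>BTC-USD</td><td>60000</td></tr>
</table>
"""
soup = BeautifulSoup(sample, 'html.parser')

headings = [th.text for th in soup.find_all('tr')[0].find_all('th')]

coins = []
for tr in soup.find_all('tr')[1:]:
    values = [td.text for td in tr.find_all('td')]
    # pair each heading with its cell, as the full script's inner loop does
    coins.append(dict(zip(headings, values)))

print(coins)  # → [{'Symbol': 'BTC-USD', 'Price': '60000'}]
```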
How to Build a Web Scraping Bot in Python
In this article, we are going to see how to build a web scraping bot in Python.
Web scraping is the process of extracting data from websites, and a bot is a piece of code that automates a task. A web scraping bot, therefore, is a program that automatically scrapes a website for data based on our requirements.