Scraping Reddit Subreddits
There are several ways to extract data from a subreddit. Posts in a subreddit can be sorted as hot, new, top, controversial, and so on, and you can use whichever sort order suits your needs.
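In PRAW, each of these sort orders is available as a method of the same name on a subreddit object, so the choice can be made at runtime. The following is a minimal sketch of that idea. `fetch_posts` and `SORT_METHODS` are hypothetical names introduced for this example and are not part of PRAW.

```python
# Sort orders mentioned above; each is a listing method on a PRAW subreddit object.
SORT_METHODS = ("hot", "new", "top", "controversial")


def fetch_posts(subreddit, sort="hot", limit=5):
    """Return posts from `subreddit` using the requested sort order.

    Hypothetical helper for illustration; not part of PRAW.
    """
    if sort not in SORT_METHODS:
        raise ValueError(f"unknown sort {sort!r}; expected one of {SORT_METHODS}")
    # Look up the listing method by name, e.g. subreddit.hot or subreddit.new.
    listing = getattr(subreddit, sort)
    return listing(limit=limit)
```

Given any PRAW subreddit object `subreddit`, a call such as `fetch_posts(subreddit, sort="new", limit=3)` would iterate over its three newest posts.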
Let’s extract some information from the redditdev subreddit.
Python3
import praw
import pandas as pd

reddit_read_only = praw.Reddit(client_id="",      # your client id
                               client_secret="",  # your client secret
                               user_agent="")     # your user agent

subreddit = reddit_read_only.subreddit("redditdev")

# Display the name of the Subreddit
print("Display Name:", subreddit.display_name)

# Display the title of the Subreddit
print("Title:", subreddit.title)

# Display the description of the Subreddit
print("Description:", subreddit.description)
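Writing the client ID and secret directly into a script makes them easy to leak. One option is to read them from environment variables instead. The sketch below assumes hypothetical variable names (`REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USER_AGENT`) and a hypothetical helper `load_credentials`; neither is defined by PRAW.

```python
import os

# Hypothetical environment variable names, chosen for this example only.
ENV_KEYS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "user_agent": "REDDIT_USER_AGENT",
}


def load_credentials(env=None):
    """Build the keyword arguments for praw.Reddit from environment variables."""
    env = os.environ if env is None else env
    # Collect every variable that is unset or empty so the error names them all.
    missing = [name for name in ENV_KEYS.values() if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {arg: env[name] for arg, name in ENV_KEYS.items()}
```

With those variables set, the read-only instance could then be created as `reddit_read_only = praw.Reddit(**load_credentials())`.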
Now let’s extract 5 hot posts from the Python subreddit:
Python3
subreddit = reddit_read_only.subreddit("Python")

for post in subreddit.hot(limit=5):
    print(post.title)
    print()
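A hot listing can begin with posts that moderators have pinned (stickied), so the first five results are not always the five hottest regular posts. Here is a minimal sketch that skips them, assuming each post exposes PRAW's `stickied` attribute. `first_unpinned` is a hypothetical helper name.

```python
from itertools import islice


def first_unpinned(posts, n=5):
    """Return the first `n` posts that are not stickied by moderators."""
    # Lazily filter out stickied posts, then stop after `n` matches.
    unpinned = (post for post in posts if not post.stickied)
    return list(islice(unpinned, n))
```

For example, `first_unpinned(subreddit.hot(limit=10))` requests a few extra posts so that five remain after any pinned ones are dropped.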
We will now save the top posts of the Python subreddit in a pandas DataFrame:
Python3
# Scraping the top posts of the past month
posts = subreddit.top(time_filter="month")

posts_dict = {"Title": [], "Post Text": [],
              "ID": [], "Score": [],
              "Total Comments": [], "Post URL": []}

for post in posts:
    # Title of each post
    posts_dict["Title"].append(post.title)

    # Text inside a post
    posts_dict["Post Text"].append(post.selftext)

    # Unique ID of each post
    posts_dict["ID"].append(post.id)

    # The score of a post
    posts_dict["Score"].append(post.score)

    # Total number of comments inside the post
    posts_dict["Total Comments"].append(post.num_comments)

    # URL of each post
    posts_dict["Post URL"].append(post.url)

# Saving the data in a pandas DataFrame
top_posts = pd.DataFrame(posts_dict)

# In a notebook, this displays the DataFrame
top_posts
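An alternative to maintaining six parallel lists is to build one dictionary per post and pass the list of rows to pandas. This sketch assumes the same post attributes and column names used above. `post_to_row` and `posts_to_rows` are hypothetical helpers.

```python
def post_to_row(post):
    """Map one post to a flat dictionary keyed by column name."""
    return {
        "Title": post.title,
        "Post Text": post.selftext,
        "ID": post.id,
        "Score": post.score,
        "Total Comments": post.num_comments,
        "Post URL": post.url,
    }


def posts_to_rows(posts):
    """Convert an iterable of posts into a list of row dictionaries."""
    return [post_to_row(post) for post in posts]
```

The result can be handed straight to pandas, for instance `top_posts = pd.DataFrame(posts_to_rows(subreddit.top(time_filter="month")))`. Keeping each row together makes it harder for the columns to drift out of step if one `append` call is forgotten.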
Exporting Data to a CSV File:
Python3
import pandas as pd

top_posts.to_csv("Top Posts.csv", index=True)
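With `index=True`, pandas writes the DataFrame's row index as an extra, unnamed first column in the CSV file. If that column is not wanted, `index=False` leaves it out. The round-trip sketch below uses made-up sample data in place of `top_posts` and writes to an in-memory buffer rather than a file.

```python
import io

import pandas as pd

# Made-up stand-in for the `top_posts` DataFrame built earlier.
sample = pd.DataFrame({"Title": ["First post", "Second post"], "Score": [10, 7]})

# Write to an in-memory buffer so the example creates no files.
buffer = io.StringIO()
sample.to_csv(buffer, index=False)  # index=False omits the row-index column

# Read the CSV text back to check that the columns survived the round trip.
buffer.seek(0)
restored = pd.read_csv(buffer)
```

Replacing the buffer with a file name such as `"Top Posts.csv"` would write the same content to disk.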
Scraping Reddit using Python
In this article, we will see how to scrape Reddit with Python using PRAW, the Python Reddit API Wrapper. PRAW is a module that gives Python scripts access to the Reddit API.