Converting text into Vectors
TF-IDF (term frequency-inverse document frequency) measures how relevant a word is to a text within a collection of texts (the corpus). A word's weight increases in proportion to the number of times it appears in the text, but is offset by how frequently the word appears across the corpus (dataset). We will implement this with the code below.
Python3
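from sklearn.feature_extraction.text import TfidfVectorizer
# Limit the vocabulary to the 2500 most frequent terms, then build a dense TF-IDF matrix
cv = TfidfVectorizer(max_features=2500)
X = cv.fit_transform(data['review']).toarray()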
To print the generated X:
Python3
X
Output:
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
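To make the weighting idea concrete, here is a minimal sketch that computes a simplified, textbook form of TF-IDF by hand (term frequency multiplied by log(N / document frequency)). The toy `reviews` list and the `tf_idf` helper are hypothetical and only for illustration; scikit-learn's TfidfVectorizer additionally applies smoothing and normalization by default, so its numbers will differ from these. In the real matrix each review contains only a few of the 2500 vocabulary words, which is likely why the visible corners of X above are all zeros.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the article's real data lives in data['review'].
reviews = [
    "good product good price",
    "bad product",
    "good quality",
]

def tf_idf(corpus):
    """Return one {word: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each word occurs in.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # Term frequency (share of the document) times inverse document frequency.
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights

weights = tf_idf(reviews)
for doc, w in zip(reviews, weights):
    print(doc, "->", {word: round(v, 3) for word, v in w.items()})
```

In the first toy review, "price" appears once and "good" appears twice, yet "price" receives the higher weight because it occurs in only one document, while "good" occurs in two.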
Flipkart Reviews Sentiment Analysis using Python
This article analyzes the reviews and ratings that users give on Flipkart to share their experience and to describe the quality of the product and brand. Analyzing that data can tell users a lot about the products and can also suggest ways to improve product quality.
Today we will use Machine Learning to analyze that data, making it easier to understand and ready for prediction.
Our task is to predict whether the review given is positive or negative.
Before starting the code, download the dataset by clicking this link.