Class For Loading The Dataset Using PyTorch

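We will define a custom Python class that subclasses torch.utils.data.Dataset to load the dataset. In this class, the load_words method reads the CSV file and returns all of its text as a single list of words. The get_unique_words method counts how often each word occurs and returns the unique words sorted from most to least frequent; these are then used to build the index_to_word and word_to_index lookup tables. The __len__ method returns the number of training samples, which is the number of words minus the sequence length (args). The __getitem__ method returns a pair of tensors for a given index: a window of args consecutive word indexes as the input, and the same window shifted one word to the right as the target.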
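The vocabulary step described above can be tried out in plain Python on a toy word list, without pandas or the CSV file. This is a minimal sketch for illustration only: build_vocabulary is a hypothetical helper and the sample sentence is made up, but the counting, sorting and dictionary comprehensions mirror get_unique_words and the two lookup tables in the class.

```python
from collections import Counter


def build_vocabulary(words):
    """Hypothetical helper: return (unique_words, word_to_index, index_to_word)
    with the most frequent word first, like TextDataset.get_unique_words."""
    word_counts = Counter(words)
    # Sort unique words by descending frequency; sorted() is stable even with
    # reverse=True, so ties keep the order in which words were first seen.
    unique_words = sorted(word_counts, key=word_counts.get, reverse=True)
    word_to_index = {word: index for index, word in enumerate(unique_words)}
    index_to_word = {index: word for index, word in enumerate(unique_words)}
    return unique_words, word_to_index, index_to_word


# Toy corpus split on single spaces, as load_words does
words = "the cat sat on the mat the end".split(" ")
unique_words, word_to_index, index_to_word = build_vocabulary(words)

# Encode the corpus as word indexes, as done for self.word_indexes
word_indexes = [word_to_index[w] for w in words]

print(unique_words)   # ['the', 'cat', 'sat', 'on', 'mat', 'end']
print(word_indexes)   # [0, 1, 2, 3, 0, 4, 0, 5]
```

Sorting by frequency means the most common word always gets index 0, so frequent words end up with small indexes in the embedding table.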

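import torch
import pandas as pd
from collections import Counter


class TextDataset(torch.utils.data.Dataset):
    def __init__(self, args):
        # args is the sequence length: the number of words in each training sample
        self.args = args
        self.words = self.load_words()
        self.unique_words = self.get_unique_words()

        # Lookup tables between words and their integer indexes
        self.index_to_word = {index: word for index, word in enumerate(self.unique_words)}
        self.word_to_index = {word: index for index, word in enumerate(self.unique_words)}

        # The whole corpus encoded as a list of word indexes
        self.word_indexes = [self.word_to_index[w] for w in self.words]

    def load_words(self):
        # Read the cleaned sentences and join them into one long list of words
        train_df = pd.read_csv('/content/output.csv')
        text = train_df['Text'].str.cat(sep=' ')
        return text.split(' ')

    def get_unique_words(self):
        # Unique words sorted from most to least frequent
        word_counts = Counter(self.words)
        return sorted(word_counts, key=word_counts.get, reverse=True)

    def __len__(self):
        # Number of windows of length args that still have a following word
        return len(self.word_indexes) - self.args

    def __getitem__(self, index):
        # Input: args consecutive word indexes; target: the same window shifted by one word
        return (
            torch.tensor(self.word_indexes[index:index + self.args]),
            torch.tensor(self.word_indexes[index + 1:index + self.args + 1]),
        )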
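To see what __len__ and __getitem__ produce, the same slicing can be reproduced on a short list of word indexes without torch or the CSV file. This is a sketch for illustration: make_windows is a hypothetical helper, the index list is made-up toy data, and sequence_length plays the role of args in the class.

```python
def make_windows(word_indexes, sequence_length):
    """Hypothetical helper: build the (input, target) pairs that
    TextDataset.__len__ and __getitem__ would return, as plain lists."""
    num_samples = len(word_indexes) - sequence_length  # same formula as __len__
    pairs = []
    for index in range(num_samples):
        # Same slices as __getitem__: target is the input shifted by one word
        inputs = word_indexes[index:index + sequence_length]
        targets = word_indexes[index + 1:index + sequence_length + 1]
        pairs.append((inputs, targets))
    return pairs


word_indexes = [0, 1, 2, 3, 0, 4, 0, 5]
pairs = make_windows(word_indexes, 3)
for inputs, targets in pairs:
    print(inputs, "->", targets)
# [0, 1, 2] -> [1, 2, 3]
# [1, 2, 3] -> [2, 3, 0]
# [2, 3, 0] -> [3, 0, 4]
# [3, 0, 4] -> [0, 4, 0]
# [0, 4, 0] -> [4, 0, 5]
```

Each target holds the next word for every position of its input, so the model gets a next-word prediction target at every step of the sequence. Subtracting the sequence length in __len__ keeps the last target slice inside the list. With the real class the pairs are tensors, and wrapping the dataset in torch.utils.data.DataLoader would stack them into batches of shape (batch_size, args).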

Sentence Autocomplete Using PyTorch

Natural Language Processing (NLP) is one of the most active application areas of deep learning, and many NLP applications are part of everyday life. In this article, we are going to see how we can use NLP to autocomplete half-written sentences using deep learning methods. We will also see how we can generate clean data for training our NLP model. We will cover the following steps in this article:

Cleaning the text data for training the NLP model
Loading the dataset using PyTorch
Creating the LSTM model
Training an NLP model
Making inferences from the trained model

We have seen applications like Google Keyboard, which suggests what to type next based on the words we have already written in a chat draft. To recommend the next word, applications like Google's are trained on billions of written sentences. For our model, we will use Wikipedia sentences, which are freely available to download from the internet, as the training data.
