How to use the DataFrame.sample() method in Python
This method extends the previous one. The only change is that we remove its drawback by using the sample() method, which selects rows from the dataset at random.
Python3
train_set = df.sample(frac=0.8, random_state=42)

# Drop every row whose index appears in train_set; what remains is the test set
test_set = df.drop(train_set.index)

train_set.shape, test_set.shape
Output:
((120, 4), (30, 4))
Here, we used the DataFrame's sample() method to draw a random sample of rows from the original data. We passed two arguments. frac is the fraction of rows to return; since the train set needs 80% of the data, we passed frac=0.8. random_state=42 acts as a seed, so the same rows are selected every time the code is run. For the test set, we dropped from the original DataFrame every row whose index appears in the train set, which leaves the remaining 20% of the data. Note that this relies on the DataFrame's index labels being unique, because drop() removes every row that carries a dropped label.
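The two properties this split depends on, no overlap between the sets and the same result for the same seed, can be checked directly. The following is a minimal sketch, not part of the original walkthrough: it assumes a stand-in DataFrame of random numbers with the same shape as the iris features (150 rows, 4 columns), and the column names a to d are placeholders.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): 150 rows x 4 numeric columns, the same shape as the iris features
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((150, 4)), columns=["a", "b", "c", "d"])

train_set = df.sample(frac=0.8, random_state=42)
test_set = df.drop(train_set.index)

# The two sets share no index labels and together account for every row
no_overlap = train_set.index.intersection(test_set.index).empty
full_cover = len(train_set) + len(test_set) == len(df)

# Sampling again with the same random_state selects the same rows in the same order
same_again = df.sample(frac=0.8, random_state=42).index.equals(train_set.index)

print(train_set.shape, test_set.shape)
print(no_overlap, full_cover, same_again)
```

With the default unique index, all three checks should come out True. If random_state is omitted, the third check would generally fail, because each call would draw a different sample.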
Pandas – Create Test and Train Samples from DataFrame
We use large datasets to build machine learning or deep learning models. When building such a model, we need to split the dataset into train and test sets, so that we can train the model on the train set and then evaluate its performance on the test set. These datasets are usually loaded into the Python environment as a DataFrame. In this article, we will learn different ways to create train and test samples from a Pandas DataFrame in Python. For demonstration purposes, we will use a toy dataset (the iris dataset) from the sklearn.datasets module and load it into a DataFrame. First, we will import all the necessary libraries.
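As a rough sketch of that setup step, the imports and the loading of the iris data might look like the following. The variable name df matches the code used in the sample() section. Keeping only the four feature columns is an assumption, chosen because it agrees with the (120, 4) and (30, 4) shapes shown in that section's output.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the iris data and wrap the feature matrix in a DataFrame
# (assumption: only the four feature columns are kept, without the target)
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

print(df.shape)
```

This should give a DataFrame of 150 rows and 4 columns, so an 80/20 split yields 120 training rows and 30 test rows.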