How to use the DataFrame.sample() method in Python

This method extends the previous one. The only change is that it removes that method's drawback: instead of taking rows in their existing order, it uses the sample() method to select rows from the dataset at random.

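Python3

# df: the iris feature DataFrame (150 rows x 4 columns) from the
# "Importing Libraries and Dataset" step
# Randomly select 80% of the rows; random_state=42 makes the selection reproducible
train_set = df.sample(frac=0.8, random_state=42)

# Drop every row of df whose index label appears in train_set; what remains is the test set
test_set = df.drop(train_set.index)
train_set.shape, test_set.shape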


Output:

((120, 4), (30, 4))

Here, we call the DataFrame's sample() method to draw a random sample of rows from the original data, passing it two arguments. frac is the fraction of rows to return; since the train set should contain 80% of the data, we pass frac=0.8. random_state=42 is a seed value, so repeated calls return the same sample. For the test set, we drop from the original DataFrame every row that appears in the train set, which leaves the remaining 20% of the data (30 of the 150 rows).
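A quick sanity check can confirm the properties described above: the two sets share no rows, together they cover the whole DataFrame, and a fixed random_state reproduces the same sample. The sketch below is illustrative and uses a hypothetical stand-in DataFrame with the same shape as the iris features (150 rows x 4 columns) instead of the real dataset, so it runs with pandas and NumPy alone; the column names a to d are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the iris features: 150 rows x 4 numeric columns
df = pd.DataFrame(np.arange(600).reshape(150, 4), columns=["a", "b", "c", "d"])

train_set = df.sample(frac=0.8, random_state=42)
test_set = df.drop(train_set.index)

# No index label appears in both sets
assert train_set.index.intersection(test_set.index).empty
# Together the two sets account for every row of df
assert len(train_set) + len(test_set) == len(df)
# Sampling again with the same seed returns exactly the same rows, in the same order
assert train_set.index.equals(df.sample(frac=0.8, random_state=42).index)

print(train_set.shape, test_set.shape)  # prints (120, 4) (30, 4)
```

Note that the drop() step relies on df having unique index labels, which holds for the default RangeIndex. If the index contained duplicates, df.drop(train_set.index) would remove every row sharing a sampled label, so the test set could end up smaller than 20%; calling df.reset_index(drop=True) before sampling avoids this.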

Pandas – Create Test and Train Samples from DataFrame

Machine learning and deep learning models are built from large datasets. While building such a model, we need to split the dataset into a train set and a test set, because we want to train the model on the train set and then measure its performance on the test set. These datasets are loaded into the Python environment as DataFrames. In this article, we look at different ways to create train and test samples from a Pandas DataFrame in Python. For demonstration, we use a toy dataset (the iris dataset) from the sklearn.datasets module and load it into a DataFrame. First, we import all the necessary libraries.
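The loading step described above can be sketched as follows. This is a minimal version that assumes scikit-learn's load_iris() loader and keeps only the four feature columns, which matches the 4-column shapes in the output shown earlier; the variable names iris and df are illustrative choices.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled iris dataset (150 samples, 4 numeric features)
iris = load_iris()

# Wrap the feature matrix in a DataFrame, naming the columns after the features
df = pd.DataFrame(iris.data, columns=iris.feature_names)

print(df.shape)  # prints (150, 4)
print(df.head())
```

If the class labels are also needed, they are available as iris.target and could be added as an extra column; they are left out here so that df has the same four columns as in the splitting examples.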


Importing Libraries and Dataset

Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code....

Manually splitting the data frame into train and test set

...

Using the DataFrame.sample() method

...

Using the train_test_split() method present in the Sklearn

The approach we follow here is to take the first 80% of the rows as training data and use the remaining rows as testing data....

Using Numpy.random.rand() method

...