How to use the caTools package in R
The sample.split() function in the caTools package can be used to divide an input dataset into training and testing components. It splits the supplied vector of labels according to the ratio given as its second argument, while preserving the relative proportions of the different labels.
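Syntax: sample.split(vec, SplitRatio = x)
Arguments:
- vec – The vector containing the data labels
- SplitRatio – The splitting ratio to use. A value between 0 and 1 is the fraction of entries marked TRUE; a value greater than 1 is the number of entries marked TRUE. The default is 2/3.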
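As a minimal sketch of what sample.split() returns, the snippet below splits a made-up label vector. The names labels and split_flags, the two class names, and the seed are illustrative assumptions, not part of the original example.

```r
library(caTools)

set.seed(42)  # arbitrary seed so the random split can be reproduced

# hypothetical label vector: 10 observations of each of two classes
labels <- c(rep("yes", 10), rep("no", 10))

# flag about 70% of the entries as TRUE
split_flags <- sample.split(labels, SplitRatio = 0.7)

print(length(split_flags))         # one logical entry per label
print(sum(split_flags))            # number of entries flagged TRUE
print(table(labels, split_flags))  # breakdown of TRUE/FALSE per class
```

The table at the end is a quick way to see whether each class contributes to the TRUE group in roughly the requested proportion.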
This function returns a logical vector with one entry per element of the input vector: TRUE marks the entries assigned to the first (training) part and FALSE marks the rest. The corresponding subset of the input dataset can then be extracted by passing this vector to subset(), which is part of base R rather than caTools. The training and testing datasets can be accessed using the following syntax:
Syntax: subset(data_frame, sample == TRUE) or subset(data_frame, sample == FALSE)
Arguments:
- data_frame – The dataset to create the sample from
- sample – The logical vector returned by sample.split(). Rows are kept wherever the condition on this vector holds.
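The row selection itself needs only base R. The sketch below uses a hand-written logical vector in place of the output of sample.split(); the names df, mask, kept and dropped are assumptions chosen for illustration.

```r
# hypothetical data frame and a hand-written logical mask
df <- data.frame(id = 1:6, value = c(10, 20, 30, 40, 50, 60))
mask <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)

kept    <- subset(df, mask == TRUE)   # rows where mask is TRUE
dropped <- subset(df, mask == FALSE)  # rows where mask is FALSE

# bracket indexing with the same mask selects the same rows
kept_alt    <- df[mask, ]
dropped_alt <- df[!mask, ]

print(kept)
print(dropped)
```

Either form works here; subset() reads a little closer to the syntax shown above, while bracket indexing is the more common choice inside functions.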
The training dataset is obtained with the condition sample == TRUE and the testing dataset with sample == FALSE.
R
# load the caTools package
# (install it once beforehand with install.packages("caTools"))
library("caTools")

# fix the random seed so the split is reproducible (the value is arbitrary)
set.seed(123)

# creating a data frame
data_frame <- data.frame(col1 = c(1:15),
                         col2 = letters[1:15],
                         col3 = c(0, 1, 1, 1, 0, 0, 0,
                                  0, 0, 1, 1, 0, 1, 1, 0))
print("Data Frame")
print(data_frame)

# creating a sample dividing the rows in the ratio 60:40
sample <- sample.split(data_frame$col2, SplitRatio = 0.6)

print("Training Dataset")
# keep the rows where sample is TRUE
training_dataset <- subset(data_frame, sample == TRUE)
print(training_dataset)

print("Testing Dataset")
# keep the rows where sample is FALSE
testing_dataset <- subset(data_frame, sample == FALSE)
print(testing_dataset)
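Output:
The exact rows in each set depend on the random draw, but with 15 rows and SplitRatio = 0.6 the training dataset contains 9 rows and the testing dataset the remaining 6.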
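One way to sanity-check a split like this is to confirm that every row lands in exactly one of the two sets. The sketch below rebuilds a 15-row data frame of the same shape as the example; the seed value and the use of stopifnot() for the checks are assumptions added for illustration.

```r
library(caTools)

set.seed(1)  # arbitrary seed, only for reproducibility

# same shape as the example data: 15 rows with unique labels in col2
data_frame <- data.frame(col1 = 1:15, col2 = letters[1:15])

sample <- sample.split(data_frame$col2, SplitRatio = 0.6)
training_dataset <- subset(data_frame, sample == TRUE)
testing_dataset  <- subset(data_frame, sample == FALSE)

# the two sets together cover all rows ...
stopifnot(nrow(training_dataset) + nrow(testing_dataset) == nrow(data_frame))
# ... and no row appears in both
stopifnot(length(intersect(training_dataset$col1, testing_dataset$col1)) == 0)

print(nrow(training_dataset))
print(nrow(testing_dataset))
```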