Impute the entire dataset
This can be done by replacing every NA with the median of its column, with the column medians computed using the apply() function.
Syntax:
apply(X, MARGIN, FUN, …)
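Parameters:
- X – an array, including a matrix
- MARGIN – a vector giving the subscripts the function is applied over (for a matrix, 1 means rows and 2 means columns)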
- FUN – the function to be applied
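As a minimal sketch of how MARGIN changes the result, the calls below apply median() across the rows (MARGIN = 1) and down the columns (MARGIN = 2) of a small matrix. The matrix m and its values are made up for illustration; na.rm = TRUE is passed through to median() via the … argument.

```r
# hypothetical 2 x 3 matrix with one missing value (filled column by column)
m <- matrix(c(1, 4, 2, NA, 3, 6), nrow = 2)

# MARGIN = 1: one median per row, ignoring NA
row_medians <- apply(m, 1, median, na.rm = TRUE)

# MARGIN = 2: one median per column, ignoring NA
col_medians <- apply(m, 2, median, na.rm = TRUE)

row_medians
col_medians
```

For column-wise imputation, MARGIN = 2 is the relevant case, since each missing value is filled from its own column.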
Example: Impute the entire dataset
R
# create a data frame
data <- data.frame(marks1 = c(NA, 22, NA, 49, 75),
                   marks2 = c(81, 14, NA, 61, 12),
                   marks3 = c(78.5, 19.325, NA, 28, 48.002))

# get the median of each column using apply()
all_column_median <- apply(data, 2, median, na.rm = TRUE)

# replace each NA with the median of its column
for (i in colnames(data))
  data[, i][is.na(data[, i])] <- all_column_median[i]

data
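Output:
Every NA is replaced by the median of its column: 49 in marks1, 37.5 in marks2, and 38.001 in marks3.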
How to Impute Missing Values in R?
In this article, we will discuss how to impute missing values in the R programming language.
In most datasets, some values may be missing, either because they were never entered or because of an error. Replacing these missing values with another value is known as data imputation. There are several ways to impute. Common ones include replacing a missing value with the average, minimum, or maximum value of its column (feature). Different datasets and features call for different imputation methods. For example, in a dataset of a company's sales performance, if the feature loss has missing values, it would be more logical to replace them with the minimum value.
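The replacement strategies mentioned above can be sketched as follows. This is only an illustration: the data frame sales and its columns profit and loss are hypothetical, and the choice of mean for one column and minimum for the other is an assumption made to show both forms.

```r
# hypothetical sales data with one missing value per column
sales <- data.frame(profit = c(120, NA, 90, 150),
                    loss   = c(10, 25, NA, 5))

# replace the missing profit with the column mean
sales$profit[is.na(sales$profit)] <- mean(sales$profit, na.rm = TRUE)

# replace the missing loss with the column minimum
sales$loss[is.na(sales$loss)] <- min(sales$loss, na.rm = TRUE)

sales
```

Swapping mean() or min() for max() or median() gives the other common variants; in each case na.rm = TRUE is needed so that the statistic is computed from the observed values only.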