Imputing the dataset with Median

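R

# Replace the missing values in each column with that column's median
for (variable in colnames(dataset)) {
  column_median <- median(dataset[[variable]], na.rm = TRUE)
  dataset[[variable]][is.na(dataset[[variable]])] <- column_median
}

# Count the missing values that remain after imputation
new_missing_values <- sum(is.na(dataset))
cat("Missing values after imputation:", new_missing_values, "\n")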


Output:

Missing values after imputation: 0
  • We loop over every column of the dataset and replace the missing values in each column with the median of that feature, computed from its non-missing values (na.rm = TRUE).
  • Finally, we print the number of NA values remaining after imputation, which should be 0.
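
The steps above can be tried on a small example. The following is a minimal sketch, not the article's own code: the data frame `toy`, its column names, and the helper `impute_median` are hypothetical and stand in for the real dataset. It also adds an `is.numeric` check, on the assumption that a dataset may contain non-numeric columns for which a median is not meaningful.

```r
# Hypothetical toy data frame (NOT the Boston Housing data)
toy <- data.frame(
  rooms = c(6.5, NA, 5.9, 7.1),
  age   = c(65, 78, NA, 45),
  town  = c("A", "B", NA, "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: median-impute every numeric column of a data frame
impute_median <- function(df) {
  for (variable in colnames(df)) {
    # Skip non-numeric columns, which have no meaningful median
    if (is.numeric(df[[variable]])) {
      column_median <- median(df[[variable]], na.rm = TRUE)
      df[[variable]][is.na(df[[variable]])] <- column_median
    }
  }
  df
}

toy_imputed <- impute_median(toy)

# The NA in rooms becomes 6.5 and the NA in age becomes 65;
# the NA in the character column town is left untouched
print(toy_imputed)
cat("Missing values in numeric columns:",
    sum(is.na(toy_imputed[c("rooms", "age")])), "\n")
```

If every column of the dataset is numeric, the `is.numeric` check changes nothing and the loop behaves like the one shown earlier.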

The missing values have been handled successfully, so we can now proceed with the model.

Multiple linear regression analysis of Boston Housing Dataset using R

In this article, we are going to perform multiple linear regression analyses on the Boston Housing dataset using the R programming language.
