Plotting graphs

Scatterplot for AGE vs MEDV

R




# Load ggplot2 for plotting
library(ggplot2)

# Scatterplot of AGE (x-axis) against MEDV (y-axis)
ggplot(dataset, aes(x = AGE, y = MEDV)) +
  geom_point(color = "blue") +
  labs(x = "AGE", y = "MEDV (Median Home Value)") +
  ggtitle("Scatterplot of AGE vs. MEDV")


Output:

[Figure: scatterplot of AGE vs. MEDV]

Here, AGE is mapped to the x-axis and MEDV to the y-axis. The points are drawn in blue, the axes are labelled AGE and MEDV (Median Home Value), and the plot is titled "Scatterplot of AGE vs. MEDV".

From the plot, it is evident that for the majority of houses, the value of owner-occupied homes decreases as the age of the house increases. For a small proportion of houses, however, the price increases with age.
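The visual trend described above can also be checked numerically. The following is a minimal sketch that uses a small made-up data frame (`toy`, with illustrative values that are not taken from the Boston data) in place of `dataset`; on the real data, the same two calls would be applied to `dataset$AGE` and `dataset$MEDV`.

```r
# Toy stand-in for `dataset`; these values are illustrative, not Boston data.
toy <- data.frame(
  AGE  = c(20, 35, 50, 65, 80, 95),
  MEDV = c(30, 27, 25, 21, 18, 16)
)

# Pearson correlation: the sign gives the direction of the linear trend.
r <- cor(toy$AGE, toy$MEDV)

# Slope of a simple least-squares line, MEDV ~ AGE.
fit <- lm(MEDV ~ AGE, data = toy)
slope <- unname(coef(fit)["AGE"])

print(r)      # negative when MEDV tends to fall as AGE rises
print(slope)  # estimated change in MEDV per one-unit increase in AGE
```

A negative `r` and a negative `slope` both indicate the downward trend seen in the scatterplot, while the points that go against the trend are what keep `r` from reaching -1.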

Plotting a correlation heatmap

To plot a correlation heatmap, we first need to understand what both of these terms mean:

Correlation

Correlation is a statistical technique for identifying and analysing the relationship between two variables. If a change in the value of one variable is accompanied by a change in the value of the other, the two variables are related in some way, and the analysis of this relationship is called correlation.

Broadly, correlation can be divided into three categories:

  1. Positive Correlation
  2. No Correlation
  3. Negative Correlation

Positive Correlation

When a change in the first variable is accompanied by a change in the second variable in the same direction, the two are said to have a positive correlation. For example, if the second variable increases when the first increases, and decreases when the first decreases, the two variables are positively correlated.

No Correlation

As the name suggests, if changing the value of one variable has no effect on the other, the two variables are not correlated.

Negative Correlation

When a change in the first variable is accompanied by a change in the second variable in the opposite direction, the two are said to have a negative correlation. For example, if the second variable decreases when the first increases, and increases when the first decreases, the two variables are negatively correlated.
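The three cases can be illustrated with base R's `cor()`, which computes the Pearson correlation coefficient by default. This is a small sketch on made-up vectors (`x`, `y_pos`, `y_neg`, and `y_none` are hypothetical names, not columns of the dataset).

```r
x <- -2:2

y_pos  <- 2 * x + 1    # rises as x rises
y_neg  <- -3 * x + 5   # falls as x rises
y_none <- x^2          # depends on x, but not in a linear way

r_pos  <- cor(x, y_pos)    # 1: perfect positive correlation
r_neg  <- cor(x, y_neg)    # -1: perfect negative correlation
r_none <- cor(x, y_none)   # 0: no linear correlation

print(c(positive = r_pos, negative = r_neg, none = r_none))
```

The last case is worth noting: `y_none` is fully determined by `x`, yet its correlation with `x` is zero, because the Pearson coefficient only measures linear association.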

Heatmap

A heatmap is a two-dimensional representation of data in which values are shown as shades of colour. Each cell's colour intensity corresponds to the value it represents, making it easier to identify patterns and trends.

A correlation heatmap combines both of these concepts: it is a heatmap that represents correlation values in different shades of colour to show the relationships between variables.
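To make the idea concrete, the sketch below builds the correlation matrix that such a heatmap colours, using a made-up three-column data frame (`toy` and its columns `a`, `b`, `c` are hypothetical, not part of the dataset). It then reshapes the matrix so that each row is one heatmap cell.

```r
# Toy data frame with three numeric columns (illustrative values only).
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 5, 4, 5),
  c = c(9, 7, 6, 3, 1)
)

# Pairwise Pearson correlations: a symmetric matrix with 1s on the diagonal.
corr <- cor(toy)
print(round(corr, 2))

# Long format: one row per cell of the heatmap (variable pair and its value).
cells <- as.data.frame(as.table(corr))
names(cells) <- c("var1", "var2", "r")
print(cells)
```

Each row of `cells` corresponds to one coloured tile, and the value of `r` decides the shade. Note that `cor()` expects numeric columns, so non-numeric columns would need to be dropped or encoded first.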

R




# Using ggcorrplot to get the correlation between features
library(ggcorrplot)
ggcorrplot(cor(dataset), hc.order = TRUE, lab = TRUE)


Output:

[Figure: correlation heatmap of the dataset's features]

With ggcorrplot's default palette, which the call above uses, blue shows negative correlation, white shows no correlation, and red shows positive correlation. Different shades are used to show different correlation values, with a dark shade of red showing a strong positive correlation and a dark shade of blue showing a strong negative correlation. It can be concluded that the DIS and NOX variables have a strong negative correlation, and the TAX and RAD variables have a strong positive correlation.
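The strongest pairs can also be read off the correlation matrix directly rather than by eye. The following is a rough sketch on a made-up data frame (`toy`, with hypothetical columns `a`, `b`, `c`); on the real data, `cor(dataset)` would take the place of `cor(toy)`.

```r
# Toy data frame (illustrative values only, not Boston data).
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 5, 4, 5),
  c = c(9, 7, 6, 3, 1)
)

corr <- cor(toy)
diag(corr) <- NA  # ignore each variable's correlation with itself

# Row/column positions of the largest and smallest remaining values.
# The matrix is symmetric, so each pair appears twice; keep the first match.
pos <- which(corr == max(corr, na.rm = TRUE), arr.ind = TRUE)[1, ]
neg <- which(corr == min(corr, na.rm = TRUE), arr.ind = TRUE)[1, ]

strongest_pos <- colnames(corr)[pos]
strongest_neg <- colnames(corr)[neg]

print(strongest_pos)  # the pair with the strongest positive correlation
print(strongest_neg)  # the pair with the strongest negative correlation
```

In this toy example the pair `a` and `b` has the strongest positive correlation and the pair `a` and `c` the strongest negative one.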

Multiple linear regression analysis of Boston Housing Dataset using R

In this article, we are going to perform multiple linear regression analyses on the Boston Housing dataset using the R programming language.
