Plotting graphs
Scatterplot for AGE vs MEDV
R
library(ggplot2)
# Scatterplot of AGE (x-axis) against MEDV (y-axis) with blue points
ggplot(dataset, aes(x = AGE, y = MEDV)) +
  geom_point(color = "blue") +
  labs(x = "AGE", y = "MEDV (Median Home Value)") +
  ggtitle("Scatterplot of AGE vs. MEDV")
Output:
Here, AGE is plotted on the x-axis and MEDV on the y-axis. The axes are labelled "AGE" and "MEDV (Median Home Value)", the points are coloured blue, and the plot is titled "Scatterplot of AGE vs. MEDV".
From the plot, it is evident that for the majority of houses the value of owner-occupied homes decreases as the age of the house increases. For a small proportion of houses, however, the value increases with age.
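The visual trend can also be checked numerically. The following is a minimal sketch, not the article's own code: it assumes the copy of the Boston data that ships with the MASS package (`MASS::Boston`) is an acceptable stand-in for `dataset`, and it upper-cases the column names so that they match the `AGE` and `MEDV` names used here. The names `age_medv_cor`, `age_fit` and `age_slope` are illustrative.

```r
# Assumption: MASS::Boston stands in for the article's `dataset`.
# Its columns are lowercase, so they are upper-cased to match AGE / MEDV.
dataset <- MASS::Boston
names(dataset) <- toupper(names(dataset))

# Pearson correlation between AGE and MEDV
age_medv_cor <- cor(dataset$AGE, dataset$MEDV)

# Slope of a simple linear fit of MEDV on AGE
age_fit <- lm(MEDV ~ AGE, data = dataset)
age_slope <- unname(coef(age_fit)["AGE"])

print(round(age_medv_cor, 3))
print(round(age_slope, 3))
```

A negative correlation and a negative slope would agree with the downward trend seen in the scatterplot. A correlation that is negative but far from -1 would also fit the observation that some houses do not follow the trend.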
Plotting a correlation heatmap
To plot a correlation heatmap, first we need to understand what both these terms are:
Correlation
Correlation is a statistical technique for identifying and analysing the relationship between two variables. If a change in the value of one variable is accompanied by a change in the value of the other, the two variables are related in some way, and the analysis of this relationship is called correlation.
Broadly, correlation can be categorised into three types:
- Positive Correlation
- No Correlation
- Negative Correlation
Positive Correlation
When a change in the first variable is accompanied by a change in the second variable in the same direction, the two are said to have a positive correlation. For example, if increasing the first variable leads to an increase in the second, and decreasing the first leads to a decrease in the second, the two variables are positively correlated.
No Correlation
As the name suggests, if changing the value of the first variable has no effect on the second, the two variables are not correlated.
Negative Correlation
When a change in the first variable is accompanied by a change in the second variable in the opposite direction, the two are said to have a negative correlation. For example, if increasing the first variable leads to a decrease in the second, and decreasing the first leads to an increase in the second, the two variables are negatively correlated.
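The three cases above can be illustrated with base R's `cor()` function. This is a small sketch on synthetic data, not on the housing data: the vectors `x`, `y_pos`, `y_neg` and `y_none` are made up purely for illustration.

```r
set.seed(42)                  # fixed seed so the sketch is reproducible
x <- 1:100
noise <- rnorm(100, sd = 5)

y_pos  <-  2 * x + noise      # rises with x: positive correlation
y_neg  <- -2 * x + noise      # falls as x rises: negative correlation
y_none <- rnorm(100)          # generated independently of x: no correlation

cor_pos  <- cor(x, y_pos)
cor_neg  <- cor(x, y_neg)
cor_none <- cor(x, y_none)

print(c(positive = cor_pos, none = cor_none, negative = cor_neg))
```

The coefficient always lies between -1 and 1. Values near 1 indicate a strong positive correlation, values near -1 a strong negative correlation, and values near 0 little or no linear relationship.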
Heatmap
A heatmap is a two-dimensional representation of data in which values are shown as different shades of colour. Each cell's colour intensity corresponds to the value it represents, which makes patterns and trends easier to identify.
A correlation heatmap combines both concepts: it is a heatmap that shows correlation values in different shades of colour to indicate the relationships between variables.
R
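library(ggcorrplot)
# Plot the correlations between features with ggcorrplot.
# hc.order = TRUE orders variables by hierarchical clustering; lab = TRUE prints the coefficients.
ggcorrplot(cor(dataset), hc.order = TRUE, lab = TRUE)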
Output:
As is evident from the plot, red shows negative correlation, white shows no correlation, and green shows positive correlation. Different shades are used for different correlation values: a dark shade of green indicates a strong positive correlation and a dark shade of red a strong negative correlation. It can be concluded that the DIS and NOX variables have a strong negative correlation, and the TAX and RAD variables have a strong positive correlation.
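The strongest pairs can also be read directly from the correlation matrix instead of from the colours. The following is a minimal sketch under the assumption that `MASS::Boston` (with upper-cased column names) is an acceptable stand-in for `dataset`. The names `corr`, `pairs`, `strongest_pos` and `strongest_neg` are illustrative.

```r
# Assumption: MASS::Boston stands in for the article's `dataset`.
dataset <- MASS::Boston
names(dataset) <- toupper(names(dataset))

corr <- cor(dataset)

# Keep each pair once: blank out the diagonal and the lower triangle
corr[lower.tri(corr, diag = TRUE)] <- NA

# Reshape the matrix into one row per variable pair
pairs <- as.data.frame(as.table(corr), stringsAsFactors = FALSE)
names(pairs) <- c("var1", "var2", "correlation")
pairs <- pairs[!is.na(pairs$correlation), ]

strongest_pos <- pairs[which.max(pairs$correlation), ]
strongest_neg <- pairs[which.min(pairs$correlation), ]

print(strongest_pos)
print(strongest_neg)
```

Blanking out the lower triangle avoids listing each pair twice, since a correlation matrix is symmetric, and removes the diagonal, where every variable has a correlation of 1 with itself.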
Multiple linear regression analysis of Boston Housing Dataset using R
In this article, we are going to perform multiple linear regression analyses on the Boston Housing dataset using the R programming language.