Calculate the p-value
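The p-value for each observation is computed from the chi-square distribution. For multivariate normal data, the squared Mahalanobis distance approximately follows a chi-square distribution with k degrees of freedom (k = number of variables). The p-value is the upper-tail probability of that distribution at the observed squared distance, that is, the probability of seeing a squared distance at least that large.
The pchisq() function computes the cumulative distribution function of the chi-square distribution. By default it returns P(X ≤ q); with lower.tail = FALSE it returns the upper-tail probability P(X > q).
Syntax: pchisq(q, df, lower.tail = TRUE)
Parameters:
- q: vector of quantiles (here, the squared Mahalanobis distances)
- df: degrees of freedom
- lower.tail: if TRUE (the default), return P(X ≤ q); if FALSE, return P(X > q)
Example: Calculate p-value
```r
# df$mahalanobis must hold the squared distances (see the mahalanobis() example).
# There are 4 variables (score_1 to score_4), so df = 4;
# lower.tail = FALSE gives the upper-tail p-value of each squared distance.
df$pvalue <- pchisq(df$mahalanobis, df = 4, lower.tail = FALSE)
df
```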
Output:
A common convention is to treat an observation as an outlier when its p-value is less than 0.001. In this case, all the p-values are greater than 0.001, so none of the observations is flagged as an outlier.
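The 0.001 rule can be applied directly in code. The sketch below uses simulated, hypothetical data (200 standard-normal rows of 4 variables plus one planted extreme row) rather than the article's data frame, and assumes the usual chi-square approximation with k = number of variables degrees of freedom. It also shows that "p-value below 0.001" is the same rule as "squared distance above the 99.9% chi-square quantile".

```r
set.seed(1)

# Hypothetical data: 200 rows of 4 independent standard-normal variables,
# plus one planted extreme row (row 201) that should stand out
X <- matrix(rnorm(200 * 4), ncol = 4)
X <- rbind(X, c(10, 10, 10, 10))

k  <- ncol(X)                                 # number of variables
d2 <- mahalanobis(X, colMeans(X), cov(X))     # squared distances
p  <- pchisq(d2, df = k, lower.tail = FALSE)  # upper-tail p-values

# Rows whose p-value is below the 0.001 cutoff
outliers <- which(p < 0.001)
outliers

# The same rule on the distance scale: d2 above the 99.9% chi-square quantile
cutoff <- qchisq(0.999, df = k)               # about 18.47 for k = 4
identical(outliers, which(d2 > cutoff))
```

Working on the distance scale with qchisq() avoids computing a p-value for every row when only the flag is needed.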
How to Calculate Mahalanobis Distance in R?
In this article, we are going to calculate the Mahalanobis distance in the R Programming Language.
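The Mahalanobis distance measures how far a point lies from the center of a multivariate distribution (or how far apart two points are), taking the variances of the variables and the correlations between them into account. It is widely used in multivariate statistical analysis, which involves several variables at once, for example to detect outliers. To start with, we need a data frame.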
Example: Create dataframe
```r
set.seed(700)                  # make the random draws reproducible
score_1 <- rnorm(20, 12, 1)    # rnorm(n, mean, sd)
score_2 <- rnorm(20, 11, 12)
score_3 <- rnorm(20, 15, 23)
score_4 <- rnorm(20, 16, 3)
df <- data.frame(score_1, score_2, score_3, score_4)
df
```
Output:
```
    score_1     score_2    score_3   score_4
 1 11.91218  20.3843568  68.179655 12.864159
 2 11.77103  13.5718323 -30.953642 15.241168
 3 11.91570  29.9250800  42.570528  7.179686
 4 10.25905  10.7594514  17.879960 19.639647
 5 13.01343  15.7463448   3.185857 12.776482
 6 11.78211  14.9688992  31.368892 16.043620
 7 13.51328  10.5017826  58.985715 14.701817
 8 11.10565  20.4965614   6.806652 15.876947
 9 11.20834  12.7588547  10.461229 16.991393
10 11.10233 -10.3961351  18.082209 15.258644
11 12.34732  -0.8615359  57.411750 13.400421
12 12.08361  15.0248600 -17.853098 13.999682
13 12.86457  -6.1221908  23.184838 20.389762
14 10.58871  17.1000715  20.900155 12.560962
15 10.74134   6.3728076  39.173259 17.865589
16 11.20248   8.8909128  24.696939 14.384012
17 12.89797  34.8522136  10.035498 14.975053
18 11.37993  14.4232355  28.129197 16.395271
19 11.78309  14.9324201  23.584362 14.765245
20 12.77480  30.7969171  -9.635902 10.203178
```
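The mahalanobis() function from the stats package, which is loaded by default in R, calculates the Mahalanobis distance. Note that it returns the squared distance; apply sqrt() to the result if you need the distance itself.
Syntax: mahalanobis(x, center, cov)
where:
- x: vector or matrix of data, one observation per row (a data frame is converted to a matrix)
- center: mean vector of the distribution
- cov: covariance matrix of the distribution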
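For each row x, mahalanobis() evaluates the squared distance D² = (x − μ)ᵀ Σ⁻¹ (x − μ), with μ = center and Σ = cov. The sketch below reproduces that formula by hand on a small hypothetical matrix (10 random rows, 3 columns) and compares it with the built-in result. It is meant to illustrate the formula, not to replace the function.

```r
set.seed(42)

# Hypothetical data: 10 observations of 3 variables
X <- matrix(rnorm(30), ncol = 3)

mu    <- colMeans(X)   # center: column means
Sigma <- cov(X)        # sample covariance matrix

# Built-in computation (returns squared distances)
d2_builtin <- mahalanobis(X, center = mu, cov = Sigma)

# Manual computation of (x - mu)^T Sigma^-1 (x - mu) for every row
centered  <- sweep(X, 2, mu)                          # subtract mu from each row
d2_manual <- rowSums((centered %*% solve(Sigma)) * centered)

all.equal(d2_builtin, d2_manual)
sqrt(d2_builtin)       # the (unsquared) Mahalanobis distances
```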
Example: Calculate Mahalanobis distance
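```r
# squared Mahalanobis distance of each row from the column means,
# stored in a new column for the p-value calculation
df$mahalanobis <- mahalanobis(df, colMeans(df), cov(df))
df$mahalanobis
```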
Output:
4.46866714558536 4.61260586529474 7.41513071619846 5.21448589688871
2.84292222223026 0.673116763926688 6.04984394951585 1.72865361097932
1.03750690527476 7.21856549018804 4.85579110162481 2.90808365141091
7.57223884458172 3.27702692226183 2.68208130355785 0.916110244005359
6.79796970070888 0.829693729587342 0.0356208551487593 4.86388508103035
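As a quick sanity check, the squared distances returned by mahalanobis() with the sample mean and sample covariance always sum to (n − 1)·k, because the sum of (xᵢ − x̄)ᵀ S⁻¹ (xᵢ − x̄) over all rows equals tr(S⁻¹ · (n − 1)S). For this data, n = 20 and k = 4, and the 20 values printed above do add up to 76. The sketch below rebuilds the same data frame and checks the identity.

```r
# Rebuild the data frame from the example above
set.seed(700)
score_1 <- rnorm(20, 12, 1)
score_2 <- rnorm(20, 11, 12)
score_3 <- rnorm(20, 15, 23)
score_4 <- rnorm(20, 16, 3)
df <- data.frame(score_1, score_2, score_3, score_4)

n <- nrow(df)  # 20 observations
k <- ncol(df)  # 4 variables

d2 <- mahalanobis(df, colMeans(df), cov(df))

# With the sample mean and covariance, the squared distances
# sum to (n - 1) * k whatever the data
sum(d2)
(n - 1) * k
all.equal(sum(d2), (n - 1) * k)
```

If this check fails, the center or covariance passed to mahalanobis() was probably not computed from the same rows.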