Two-Sample Kolmogorov–Smirnov Test
The two-sample Kolmogorov–Smirnov (KS) test compares two independent samples to assess whether they come from the same distribution. It is a distribution-free (non-parametric) test: it makes no assumption about the shape of the underlying distribution and is based on the maximum absolute vertical difference between the empirical distribution functions (EDFs) of the two samples.
Empirical Distribution Function (EDF):
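The empirical distribution function at the value \( x \) in each sample represents the proportion of observations less than or equal to \( x \). Mathematically, the EDFs for the two samples are given by:
For Group 1:
\[ F_{1,n}(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x) \]
For Group 2:
\[ F_{2,m}(x) = \frac{1}{m} \sum_{j=1}^{m} I(Y_j \le x) \]
Where,
- \( n \) and \( m \) are the sample sizes for the two groups,
- \( X_i \) and \( Y_j \) represent individual observations in the respective samples,
- \( I(X_i \le x) \) and \( I(Y_j \le x) \) are the indicator functions, equal to 1 when the condition holds and 0 otherwise.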
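As a minimal sketch of the definition above, the EDF of a sample can be computed by counting how many observations are less than or equal to \( x \) and dividing by the sample size. The helper name `edf` and the data values are illustrative assumptions, not part of any library.

```python
from bisect import bisect_right


def edf(sample):
    """Return a function x -> proportion of observations in `sample` that are <= x.

    `edf` is a hypothetical helper written for illustration only.
    """
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F(x):
        # bisect_right gives the number of sorted values that are <= x
        return bisect_right(ordered, x) / n

    return F


group1 = [1.2, 2.5, 2.5, 3.1, 4.8]  # made-up observations
F1 = edf(group1)
print(F1(2.5))  # 3 of the 5 observations are <= 2.5, so this prints 0.6
```

The returned function is a right-continuous step function: it is 0 below the smallest observation, 1 at and above the largest, and jumps at each observed value.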
Kolmogorov–Smirnov Statistic
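The statistic is the largest absolute difference between the two EDFs:
\[ D_{n,m} = \sup_{x} \left| F_{1,n}(x) - F_{2,m}(x) \right| \]
where,
- \( \sup_{x} \) denotes the supremum, that is, the largest value of the absolute difference over all possible values of \( x \),
- \( F_{1,n}(x) \) and \( F_{2,m}(x) \) are the empirical distribution functions (EDFs, also called empirical cumulative distribution functions, ECDFs) of the two samples, with sample sizes \( n \) and \( m \), respectively,
- each EDF represents the proportion of observations in the corresponding sample that are less than or equal to a particular value of \( x \).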
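The statistic can be sketched directly from this definition. Because both EDFs are step functions that change only at observed values, the supremum is attained at one of the pooled observations, so it is enough to evaluate the difference there. The function name `ks_statistic` and the data are assumptions made for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample1, sample2):
    """Largest absolute difference between the EDFs of two samples.

    Hypothetical helper for illustration; it does not compute a p-value.
    """
    s1, s2 = sorted(sample1), sorted(sample2)
    n, m = len(s1), len(s2)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # Evaluate |F1(x) - F2(x)| at every pooled observation and keep the maximum.
    return max(
        abs(bisect_right(s1, x) / n - bisect_right(s2, x) / m)
        for x in s1 + s2
    )


group1 = [1, 2, 3, 4]  # made-up observations
group2 = [3, 4, 5, 6]
print(ks_statistic(group1, group2))  # prints 0.5
```

Here the EDF of the first sample reaches 0.5 at x = 2 while the second is still 0, which gives the maximum gap of 0.5. Identical samples give 0, and samples with no overlap in range give 1.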
Example
The two-sample Kolmogorov–Smirnov test is available in SciPy as the scipy.stats.ks_2samp function. It computes the Kolmogorov–Smirnov statistic and a p-value for two samples. The null hypothesis is that both samples come from the same distribution, so a small p-value is evidence that the two distributions differ.
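A minimal usage sketch follows. The simulated data, the seed, the sample sizes, and the 0.05 significance level are all assumptions chosen for illustration; only `scipy.stats.ks_2samp` itself comes from the text above.

```python
import numpy as np
from scipy.stats import ks_2samp

# Simulated data (assumption): two normal samples whose means differ by 0.5.
rng = np.random.default_rng(0)
sample1 = rng.normal(loc=0.0, scale=1.0, size=200)
sample2 = rng.normal(loc=0.5, scale=1.0, size=200)

# Two-sided test by default; the result exposes .statistic and .pvalue.
result = ks_2samp(sample1, sample2)
print(f"KS statistic: {result.statistic:.4f}")
print(f"p-value:      {result.pvalue:.4f}")

alpha = 0.05  # significance level, an illustrative choice
if result.pvalue < alpha:
    print("Reject H0: the samples appear to come from different distributions.")
else:
    print("Fail to reject H0: no evidence that the distributions differ.")
```

Failing to reject the null hypothesis does not show that the two distributions are identical; it only means the test found no significant difference at the chosen level.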