Two-Sample T-Test in Python
Consider an example: we are given two samples, each containing the heights of 20 students from a class. We need to check whether the students of the two classes have the same mean height. There are three ways to conduct a two-sample T-Test in Python.
Method 1: Using the SciPy library
SciPy (short for Scientific Python) is a scientific computing library built on top of NumPy. It provides a variety of functions that are useful in data science. First, we create the sample data, and then we perform the two-sample T-Test. For this purpose, SciPy provides the ttest_ind() function in its scipy.stats module.
Syntax: ttest_ind(data_group1, data_group2, equal_var=True/False)
Here,
- data_group1: First data group
- data_group2: Second data group
- equal_var = True: The standard independent two-sample t-test is conducted, which assumes equal population variances.
- equal_var = False: Welch's t-test is conducted, which does not assume equal population variances.
Note that equal_var is True by default.
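To make the difference between the two settings concrete, the sketch below runs both variants on the height data used later in this article (the arrays are copied from the examples that follow). It relies only on scipy.stats.ttest_ind and the statistic and pvalue attributes of its result.

```python
import numpy as np
import scipy.stats as stats

# Height data from the example (20 students per class)
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

# Standard (pooled-variance) t-test: assumes equal population variances
student = stats.ttest_ind(data_group1, data_group2, equal_var=True)

# Welch's t-test: does not assume equal population variances
welch = stats.ttest_ind(data_group1, data_group2, equal_var=False)

print("Standard:", student.statistic, student.pvalue)
print("Welch:   ", welch.statistic, welch.pvalue)
```

Because both groups have the same size, the two t-statistics coincide here; only the degrees of freedom differ, so Welch's p-value comes out slightly larger.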
Before conducting the two-sample T-Test, we need to check whether the given data groups have the same variance. As a rule of thumb, if the ratio of the larger variance to the smaller variance is less than 4:1, we can consider the variances of the two groups to be equal. To find the variance of a data group, we can use the syntax below:
Syntax: print(np.var(data_group))
Here,
- data_group: The given data group
```python
# Python program to display the variance of each data group

# Import the library
import numpy as np

# Creating data groups
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

# Print the variance of both data groups
print(np.var(data_group1), np.var(data_group2))
```
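Output: the variances are approximately 7.7275 (data_group1) and 12.26 (data_group2).
Here, the ratio of the larger variance to the smaller one is 12.26 / 7.7275 ≈ 1.59, which is less than 4:1, so we can treat the two groups as having equal variances.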
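The 4:1 rule of thumb can be wrapped in a small helper so the check is not done by hand. The function name variances_roughly_equal and its max_ratio parameter are hypothetical, chosen for this sketch; it uses np.var with its default ddof=0, as in the code above.

```python
import numpy as np

def variances_roughly_equal(group_a, group_b, max_ratio=4.0):
    """Hypothetical helper: apply the larger/smaller variance rule of thumb.

    Returns the variance ratio and whether it is below max_ratio.
    """
    var_a = np.var(group_a)  # population variance (ddof=0), as in the text
    var_b = np.var(group_b)
    if min(var_a, var_b) == 0:
        raise ValueError("a group has zero variance; the ratio is undefined")
    ratio = max(var_a, var_b) / min(var_a, var_b)
    return ratio, ratio < max_ratio

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

ratio, equal = variances_roughly_equal(data_group1, data_group2)
print(f"ratio = {ratio:.3f}, treat variances as equal: {equal}")
```

With equal group sizes the ratio is the same whether ddof=0 or ddof=1 is used, since both variances are scaled by the same factor.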
Performing Two-Sample T-Test
```python
# Python program to demonstrate how to
# perform a two-sample t-test

# Import the libraries
import numpy as np
import scipy.stats as stats

# Creating data groups
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

# Perform the two-sample t-test with equal variances
result = stats.ttest_ind(a=data_group1, b=data_group2, equal_var=True)
print(result)
```
Output: a t-statistic of about -0.634 and a p-value of about 0.530.
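To see what ttest_ind computes when equal_var=True, the sketch below builds the pooled-variance t-statistic by hand and derives the two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom. It is meant as an illustration that should agree with SciPy's result, not as a replacement for it.

```python
import numpy as np
import scipy.stats as stats

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

n1, n2 = len(data_group1), len(data_group2)

# Unbiased sample variances (ddof=1)
s1_sq = np.var(data_group1, ddof=1)
s2_sq = np.var(data_group2, ddof=1)

# Pooled variance: each group weighted by its degrees of freedom
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# t-statistic for H0: mu1 = mu2
t_stat = (data_group1.mean() - data_group2.mean()) / np.sqrt(sp_sq * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

# Two-sided p-value from the t distribution's survival function
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(t_stat, p_value)
```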
Analyzing the result:
A two-sample t-test has the following hypotheses:
H0 => µ1 = µ2 (the population mean of data group 1 equals that of data group 2)
HA => µ1 ≠ µ2 (the population mean of data group 1 differs from that of data group 2)
Here, since the p-value (0.53004) is greater than alpha = 0.05, we cannot reject the null hypothesis of the test. We do not have sufficient evidence to say that the mean height of students differs between the two data groups.
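The decision rule described above can be written out explicitly. This sketch assumes the same significance level alpha = 0.05 used in the text and reads the p-value from the pvalue attribute of the object returned by ttest_ind.

```python
import numpy as np
import scipy.stats as stats

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                        19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
data_group2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                        17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

alpha = 0.05  # significance level used in the text

result = stats.ttest_ind(data_group1, data_group2, equal_var=True)

# Compare the p-value against alpha to decide on H0
if result.pvalue < alpha:
    decision = "reject H0: the mean heights differ"
else:
    decision = "fail to reject H0: no evidence that the mean heights differ"

print(f"p-value = {result.pvalue:.5f} -> {decision}")
```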
How to Conduct a Two-Sample T-Test in Python
In this article, we are going to see how to conduct a two-sample T-test in Python.
This test is also known as the independent samples t-test. It is used to check whether the unknown population means of two groups are equal; that is, it tests the null hypothesis that the means of the two groups are equal.