P-value in Hypothesis testing
The table below shows the possible outcomes of a hypothesis test, including the two kinds of errors that can occur. The p-value matters because it drives the decision to accept or reject H0.
Truth / Decision | Accept H0 | Reject H0 |
H0 -> true | Correct decision | Type I error (α) |
H0 -> false | Type II error (β) | Correct decision |
Type I error: Incorrect rejection of a true null hypothesis. Its probability is denoted by α (the significance level).
Type II error: Incorrect acceptance of a false null hypothesis. Its probability is denoted by β; the power of the test is 1 − β.
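The link between α and the Type I error rate can be checked with a small simulation. The sketch below is illustrative only: the population mean (170) and standard deviation (5) are hypothetical values chosen for the demo, not data from this article. Both groups are drawn from the same population, so H0 is true and every rejection is a Type I error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_sims = 2000

# Hypothetical population: both groups share the same mean and spread,
# so the null hypothesis is true by construction.
group_a = rng.normal(loc=170.0, scale=5.0, size=(n_sims, 30))
group_b = rng.normal(loc=170.0, scale=5.0, size=(n_sims, 35))

# One two-sample t-test per simulated experiment (one experiment per row).
result = stats.ttest_ind(group_a, group_b, axis=1)

# Because H0 is true, each rejection at level alpha is a Type I error.
type_i_rate = np.mean(result.pvalue < alpha)
print(f"Estimated Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

With enough simulated experiments, the estimated rate should land close to the chosen α.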
Let’s consider an example to illustrate the process of calculating a p-value for Two Sample T-Test:
A researcher wants to investigate whether there is a significant difference in mean height between males and females in a population of university students.
Suppose we have the following data:
- Group 1 (Males): n1 = 30, and s1 = 5
- Group 2 (Females): n2 = 35, and s2 = 6
The steps below walk through the process of calculating the p-value.
Step 1: Formulate the Null Hypothesis (H0):
H0: There is no significant difference in mean height between males and females.
Step 2: Choose an Alternative Hypothesis (H1):
H1: There is a significant difference in mean height between males and females.
Step 3: Determine the Test Statistic:
The appropriate test for this scenario is the two-sample t-test, which compares the means of two independent groups; its test statistic is the t-statistic.
The t-statistic is a measure of the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.
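$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Where,
- $\bar{x}_1$ is the mean of the first sample
- $\bar{x}_2$ is the mean of the second sample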
- s1 = First sample’s standard deviation
- s2 = Second sample’s standard deviation
- n1 = First sample’s sample size
- n2 = Second sample’s sample size
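Therefore,
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{5^2}{30} + \frac{6^2}{35}}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{0.8333 + 1.0286}} \approx \frac{\bar{x}_1 - \bar{x}_2}{1.3645}$$
Substituting the sample means gives a two-sample t-statistic (t) of approximately 5.13.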
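The arithmetic behind the t-statistic can be sketched in a few lines of Python. The sample sizes and standard deviations come from the example above. The difference in sample means, `mean_diff = 7.0`, is an assumption: it is back-calculated from the reported t ≈ 5.13 and is not a value stated in the data listing.

```python
import math

# Summary statistics from the example
n1, s1 = 30, 5.0   # males: sample size, standard deviation
n2, s2 = 35, 6.0   # females: sample size, standard deviation

# ASSUMPTION: difference in sample means, back-calculated from t ≈ 5.13
mean_diff = 7.0

# Standard error of the difference between the two sample means
standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)

# t-statistic: difference in means relative to its standard error
t_statistic = mean_diff / standard_error

print(f"Standard error: {standard_error:.4f}")
print(f"t-statistic: {t_statistic:.2f}")
```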
Step 4: Identify the Distribution of the Test Statistic:
The t-distribution is used for the two-sample t-test. The degrees of freedom for the t-distribution are determined by the sample sizes of the two groups.
The t-distribution is a probability distribution with tails that are thicker than those of the normal distribution.
$$df = (n_1 + n_2) - 2$$
- where, n1 is the total number of values in the 1st category.
- n2 is the total number of values in the 2nd category.
So,
$$df = (30 + 35) - 2 = 63$$
The degrees of freedom (63) represent the number of independent pieces of information available in the data to estimate the population parameters. In the context of the two-sample t-test, higher degrees of freedom provide a more precise estimate of the population variance, which in turn makes the tails of the t-distribution thinner.
The t-distribution is symmetric and bell-shaped, similar to the normal distribution. As the degrees of freedom increase, the t-distribution approaches the shape of the standard normal distribution. Practically, it affects the critical values used to determine statistical significance and confidence intervals.
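The effect of the degrees of freedom on critical values can be illustrated with scipy.stats. The sketch below compares two-tailed critical values at α = 0.05 for a few degrees of freedom; the values 5, 30, and 1000 are arbitrary picks for comparison, and 63 is the value from this example.

```python
from scipy import stats

alpha = 0.05

# Two-tailed critical value of the standard normal distribution
normal_critical = stats.norm.ppf(1 - alpha / 2)

# Two-tailed critical values of the t-distribution for several df
dfs = [5, 30, 63, 1000]
t_criticals = [stats.t.ppf(1 - alpha / 2, df) for df in dfs]

for df, crit in zip(dfs, t_criticals):
    print(f"df = {df:>4}: critical t = {crit:.4f}")
print(f"normal   : critical z = {normal_critical:.4f}")
```

The critical t-value shrinks toward the normal critical value as the degrees of freedom grow, which is the convergence described above.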
Step 5: Calculate Critical Value.
To find the critical t-value at a significance level of α = 0.05 with 63 degrees of freedom, we can either consult a t-table or use statistical software. The observed t-statistic of 5.13 is then compared against this critical value.
We can use the scipy.stats module in Python to find the critical t-value using the code below.
Python3
import scipy.stats as stats

t_statistic = 5.13
degrees_of_freedom = 63
alpha = 0.05

# Two-tailed test: alpha is split equally between the two tails
critical_t_value = stats.t.ppf(1 - alpha / 2, degrees_of_freedom)
print(f"Critical t-value at alpha={alpha} , df:{degrees_of_freedom} and {critical_t_value}")
Output:
Critical t-value at alpha=0.05 , df:63 and 1.9983405417721956
Comparing with T-Statistic:
Since 5.13 > 1.998, the t-statistic exceeds the critical t-value and falls in the rejection region. A t-statistic this large suggests that the observed difference between the sample means is unlikely to have occurred by random chance alone. Therefore, we reject the null hypothesis.
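The same decision can be reached through the p-value itself. The sketch below assumes a two-sided test, matching the alternative hypothesis in Step 2, and reuses the t-statistic (5.13) and degrees of freedom (63) from the steps above. It uses the survival function `stats.t.sf`, which returns the upper-tail probability.

```python
from scipy import stats

t_statistic = 5.13
degrees_of_freedom = 63
alpha = 0.05

# Two-sided p-value: probability, under H0, of a t-value at least this
# extreme in either tail. sf(x) is the survival function, 1 - cdf(x).
p_value = 2 * stats.t.sf(abs(t_statistic), degrees_of_freedom)

print(f"Two-sided p-value: {p_value:.2e}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Comparing the p-value with α and comparing the t-statistic with the critical value are equivalent ways of making the same decision.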
P-Value: Comprehensive Guide to Understand, Apply, and Interpret
A p-value is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. It is used to assess a hypothesis against observed data.
This article delves into the concept of p-value, its calculation, interpretation, and significance. It also explores the factors that influence p-value and highlights its limitations.
Table of Contents
- What is P-value?
- How P-value is calculated?
- How to interpret p-value?
- P-value in Hypothesis testing
- Implementing P-value in Python
- Applications of p-value