What is Chi-Square test?
The chi-square test is a statistical test used to determine if there is a significant association between two categorical variables. It is a non-parametric test, meaning it makes no assumptions about the distribution of the data. The test is based on the comparison of observed and expected frequencies within a contingency table. The chi-square test helps with feature selection problems by looking at the relationship between the elements. It determines if the association between two categorical variables of the sample would reflect their real association in the population.
It belongs to the family of continuous probability distributions. The Chi-Squared distribution is defined as the sum of the squares of the k independent standard random variables given by:
………..eq(1)
where,
- c is degree of freedom
- is the observed frequency in cell
- is the expected frequency in cell , calculated as:
Chi-Square Distribution
The chi-square distribution is a continuous probability distribution that arises in statistics and is associated with the sum of the squares of independent standard normal random variables. It is often denoted as and is parameterized by the degrees of freedom k.
It is widely used in statistical analysis, particularly in hypothesis testing and calculating confidence intervals. It is often used with non-normally distributed data.
Key terms used in Chi-Square test
- Degrees of freedom
- Observed values: Actual data collected
- Expected values: Predicted data based on a theoretical model in chi-square test.
- where, : Totals of row i
- : Totals of column j
- N: Total number of Observations
- Contingency table: A contingency table, also known as a cross-tabulation or two-way table, is a statistical table that displays the distribution of two categorical variables.
Chi-square test in Machine Learning
Chi-Square test is a statistical method crucial for analyzing associations in categorical data. Its applications span various fields, aiding researchers in understanding relationships between factors. This article elucidates Chi-Square types, steps for implementation, and its role in feature selection, exemplified through Python code on the Iris dataset.
Table of Content
- What is Chi-Square test?
- Types of Chi-Square test
- Why do we use the Chi-Square Test?
- Steps to perform Chi-square test
- Chi-square Test for Feature Selection
- Python Implementation of Chi-Square feature selection