Measure of Relationship

  • Covariance: Covariance measures the degree to which two variables change together. It is positive when the variables tend to move in the same direction and negative when they tend to move in opposite directions. The formula below divides by n (population form); the sample covariance divides by n - 1 instead.
    [Tex]Cov(X,Y) = \frac{\sum_{i=1}^{n}(X_i-\overline{X})(Y_i - \overline{Y})}{n} [/Tex]
  • Correlation: Correlation measures the strength and direction of the linear relationship between two variables. It is represented by the correlation coefficient, which ranges from -1 to 1. A positive correlation indicates a direct relationship, while a negative correlation implies an inverse relationship. Pearson’s correlation coefficient is given by:
    [Tex]\rho(X, Y) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y} [/Tex]
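
As a minimal sketch of the two formulas above, the following Python code computes the population covariance (dividing by n) and Pearson's correlation coefficient using only the standard library. The function names and the `hours`/`scores` data are hypothetical, chosen only for illustration.

```python
import math

def covariance(xs, ys):
    """Population covariance: average product of deviations (divides by n)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def pearson_correlation(xs, ys):
    """Covariance scaled by the two population standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # Cov(X, X) is the variance of X
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)

# Hypothetical data: study hours and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

print(covariance(hours, scores))           # positive: the variables rise together
print(pearson_correlation(hours, scores))  # between -1 and 1
```

Because the correlation divides the covariance by both standard deviations, it is unit-free, which is why it can be compared across datasets while the raw covariance cannot. The sketch assumes both inputs have the same length and that neither is constant (a constant variable has zero standard deviation).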

Statistics Cheat Sheet

Statistics is like a toolkit we use to understand and make sense of information. It helps us collect, organize, analyze, and interpret data to find patterns, trends, and relationships in the world around us.

In this Statistics cheat sheet, you will find complex statistical concepts simplified, with clear explanations, practical examples, and essential formulas. It is meant to help when you are preparing for an interview or just starting out in data science, and it covers topics such as the mean, the median, and hypothesis testing with examples. Working through it should make you more confident about your statistics skills in interviews and in real-life data jobs.

Similar Reads

What is Statistics?

Statistics is the branch of mathematics that deals with collecting, analyzing, interpreting, presenting, and organizing data. It involves the study of methods for gathering, summarizing, and interpreting data to make informed decisions and draw meaningful conclusions....

Basics of Statistics

Basic formulas of statistics are,...

What is Data in Statistics?

Data is a collection of observations; it can take the form of numbers, words, measurements, or statements....

Measure of Central Tendency

  • Mean: The mean is the sum of all values in the sample or population divided by the number of values.
    Formula: [Tex]Mean (\mu) = \frac{Sum \, of \, Values}{Number \, of \, Values} [/Tex]
  • Median: The median is the middle value of a dataset once the data have been sorted, from lowest to highest or highest to lowest. For an odd number of data points the median is the middle value; for an even number of data points it is the average of the two middle values.
    For an odd number of data points: [Tex]Median = (\frac{n+1}{2})^{th} \, value [/Tex]
    For an even number of data points: [Tex]Median = Average \, of \, (\frac{n}{2})^{th} \, value \, and \, its \, next \, value [/Tex]...
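
The mean and median definitions above can be sketched in a few lines of Python. This is an illustrative implementation with hypothetical function names, not a replacement for a statistics library.

```python
def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)  # the data must be sorted first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1) / 2)-th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # (n / 2)-th value and the next one

print(mean([2, 4, 6, 8]))    # 5.0
print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```

Python lists are 0-indexed, so the "((n + 1) / 2)-th value" in the formula corresponds to index `n // 2` in the sorted list.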

Measure of Dispersion

  • Range: The range is the difference between the maximum and minimum values of the sample.
  • Variance (σ²): Variance measures how spread out the values are around the mean, as the average squared deviation from it.
    Formula: [Tex]\sigma^2 = \frac{\Sigma(X-\mu)^2}{n} [/Tex]
  • Standard Deviation (σ): The standard deviation is the square root of the variance. It is expressed in the same unit as the sample values, indicates the typical distance of data points from the mean, and is widely used because it is easy to interpret.
    Formula: [Tex]\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\Sigma(X-\mu)^2}{n}} [/Tex]
  • Interquartile Range (IQR): The range between the first quartile (Q1) and the third quartile (Q3). It is less sensitive to extreme values than the range.
    Formula: [Tex]IQR = Q_3 - Q_1 [/Tex]
    To compute the IQR, arrange the data in ascending order, split them into a lower half and an upper half, and take the median of each half to obtain Q1 and Q3.
  • Quartiles: Quartiles divide the dataset into four equal parts:
    • Q1 is the median of the lower half of the data (25th percentile)
    • Q2 is the median of the whole dataset (50th percentile)
    • Q3 is the median of the upper half of the data (75th percentile)
  • Mean Absolute Deviation: The average of the absolute differences between each data point and the mean. It provides a measure of the average deviation from the mean.
    Formula: [Tex]Mean \, Absolute \, Deviation = \frac{\sum_{i=1}^{n}{|X_i - \mu|}}{n} [/Tex]
  • Coefficient of Variation (CV): The CV is the ratio of the standard deviation to the mean, expressed as a percentage. It is useful for comparing the relative variability of different datasets.
    Formula: [Tex]CV = (\frac{\sigma}{\mu}) * 100[/Tex]...
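
The dispersion measures above can be computed together, as in the following hedged Python sketch. It uses the population formulas (dividing by n) and finds the quartiles with the median-of-halves method, which is only one of several quartile conventions in use. The function names and the sample data are hypothetical.

```python
import math

def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def dispersion_summary(values):
    """Range, variance, standard deviation, IQR, MAD and CV of a dataset."""
    ordered = sorted(values)
    n = len(ordered)
    mu = sum(ordered) / n
    variance = sum((x - mu) ** 2 for x in ordered) / n  # population variance
    std_dev = math.sqrt(variance)
    # Median-of-halves quartiles: for odd n the middle value is left out of both halves.
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    return {
        "range": ordered[-1] - ordered[0],
        "variance": variance,
        "std_dev": std_dev,
        "iqr": q3 - q1,
        "mad": sum(abs(x - mu) for x in ordered) / n,
        "cv_percent": std_dev / mu * 100,
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample with mean 5
summary = dispersion_summary(data)
for name, value in summary.items():
    print(name, value)
```

The sketch assumes at least two data points and a non-zero mean, since the CV is undefined when the mean is zero.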

Measure of Shape

Kurtosis...

Probability Theory

Here are some basic concepts or terminologies used in probability:...

Probability Distribution Functions

Normal or Gaussian Distribution...

Parameter Estimation for Statistical Inference

  • Population: The population is the group of individuals, objects, or measurements about which you want to draw conclusions.
  • Sample: A sample is a subset of the population: the group chosen from the larger population to gather information and make inferences about the entire population.
  • Expectation: Expectation, in statistics and probability theory, represents the anticipated or average value of a random variable. It is represented by E(X).
  • Parameter: A parameter is a numerical characteristic of a population that is of interest in statistical analysis. Examples of parameters include the population mean (μ), the population standard deviation (σ), or the success probability in a binomial distribution.
  • Statistic: A statistic is a numerical value or measure calculated from a sample of data. It is used to estimate or infer properties of the corresponding population.
  • Estimation: Estimation involves using sample data to make inferences or predictions about population parameters.
  • Estimator: An estimator is a statistic used to estimate an unknown parameter in a statistical model.
  • Bias: Bias in parameter estimation refers to the systematic error or deviation of the estimated value from the true value of the parameter. It is measured as the difference between the expected value of the estimator and the true parameter value.
    [Tex]Bias(\widehat{\theta}) = E(\widehat{\theta}) - \theta [/Tex]
    An estimator is considered unbiased if, on average, it produces parameter estimates that are equal to the true parameter value, that is, if its bias is zero.
    [Tex]E(\widehat{\theta}) = \theta [/Tex]...
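
To make the bias definition concrete, the sketch below computes the expected value of an estimator exactly by enumerating every possible sample of size 2 drawn with replacement from a tiny hypothetical population. The choice of estimators is an assumption made for illustration: it compares the variance estimator that divides by n with the one that divides by n - 1. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

def variance_estimate(sample, ddof):
    """Sum of squared deviations from the sample mean, divided by (n - ddof)."""
    n = len(sample)
    sample_mean = Fraction(sum(sample), n)
    squared_dev = sum((x - sample_mean) ** 2 for x in sample)
    return squared_dev / (n - ddof)

def expected_value(estimator, population, sample_size):
    """Average of the estimator over all equally likely samples drawn with replacement."""
    samples = list(product(population, repeat=sample_size))
    return sum(estimator(s) for s in samples) / len(samples)

population = [1, 2, 3]                                 # hypothetical population
true_variance = variance_estimate(population, ddof=0)  # the parameter being estimated

# E(estimator) for the divide-by-n and divide-by-(n - 1) versions
e_biased = expected_value(lambda s: variance_estimate(s, 0), population, 2)
e_unbiased = expected_value(lambda s: variance_estimate(s, 1), population, 2)

bias_biased = e_biased - true_variance      # Bias = E(estimator) - true value
bias_unbiased = e_unbiased - true_variance

print(bias_biased)    # negative: dividing by n underestimates the variance on average
print(bias_unbiased)  # zero: dividing by n - 1 is unbiased in this setting
```

Enumerating all samples is only feasible for very small populations; it is used here because it yields the expectation exactly rather than through simulation.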

Hypothesis Testing

Hypothesis testing makes inferences about a population parameter based on a sample statistic....

Statistical Tests

Parametric tests are statistical methods that assume the data follow a particular distribution, most commonly the normal distribution....

Non-Parametric Test

Non-parametric tests do not assume that the data follow a specific distribution. They are useful when the data do not meet the assumptions required for parametric tests....

A/B Testing or Split Testing

A/B testing, also known as split testing, is a method used to compare two versions (A and B) of a webpage, app, or marketing asset to determine which one performs better....

Regression

Regression is a statistical technique used to model the relationship between a dependent variable and one or more independent variables....
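
As one hedged illustration of the idea, the sketch below fits a simple linear regression (one independent variable) by ordinary least squares. Simple linear regression is chosen here as an assumption for illustration; the function name and data are hypothetical. The slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the model y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of covariance and variance cancel, so raw sums are enough.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

The sketch assumes the x values are not all identical, since the slope is undefined when the variance of x is zero.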

Conclusion

In summary, statistics is a vital tool for understanding and utilizing data across various fields. Descriptive statistics simplify and organize data, while inferential statistics allow us to draw conclusions and make predictions based on samples. Measures like central tendency, dispersion, and shape offer insights into data characteristics. Hypothesis testing, confidence intervals, and probability distributions help make informed decisions and analyze relationships between variables. Whether you’re preparing for an interview, exploring data science, or making business choices, a solid grasp of statistics is essential for success in navigating and interpreting the complexities of data....

Statistics Cheat Sheet – FAQs

Is this cheat sheet suitable for Class 10 students?...