Standard deviation is the most commonly used measure of variation, which describes how spread out the data is.
Standard deviation (\(\sigma\)) measures how far a 'typical' observation is from the average of the data (\(\mu\)).
Standard deviation is important for many statistical methods.
Here is a histogram of the age of all 934 Nobel Prize winners up to the year 2020, showing standard deviations:
Each dotted line in the histogram shows a shift of one extra standard deviation.
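If the data is normally distributed:
Roughly 68.3% of the data is within 1 standard deviation of the mean (from \(\mu - \sigma\) to \(\mu + \sigma\)).
Roughly 95.4% of the data is within 2 standard deviations of the mean (from \(\mu - 2\sigma\) to \(\mu + 2\sigma\)).
Roughly 99.7% of the data is within 3 standard deviations of the mean (from \(\mu - 3\sigma\) to \(\mu + 3\sigma\)).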
Note: A normal distribution has a "bell" shape and spreads out equally on both sides.
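For a normal distribution, the share of the data lying within \(k\) standard deviations of the mean is \(\operatorname{erf}(k/\sqrt{2})\). A minimal sketch using only Python's standard library; the helper name `fraction_within` is our own, for illustration:

```python
import math

def fraction_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # P(|X - mu| < k * sigma) = erf(k / sqrt(2)), whatever the values of mu and sigma
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

The printed shares come out at roughly 68.3%, 95.4% and 99.7%.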
You can calculate the standard deviation for both the population and the sample.
The formulas are almost the same; they use different symbols to refer to the population standard deviation (\(\sigma\)) and the sample standard deviation (\(s\)).
Calculating the population standard deviation (\(\sigma\)) is done with this formula:
\(\displaystyle \sigma = \sqrt{\frac{\sum (x_{i}-\mu)^2}{n}}\)
Calculating the sample standard deviation (\(s\)) is done with this formula:
\(\displaystyle s = \sqrt{\frac{\sum (x_{i}-\bar{x})^2}{n-1}}\)
\(n\) is the total number of observations.
\(\sum \) is the symbol for adding together a list of numbers.
\(x_{i}\) denotes the individual values in the data: \(x_{1}, x_{2}, x_{3}, \ldots \)
\(\mu\) is the population mean and \(\bar{x}\) is the sample mean (average value).
\( (x_{i} - \mu ) \) and \( (x_{i} - \bar{x} ) \) are the differences between the values of the observations (\(x_{i}\)) and the mean.
Each difference is squared and added together.
Then the sum is divided by \(n\) (population) or \(n - 1\) (sample), and finally we take the square root.
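The steps above translate almost line for line into code. A minimal sketch in plain Python with no libraries; the function name `standard_deviation` and its `sample` flag are illustrative choices, not a standard API:

```python
import math

def standard_deviation(values: list[float], sample: bool = False) -> float:
    """Population (sigma) or sample (s) standard deviation, following the formulas above."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough observations")
    mean = sum(values) / n                             # mu (population) or x-bar (sample)
    squared_diffs = [(x - mean) ** 2 for x in values]  # (x_i - mean)^2 for every observation
    divisor = n - 1 if sample else n                   # n - 1 for a sample, n for a population
    return math.sqrt(sum(squared_diffs) / divisor)

print(standard_deviation([4, 11, 7, 14]))               # population formula
print(standard_deviation([4, 11, 7, 14], sample=True))  # sample formula
```

The only difference between the two results is the divisor, so the sample value is always slightly larger for the same data.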
We will use these 4 example values to calculate the population standard deviation:
4, 11, 7, 14
We must first find the mean:
\(\displaystyle \mu = \frac{\sum x_{i}}{n} = \frac{4 + 11 + 7 + 14}{4} = \frac{36}{4} = \underline{9} \)
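Then we find the difference between each value and the mean \( (x_{i}- \mu)\):
\( 4 - 9 = -5 \), \( 11 - 9 = 2 \), \( 7 - 9 = -2 \), \( 14 - 9 = 5 \)
Each difference is then squared, or multiplied by itself \( ( x_{i}- \mu )^2\):
\( (-5)^2 = 25 \), \( 2^2 = 4 \), \( (-2)^2 = 4 \), \( 5^2 = 25 \)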
All of the squared differences are then added together \( \sum (x_{i} -\mu )^2\):
\( 25 + 4 + 4 + 25 = 58\)
Then the sum is divided by the total number of observations, \( n \):
\( \displaystyle \frac{58}{4} = 14.5\)
Finally, we take the square root of this number:
\( \sqrt{14.5} \approx \underline{3.81} \)
So, the population standard deviation of the example values is roughly \(3.81\).
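To double-check the arithmetic of the worked example above, the same steps can be traced in Python, printing each intermediate result. This sketch uses only the standard library:

```python
import math

values = [4, 11, 7, 14]
n = len(values)

mean = sum(values) / n              # step 1: the mean, mu
diffs = [x - mean for x in values]  # step 2: x_i - mu for each value
squares = [d ** 2 for d in diffs]   # step 3: (x_i - mu)^2
total = sum(squares)                # step 4: sum of the squared differences
variance = total / n                # step 5: divide by n (population)
sigma = math.sqrt(variance)         # step 6: take the square root

print("mean:", mean)
print("differences:", diffs)
print("squared:", squares)
print("sum:", total)
print("variance:", variance)
print("standard deviation:", round(sigma, 2))
```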
The standard deviation can easily be calculated with many programming languages.
Using software to calculate statistics is more common for larger data sets, where calculating by hand becomes impractical.
Population Standard Deviation
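With Python, use the NumPy library's std() function to find the standard deviation of the values 4, 11, 7, 14:
import numpy
values = [4, 11, 7, 14]
# numpy.std() divides by n by default (ddof=0), giving the population standard deviation
x = numpy.std(values)
print(x)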
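R's built-in sd() function returns the sample standard deviation, so use a formula to find the population standard deviation of the values 4, 11, 7, 14:
values <- c(4, 11, 7, 14)
# square root of the mean squared difference from the mean (divides by n)
sqrt(mean((values - mean(values))^2))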
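As a dependency-free alternative to the NumPy example, Python's built-in `statistics` module provides `statistics.pstdev()` for the population standard deviation. A minimal sketch:

```python
import statistics

values = [4, 11, 7, 14]

# pstdev() divides by n, so it agrees with numpy.std(values) at the default ddof=0
population_sd = statistics.pstdev(values)
print(round(population_sd, 2))
```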
Sample Standard Deviation
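With Python, use the NumPy library's std() function with ddof=1 to find the sample standard deviation of the values 4, 11, 7, 14:
import numpy
values = [4, 11, 7, 14]
# ddof=1 makes numpy.std() divide by n - 1, giving the sample standard deviation
x = numpy.std(values, ddof=1)
print(x)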
Use the R sd() function to find the sample standard deviation of the values 4, 11, 7, 14:
values <- c(4, 11, 7, 14)
# sd() divides by n - 1, giving the sample standard deviation
sd(values)
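The standard-library equivalent for a sample is `statistics.stdev()`, which divides by \(n - 1\) like R's sd(). Since the population and sample formulas share the same sum of squared differences and differ only in the divisor, the sample value can also be obtained from the population value as \(s = \sigma\sqrt{n/(n-1)}\). The sketch below checks that relationship on the example values:

```python
import math
import statistics

values = [4, 11, 7, 14]
n = len(values)

sample_sd = statistics.stdev(values)       # divides by n - 1
population_sd = statistics.pstdev(values)  # divides by n

# Same sum of squares, different divisor: s = sigma * sqrt(n / (n - 1))
rescaled = population_sd * math.sqrt(n / (n - 1))
print(round(sample_sd, 2), math.isclose(sample_sd, rescaled))
```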
Symbol | Description |
---|---|
\( \sigma \) | Population standard deviation. Pronounced 'sigma'. |
\( s \) | Sample standard deviation. |
\( \mu \) | The population mean. Pronounced 'mu'. |
\( \bar{x} \) | The sample mean. Pronounced 'x-bar'. |
\( \sum \) | The summation operator, 'capital sigma'. |
\( x \) | The variable 'x' whose standard deviation we are calculating. |
\( i \) | The index 'i' of the variable 'x'. This identifies each observation for a variable. |
\( n \) | The number of observations. |