Types of Descriptive Statistics
Measures of Central Tendency
A measure of central tendency represents the whole data set by a single value that indicates where the center of the data lies. There are three main measures of central tendency:
Mean
It is the sum of the observations divided by the total number of observations. It is also called the average.
$$\bar{x} = \frac{\sum x}{n}$$
where,
- x = Observations
- n = number of terms
Let's look at an example of how we can find the mean of a data set using Python.
Python3
import numpy as np

# Sample data
arr = [5, 6, 11]

# Mean
mean = np.mean(arr)
print("Mean =", mean)
Output:
Mean = 7.333333333333333
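The definition above can also be checked by hand. The following is a minimal sketch that computes the mean as the sum divided by the count and compares it with the standard-library `statistics.mean`. The helper name `manual_mean` is hypothetical and used only for illustration.

```python
import statistics


def manual_mean(values):
    """Return the arithmetic mean: sum of observations / number of observations."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


arr = [5, 6, 11]
print("Manual mean     =", manual_mean(arr))
print("statistics.mean =", statistics.mean(arr))
```

Both calls should agree up to floating-point rounding, since they apply the same formula.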
Mode
It is the value that has the highest frequency in the given data set. The data set may have no mode if the frequency of all data points is the same. A data set can also have more than one mode if two or more values share the same highest frequency.
Python3
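from scipy import stats

# Sample data
arr = [1, 2, 2, 3]

# Mode
mode = stats.mode(arr)
print("Mode =", mode)
Output:
Mode = ModeResult(mode=array([2]), count=array([2]))
The exact representation depends on the SciPy version: older releases return one-element arrays as shown here, while newer releases return scalar values for one-dimensional input.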
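To illustrate the cases of one mode, several modes, and equal frequencies, here is a small sketch built on `collections.Counter`. The helper `all_modes` is a hypothetical name, not a library function. Note that the standard-library `statistics.multimode` (Python 3.8+) follows the same convention as this helper: when every value occurs equally often, it returns all of them rather than reporting "no mode".

```python
from collections import Counter
from statistics import multimode


def all_modes(values):
    """Return every value that shares the highest frequency (hypothetical helper)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Counter keeps first-seen order, so modes come back in order of appearance
    return [value for value, count in counts.items() if count == highest]


print(all_modes([1, 2, 2, 3]))     # a single mode
print(all_modes([1, 1, 2, 2, 3]))  # two values tie for the highest frequency
print(all_modes([1, 2, 3]))        # all frequencies equal: every value is returned
print(multimode([1, 1, 2, 2, 3]))  # standard-library counterpart
```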
Median
It is the middle value of the sorted data set. It splits the data into two halves. If the number of elements in the data set is odd, the center element is the median; if it is even, the median is the average of the two central elements.
Python3
import numpy as np

# Sample data
arr = [1, 2, 3, 4]

# Median
median = np.median(arr)
print("Median =", median)
Output:
Median = 2.5
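The odd and even cases described above can be sketched directly. This is an illustrative implementation only. `manual_median` is a hypothetical helper, and in practice `np.median` or `statistics.median` would normally be used instead.

```python
import statistics


def manual_median(values):
    """Return the median by sorting and picking the middle element(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single center element
        return ordered[mid]
    # Even count: average of the two central elements
    return (ordered[mid - 1] + ordered[mid]) / 2


print(manual_median([3, 1, 2]))     # odd number of elements -> 2
print(manual_median([4, 1, 3, 2]))  # even number of elements -> 2.5
print(statistics.median([4, 1, 3, 2]))
```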
Measures of Variability
Measures of variability are also called measures of dispersion because they describe how spread out the observations are. Some of the measures used to quantify the dispersion of a variable are as follows:
Range
The range is the difference between the largest and smallest data points in the data set. The bigger the range, the greater the spread of the data, and vice versa.
Range = Largest data value – smallest data value
Python3
# Sample data
arr = [1, 2, 3, 4, 5]

# Largest and smallest values
maximum = max(arr)
minimum = min(arr)

# Range is the difference between them
data_range = maximum - minimum
print(f"Maximum = {maximum}, Minimum = {minimum} and Range = {data_range}")
Output:
Maximum = 5, Minimum = 1 and Range = 4
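Because the range depends only on the two extreme values, a single unusual observation can change it considerably. The sketch below illustrates this with two made-up data sets. The function name `data_range` and the sample values are assumptions chosen for illustration.

```python
def data_range(values):
    """Return the range: largest value minus smallest value."""
    if len(values) == 0:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


compact = [1, 2, 3, 4, 5]        # hypothetical data without an outlier
with_outlier = [1, 2, 3, 4, 50]  # same data with one extreme value

print("Range (compact)      =", data_range(compact))
print("Range (with outlier) =", data_range(with_outlier))
```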
Variance
It is defined as the average squared deviation from the mean. It is calculated by finding the difference between every data point and the mean, squaring each difference, adding the squares, and then dividing by the number of data points in the data set. Dividing by N gives the population variance; when the data is a sample, the sum is usually divided by N - 1 instead, which gives the sample variance.
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
where,
- x -> Observation under consideration
- N -> number of terms
- μ (mu) -> Mean
Python3
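import statistics

# Sample data
arr = [1, 2, 3, 4, 5]

# statistics.variance computes the sample variance (divides by N - 1);
# statistics.pvariance would give the population variance (divides by N)
print("Var =", statistics.variance(arr))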
Output:
Var = 2.5
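The value 2.5 is the sample variance, which divides by N - 1, whereas the definition in the text divides by N. The following sketch computes both by hand so the difference is visible. The helper names `population_variance` and `sample_variance` are hypothetical.

```python
import statistics


def population_variance(values):
    """Average squared deviation from the mean (divides by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Squared deviations from the mean, divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


arr = [1, 2, 3, 4, 5]
print("Population variance =", population_variance(arr))  # 10 / 5
print("Sample variance     =", sample_variance(arr))      # 10 / 4
print("statistics.pvariance =", statistics.pvariance(arr))
print("statistics.variance  =", statistics.variance(arr))
```

For this data the squared deviations sum to 10, so the two conventions give 2.0 and 2.5 respectively.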
Standard Deviation
It is defined as the square root of the variance. It is calculated by finding the mean, subtracting the mean from each observation, squaring the results, adding the squares, dividing by the number of terms, and finally taking the square root. As with the variance, dividing by N gives the population standard deviation, while dividing by N - 1 gives the sample standard deviation.
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
where,
- x = Observation under consideration
- N = number of terms
- μ (mu) = Mean
Python3
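import statistics

# Sample data
arr = [1, 2, 3, 4, 5]

# statistics.stdev computes the sample standard deviation (divides by N - 1);
# statistics.pstdev would give the population standard deviation (divides by N)
print("Std =", statistics.stdev(arr))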
Output:
Std = 1.5811388300841898
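The steps in the definition can be written out as a short function. This is a sketch under the assumption that a single parameter selects the divisor. The name `std_dev` and its `ddof` parameter (borrowed from NumPy's naming for "delta degrees of freedom") are illustrative choices, not part of the `statistics` module.

```python
import math
import statistics


def std_dev(values, ddof=0):
    """Square root of the summed squared deviations divided by (N - ddof).

    ddof=0 gives the population standard deviation, ddof=1 the sample one.
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("not enough data points for the chosen ddof")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


arr = [1, 2, 3, 4, 5]
print("Population std    =", std_dev(arr))
print("Sample std        =", std_dev(arr, ddof=1))
print("statistics.pstdev =", statistics.pstdev(arr))
print("statistics.stdev  =", statistics.stdev(arr))
```

The sample value matches the output shown above, since `statistics.stdev` also divides by N - 1.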
Measures of Frequency Distribution
Measures of frequency distribution help us gain valuable insights into the distribution and the characteristics of the dataset, and the dataset is analyzed on the basis of these measures.
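Before computing any summary measure, it can help to tabulate the frequency distribution itself. The sketch below is one simple way to do that with `collections.Counter`, listing how often each value occurs and what share of the data it represents. The sample data and the name `frequency_table` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample data, chosen only for illustration
arr = [1, 2, 2, 3, 3, 3, 4]

counts = Counter(arr)
total = len(arr)

# Absolute and relative frequency of each distinct value, in sorted order
frequency_table = {
    value: (count, count / total) for value, count in sorted(counts.items())
}

for value, (count, share) in frequency_table.items():
    print(f"value={value}  count={count}  relative frequency={share:.2f}")
```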
Descriptive Statistics
Whenever we deal with data, whether it is a small sample or stored in huge databases, statistics is the key that helps us analyze it and draw insights without going through every individual record in the dataset. In this article, we will learn about descriptive statistics and how we can use it as a tool to explore the data we have.