Interquartile Range And Quartile Deviation of One Array using SciPy
- We import the NumPy and SciPy libraries.
- We define a sample dataset named data.
- We use SciPy's iqr function to directly calculate the interquartile range (IQR) of the dataset.
- We then calculate the quartile deviation by dividing the IQR by 2.
Python3
import numpy as np
from scipy.stats import iqr

# Sample dataset
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Calculate Interquartile Range (IQR) using scipy
iqr_value = iqr(data)

# Calculate Quartile Deviation
quartile_deviation = iqr_value / 2

print("Interquartile Range (IQR):", iqr_value)
print("Quartile Deviation:", quartile_deviation)
Output:
Interquartile Range (IQR): 4.5
Quartile Deviation: 2.25
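As a cross-check, the same IQR can be obtained from NumPy alone by subtracting the 25th percentile from the 75th. The snippet below is a minimal sketch that reuses the sample dataset from above; the names q1, q3, and iqr_numpy are introduced here only for illustration. With NumPy's default linear interpolation, Q1 is 3.25 and Q3 is 7.75 for this dataset, which agrees with the IQR of 4.5 shown above.

```python
import numpy as np
from scipy.stats import iqr

# Same sample dataset as in the SciPy example
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# 25th and 75th percentiles (NumPy's default is linear interpolation)
q1, q3 = np.percentile(data, [25, 75])

# IQR is the distance between the third and first quartiles
iqr_numpy = q3 - q1

print("Q1:", q1)
print("Q3:", q3)
print("IQR (NumPy):", iqr_numpy)
print("Matches scipy.stats.iqr:", bool(np.isclose(iqr_numpy, iqr(data))))
```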
Interquartile Range and Quartile Deviation using NumPy and SciPy
In statistical analysis, understanding the spread or variability of a dataset is crucial for gaining insights into its distribution and characteristics. Two common measures used for quantifying this variability are the interquartile range (IQR) and quartile deviation.
Quartiles
Quartiles are quantiles that divide the sorted data points into four parts, or quarters, of roughly equal size.
- The first quartile (Q1) is the middle number between the smallest value and the median of the data set.
- The second quartile (Q2) is the median of the given data set.
- The third quartile (Q3) is the middle number between the median and the largest value of the data set.
Algorithm to find Quartiles
Here’s a step-by-step algorithm to find quartiles:
- Sort the dataset in ascending order.
- Calculate the total number of entries in the dataset.
- If the number of entries is even:
  - Calculate the median (Q2) by taking the average of the two middle values.
  - Divide the dataset into two halves: the first half containing the smallest n entries and the second half containing the largest n entries, where n = total number of entries / 2.
  - Calculate Q1 as the median of the first half.
  - Calculate Q3 as the median of the second half.
- If the number of entries is odd:
  - Calculate the median (Q2) as the middle value.
  - Divide the dataset into two halves, leaving out the median itself: the first half containing the smallest n entries and the second half containing the largest n entries, where n = (total number of entries - 1) / 2.
  - Calculate Q1 as the median of the first half.
  - Calculate Q3 as the median of the second half.
- The calculated values of Q1, Q2, and Q3 represent the first quartile, median (second quartile), and third quartile respectively.
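The steps above can be sketched in plain Python as follows. This is an illustrative implementation of the median-of-halves method only; the function name quartiles_by_halves is hypothetical and not part of NumPy or SciPy. Note that this method can give different quartiles than the interpolation used by default in scipy.stats.iqr: for the values 1 to 10 it yields Q1 = 3 and Q3 = 8, so the IQR is 5 rather than the 4.5 reported by SciPy.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method (illustrative helper)."""
    ordered = sorted(values)          # Step 1: sort in ascending order
    n = len(ordered)                  # Step 2: count the entries
    if n < 2:
        raise ValueError("need at least two values to form two halves")

    half = n // 2
    lower = ordered[:half]            # smallest n // 2 entries
    if n % 2 == 0:
        upper = ordered[half:]        # even count: largest n / 2 entries
    else:
        upper = ordered[half + 1:]    # odd count: skip the median itself

    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles_by_halves([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("IQR:", q3 - q1)
print("Quartile Deviation:", (q3 - q1) / 2)
```

The standard library's statistics.median is used here so that the even and odd median rules do not have to be written by hand.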
Range:
The range is the difference between the largest value and the smallest value in the given data set.
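A minimal sketch of this calculation with NumPy, assuming the same sample dataset used earlier in the article (the name data_range is chosen here for illustration):

```python
import numpy as np

# Same sample dataset as in the SciPy example
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Range = largest value - smallest value
data_range = data.max() - data.min()

print("Range:", data_range)
```

Unlike the IQR, the range depends only on the two extreme values, so a single outlier can change it substantially.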