Computing C.I. using Bootstrapping

Bootstrapping is a resampling method that relies on random sampling with replacement. It assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates, and it allows the sampling distribution of almost any statistic to be estimated using random sampling methods. It may also be used for constructing hypothesis tests.

Example:

Python3

# import libraries
import numpy
from sklearn.utils import resample
from matplotlib import pyplot as plt

# load dataset: a sample of n = 10 observations
x = numpy.array([180, 162, 158, 172, 168, 150, 171, 183, 165, 176])

# configure bootstrap
n_iterations = 1000  # k = number of bootstrapped samples
n_size = len(x)      # each bootstrapped sample has the same size as x

# run bootstrap: resample with replacement and record each median
medians = []
for i in range(n_iterations):
    s = resample(x, n_samples=n_size)
    m = numpy.median(s)
    medians.append(m)

# plot the distribution of the bootstrapped medians
plt.hist(medians)
plt.show()

# confidence interval from the percentiles of the bootstrapped medians
confidence = 0.95  # confidence level (not the significance level)
p = ((1.0 - confidence) / 2.0) * 100
lower = numpy.percentile(medians, p)
p = (confidence + ((1.0 - confidence) / 2.0)) * 100
upper = numpy.percentile(medians, p)

print(f"\n{confidence * 100} confidence interval lies between {lower} and {upper}")

After importing the necessary libraries, the code stores a sample S of size n = 10 in the variable x. A simple loop then generates k = 1000 artificial samples, each of size m = 10 (here m = n), by drawing from x with replacement. These are called bootstrapped samples. The median of each one is computed and stored in the list ‘medians’. A histogram of the 1000 bootstrapped medians is plotted with the matplotlib library. Finally, the lower and upper bounds of the confidence interval are taken as the 2.5th and 97.5th percentiles of those medians, which gives a range for the population median at the 95% confidence level.

Output (the exact bounds can differ between runs because the resampling is random):

95.0 confidence interval lies between 161.5 and 176.0
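The same percentile bootstrap can also be sketched with NumPy alone, which avoids the loop and makes the result repeatable. This is a minimal sketch rather than a replacement for the example above: the function name `bootstrap_median_ci`, its parameters, and the fixed seed are assumptions introduced here for illustration.

```python
import numpy as np

def bootstrap_median_ci(data, n_iterations=1000, confidence_level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the median (illustrative helper)."""
    rng = np.random.default_rng(seed)  # fixed seed so repeated calls agree
    data = np.asarray(data)
    # Draw all resamples at once: one row per bootstrapped sample,
    # each row drawn from data with replacement.
    samples = rng.choice(data, size=(n_iterations, data.size), replace=True)
    medians = np.median(samples, axis=1)
    # Cut off an equal share of the distribution in each tail.
    tail = (1.0 - confidence_level) / 2.0 * 100
    lower, upper = np.percentile(medians, [tail, 100 - tail])
    return lower, upper

heights = np.array([180, 162, 158, 172, 168, 150, 171, 183, 165, 176])
lower, upper = bootstrap_median_ci(heights)
print(f"95% bootstrap CI for the median: {lower} to {upper}")
```

Because the seed is fixed, calling the function twice with the same arguments returns the same bounds; passing a different seed shows how much the bounds move from one run to the next.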

How to Plot a Confidence Interval in Python?

A confidence interval is an estimate computed from observed data that gives a range of values likely to contain a population parameter at a particular level of confidence.

A confidence interval for the mean is a range of values within which the population mean is likely to lie. If I predict that tomorrow's temperature will be somewhere between -100 and +100 degrees, I can be practically 100% sure that the prediction will be correct. However, if I predict a temperature between 20.4 and 20.5 degrees Celsius, I am far less confident. Note how the confidence decreases as the interval narrows. The same applies to statistical confidence intervals, but their width also depends on other factors, such as the sample size and the variability of the data.
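The trade-off between confidence and interval width can be checked numerically. The sketch below reuses the ten height values from the bootstrap example and builds intervals for the mean at three confidence levels. It assumes a normal approximation (a z-based interval), which is rough for a sample of only ten values; a t-based interval would be somewhat wider.

```python
from statistics import NormalDist, mean, stdev

heights = [180, 162, 158, 172, 168, 150, 171, 183, 165, 176]
n = len(heights)
sample_mean = mean(heights)
standard_error = stdev(heights) / n ** 0.5  # sample std. dev. / sqrt(n)

widths = {}
for level in (0.80, 0.95, 0.99):
    # Two-sided critical value of the standard normal for this level
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * standard_error
    widths[level] = 2 * half_width
    print(f"{level:.0%} CI: {sample_mean - half_width:.1f} to {sample_mean + half_width:.1f}")
```

The higher the confidence level, the wider the interval has to be for the same data.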

A 95% confidence interval means that if we repeatedly drew samples from the population and calculated the interval each time, about 95% of those intervals would contain the true population mean. So, from a single sample we can calculate the sample mean and build an interval around it that will most likely contain the true population mean.
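This repeated-sampling interpretation can be illustrated with a small simulation. The sketch below assumes a normally distributed population with a known mean and standard deviation (both values are invented for the illustration), draws many samples from it, builds a z-based 95% interval from each one, and counts how often the interval contains the true mean.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)
true_mean, true_sd = 170.0, 10.0  # assumed population parameters
n, n_trials = 50, 2000            # sample size and number of repeated samples
z = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value

# One row per repeated sample drawn from the same population
samples = rng.normal(true_mean, true_sd, size=(n_trials, n))
means = samples.mean(axis=1)
standard_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)

lower = means - z * standard_errors
upper = means + z * standard_errors
covered = (lower <= true_mean) & (true_mean <= upper)
coverage = covered.mean()
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The fraction should come out close to 0.95. It is usually slightly below that here, because the normal critical value is used together with an estimated standard deviation.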

Figure: the area between the two black lines shows the 95% confidence interval.

The concept of the confidence interval was put forth by Jerzy Neyman in a paper published in 1937. There are various types of confidence interval; some of the most commonly used are the CI for a mean, the CI for a median, the CI for the difference between means, the CI for a proportion, and the CI for the difference in proportions.

Let’s have a look at how this goes with Python.
