Applying the Central Limit Theorem in R
To illustrate the Central Limit Theorem in R, we’ll follow these steps:
1. Generate a Non-Normally Distributed Population
Let’s start by creating a population that is not normally distributed. We’ll use a random sample from a uniform distribution as an example.
R
# Generate a non-normally distributed population
set.seed(42)
population <- runif(1000, min = 0, max = 1)

# Create a histogram of the population on the density scale
hist(population, breaks = 20, probability = TRUE,
     main = "Histogram with Density Curve")

# Overlay a kernel density estimate so the plot matches its title
lines(density(population), lwd = 2)
Output: a histogram of the 1,000 population values, roughly flat across the interval from 0 to 1.
2. Draw Random Samples
Next, we’ll draw multiple random samples from this population. A common rule of thumb is a sample size of at least 30 for the normal approximation to work well. Because the uniform distribution is symmetric, the smaller sample size of 20 used below is already adequate.
R
# Set the sample size and number of samples
sample_size <- 20
num_samples <- 500

# Draw random samples (with replacement); each column of `samples` is one sample
samples <- replicate(num_samples,
                     sample(population, size = sample_size, replace = TRUE))
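To see how the draws are stored before any means are computed, the following minimal sketch inspects the matrix returned by replicate(). It regenerates the same objects so that it runs on its own, and it uses only base R.

```r
# Recreate the population and samples from the steps above
set.seed(42)
population <- runif(1000, min = 0, max = 1)
sample_size <- 20
num_samples <- 500
samples <- replicate(num_samples,
                     sample(population, size = sample_size, replace = TRUE))

# replicate() binds each sample as a column:
# rows = observations within a sample (20), columns = samples (500)
print(dim(samples))

# Mean of the first sample, computed two equivalent ways
first_mean <- mean(samples[, 1])
print(all.equal(first_mean, colMeans(samples)[1]))
```

Because every column is one sample, colMeans() is the natural way to get one mean per sample in the next step.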
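3. Compare the Mean and Variance of the Sample Means with the Population
The CLT predicts that the sample means are centered on the population mean, with a variance equal to the population variance divided by the sample size.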
R
# Calculate the mean of each sample (one per column)
sample_means <- colMeans(samples)

# Mean and variance of the sample means
x_bar <- mean(sample_means)
std <- sd(sample_means)
print('Sample Mean and Variance')
print(x_bar)
print(std^2)

# Population mean, and the variance the CLT predicts for the sample means
# (population variance divided by the sample size)
mu <- mean(population)
sigma <- sd(population)
print('Population Mean and Variance')
print(mu)
print(sigma^2 / sample_size)
Output:
[1] "Sample Mean and Variance"
[1] 0.4887697
[1] 0.003808397
[1] "Population Mean and Variance"
[1] 0.4882555
[1] 0.004246579
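For a Uniform(0, 1) population the theoretical mean and variance are known (1/2 and 1/12), which gives a quick sanity check on the numbers above. With a sample size of n = 20:

```latex
\mathbb{E}[\bar{X}] = \mu = \frac{1}{2}, \qquad
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{1/12}{20} = \frac{1}{240} \approx 0.00417
```

The printed value 0.004246579 is based on the variance of the 1,000 generated values rather than the theoretical 1/12, which is why it differs slightly. The observed variance of the sample means, 0.003808397, is of the same order.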
4. Plot the Distribution of Sample Means
Plot a histogram of the sample means to observe the distribution.
R
# Visualize the sample means
hist(sample_means, breaks = 15, probability = TRUE,
     main = "Distribution of Sample Means",
     xlab = "Sample Mean")

# Overlay the normal density with the same mean and standard deviation
curve(dnorm(x, mean = x_bar, sd = std),
      col = "black", lwd = 2, add = TRUE)
Output:
The resulting plot shows that the distribution of sample means closely follows a normal distribution, even though the original population is uniform rather than normal. This is the Central Limit Theorem in action.
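A histogram can hide departures from normality in the tails, so a normal Q-Q plot is a common complementary check. The sketch below rebuilds the sample means from the earlier steps and uses base R's qqnorm() and qqline(). Points lying close to the reference line suggest approximate normality. This check is an illustrative addition, not part of the walkthrough above.

```r
# Rebuild the sample means from the steps above
set.seed(42)
population <- runif(1000, min = 0, max = 1)
samples <- replicate(500, sample(population, size = 20, replace = TRUE))
sample_means <- colMeans(samples)

# Normal Q-Q plot: sample quantiles against theoretical normal quantiles
qqnorm(sample_means, main = "Normal Q-Q Plot of Sample Means")

# Reference line through the first and third quartiles
qqline(sample_means, col = "red", lwd = 2)
```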
Example 2: Central Limit Theorem in R
R
# Set the random seed for reproducibility
set.seed(42)

# Generate a non-normally distributed population
population <- runif(5000, min = 0, max = 1)

# Set up a 1x2 grid so both histograms appear side by side
par(mfrow = c(1, 2))

# Plot the histogram of the population
hist(population, breaks = 30, probability = TRUE,
     main = "Population Distribution",
     xlab = "Value", col = "lightblue")

# Draw random samples and calculate their means
sample_size <- 30
num_samples <- 300

# Preallocate a vector to store the sample means
sample_means <- numeric(num_samples)

for (i in seq_len(num_samples)) {
  # Take a random sample (named to avoid masking the sample() function)
  current_sample <- sample(population, size = sample_size, replace = TRUE)
  # Calculate the mean of the sample
  sample_means[i] <- mean(current_sample)
}

# Mean and variance of the sample means
x_bar <- mean(sample_means)
std <- sd(sample_means)
print('Sample Mean and Variance')
print(x_bar)
print(std^2)

# Population mean, and the variance the CLT predicts for the sample means
mu <- mean(population)
sigma <- sd(population)
print('Population Mean and Variance')
print(mu)
print(sigma^2 / sample_size)

# Plot the histogram of sample means
hist(sample_means, breaks = 30, probability = TRUE,
     main = "Distribution of Sample Means",
     xlab = "Sample Mean", col = "lightgreen")

# Overlay the normal density curve
curve(dnorm(x, mean = x_bar, sd = std),
      col = "black", lwd = 2, add = TRUE)

# Add a legend
legend("topright", legend = c("Distribution Curve"),
       col = c("black"), lwd = 2)

# Reset the plot layout
par(mfrow = c(1, 1))
Output:
[1] "Sample Mean and Variance"
[1] 0.5010222
[1] 0.002745131
[1] "Population Mean and Variance"
[1] 0.5031668
[1] 0.002823829
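Both examples use a single sample size. To make the dependence on the sample size visible, the sketch below repeats the experiment for several sizes and compares the observed variance of the sample means with the population variance divided by n. The helper name simulate_means and the chosen sizes (5, 30, 100) are illustrative assumptions, not part of the examples above.

```r
set.seed(42)
population <- runif(5000, min = 0, max = 1)
pop_var <- var(population)

# Hypothetical helper: draw `num_samples` samples of size `n`
# (with replacement) and return the mean of each one
simulate_means <- function(n, num_samples = 300) {
  replicate(num_samples, mean(sample(population, size = n, replace = TRUE)))
}

sample_sizes <- c(5, 30, 100)

# Observed variance of the sample means for each sample size
observed <- sapply(sample_sizes, function(n) var(simulate_means(n)))

# Variance predicted by the CLT: population variance divided by n
predicted <- pop_var / sample_sizes

print(data.frame(n = sample_sizes, observed = observed, predicted = predicted))
```

The observed and predicted columns should track each other and shrink as n grows, which is the same relationship the two examples check for a single sample size.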
Central Limit Theorem in R
The Central Limit Theorem (CLT) is a fundamental result in statistics. It says that if you repeatedly draw samples from a population and calculate the average of each sample, the distribution of those averages approaches a bell-shaped normal curve as the sample size grows, even if the original data doesn’t look bell-shaped at all. The result requires the observations to be independent and the population to have a finite variance.
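Stated more formally, for independent, identically distributed observations X_1, ..., X_n with mean μ and finite variance σ², the standardized sample mean converges in distribution to a standard normal:

```latex
\frac{\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr)}{\sigma} \xrightarrow{d} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty,
\qquad \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i
```

Equivalently, for large n the sample mean is approximately normal with mean μ and variance σ²/n, which is the quantity the examples compare against the observed variance of the sample means.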