Central Limit Theorem and its application using Python

Tarique Akhtar
2 min read · Jan 12, 2023


The Central Limit Theorem (CLT) states that, when independent samples of a sufficiently large size are drawn from a population with finite variance, the distribution of the sample means is approximately normal, regardless of the shape of the population's own distribution. That distribution is centered on the population mean, and its spread shrinks as the sample size grows.
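Stated a little more formally, and assuming the draws X_1, ..., X_n are independent and identically distributed with mean μ and finite variance σ² (this notation is introduced here for illustration), the classical form of the theorem can be written as:

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{\;d\;} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty
```

In words: for large n, the sample mean behaves roughly like a normal variable with mean μ and standard deviation σ / √n.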

The Central Limit Theorem has many important applications in statistics, such as hypothesis testing and confidence intervals. For example, if we want to estimate the mean weight of a certain type of fruit, we can take a sample of fruits, calculate the mean weight of the sample, and use the CLT to construct a confidence interval for the true mean weight of all fruits of that type.
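As a rough sketch of the fruit example, the snippet below builds an approximate 95% confidence interval for a mean using the normal approximation that the CLT justifies. The function name `mean_confidence_interval`, the simulated `fruit_weights` data, and the choice of z = 1.96 are assumptions made for illustration; for small samples a t-based interval would usually be preferred.

```python
import numpy as np


def mean_confidence_interval(sample, z=1.96):
    """Return (lower, upper) bounds of an approximate CI for the mean.

    z=1.96 corresponds to roughly 95% coverage under the normal approximation.
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    sample_mean = sample.mean()
    # ddof=1 gives the sample (not population) standard deviation
    standard_error = sample.std(ddof=1) / np.sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical data: 50 simulated fruit weights in grams
rng = np.random.default_rng(42)
fruit_weights = rng.normal(loc=10, scale=2, size=50)

lower, upper = mean_confidence_interval(fruit_weights)
print(f"Approximate 95% CI for the mean weight: ({lower:.2f}, {upper:.2f}) g")
```

The interval is centered on the sample mean, and its half-width is 1.96 times the estimated standard error, so it narrows as the sample grows.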

In Python, we can use the numpy library to generate random samples and calculate their means, as well as the matplotlib library to visualize the distribution of sample means.

First, let’s simulate the weights of 100 fruits, drawn from a normal distribution with a mean of 10 grams and a standard deviation of 2 grams. We will treat these 100 values as our population:

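import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # fix the seed so the results are reproducible
# 100 weights from a normal distribution with mean 10 g and standard deviation 2 g
population_weights = np.random.normal(10, 2, 100)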

Let’s look at the distribution of population_weights:

plt.hist(population_weights, bins=20)
plt.xlabel('Population Weight (g)')
plt.ylabel('Frequency')
plt.show()
Population Weight distribution

Next, we can use a for loop to draw 1000 samples of size 100 from the population (sampling with replacement) and calculate the mean weight of each sample:

sample_means = []
for i in range(1000):
    # draw 100 weights from the population, with replacement
    sample = np.random.choice(population_weights, size=100, replace=True)
    # record the mean weight of this sample
    sample_means.append(np.mean(sample))
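
The same resampling can also be done without an explicit loop. The sketch below is one possible alternative, not the approach used above: it relies on NumPy's `default_rng` generator and regenerates a stand-in `population_weights` array with the same parameters, so its numbers will differ from those produced by the seeded code in this article.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for population_weights (assumption: same mean, std, and size)
population_weights = rng.normal(10, 2, 100)

# Draw all 1000 resamples at once: one row per sample, 100 values per row
samples = rng.choice(population_weights, size=(1000, 100), replace=True)
# Average across each row to get one mean per sample
sample_means = samples.mean(axis=1)

print(sample_means.shape)  # (1000,)
```

Drawing a 1000 x 100 array in one call and averaging along each row gives the same kind of result as the loop, and is usually faster for large numbers of samples.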

Finally, we can use matplotlib to visualize the distribution of sample means:

import matplotlib.pyplot as plt

plt.hist(sample_means, bins=20)
plt.xlabel('Sample Mean Weight (g)')
plt.ylabel('Frequency')
plt.show()
Sample mean distribution, which is approximately normal

The resulting histogram should be approximately normal, with a mean close to 10 grams (the population mean). Its standard deviation, known as the standard error, is approximately the standard deviation of the population divided by the square root of the sample size, which here is about 2 / √100 = 0.2 grams.
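
To check this numerically, the sketch below repeats the steps above in one self-contained script and compares the observed spread of the sample means with the value predicted by the formula. The variable names `observed_se` and `expected_se` are introduced here for illustration.

```python
import numpy as np

np.random.seed(0)
population_weights = np.random.normal(10, 2, 100)

sample_size = 100
# 1000 resamples of size 100, keeping only the mean of each
sample_means = np.array([
    np.random.choice(population_weights, size=sample_size, replace=True).mean()
    for _ in range(1000)
])

# Spread of the sample means actually observed
observed_se = sample_means.std()
# Spread predicted by the formula: population std / sqrt(sample size)
expected_se = population_weights.std() / np.sqrt(sample_size)

print(f"Mean of sample means:    {sample_means.mean():.3f}")
print(f"Mean of the population:  {population_weights.mean():.3f}")
print(f"Observed standard error: {observed_se:.3f}")
print(f"Expected standard error: {expected_se:.3f}")
```

The two standard errors should agree closely, and the mean of the sample means should sit very near the mean of the 100 population values.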

The above is a simple application of the CLT, which can be applied in many more complex situations in statistics, finance, and other fields. The theorem is a fundamental concept that is widely used in statistical analysis, and a good understanding of it is important for analyzing data correctly and drawing meaningful conclusions.
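
As one illustration of a less tidy situation, the sketch below uses a strongly right-skewed population instead of a normal one. The exponential distribution, the helper `skewness`, and the sample counts are all assumptions chosen for the example; skewness near 0 is used as a rough indicator of a symmetric, bell-like shape.

```python
import numpy as np

rng = np.random.default_rng(1)


def skewness(values):
    """Third standardized moment: near 0 for symmetric distributions."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    return (centered ** 3).mean() / values.std() ** 3


# A strongly right-skewed population (exponential with mean 10)
population = rng.exponential(scale=10, size=100_000)

# 2000 samples of size 100, drawn with replacement, one mean per sample
sample_means = rng.choice(population, size=(2000, 100)).mean(axis=1)

print(f"Population skewness:  {skewness(population):.2f}")
print(f"Sample-mean skewness: {skewness(sample_means):.2f}")
```

Even though the population is far from normal, the sample means should come out much more symmetric, which is the behavior the theorem describes.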

Thanks for reading.
