python pandas matplotlib graph data-analysis

How do we apply the Central Limit Theorem using Python?

I have a large dataset with 271116 rows. I normalized the data using z-score normalization, but I have no way of knowing whether the data actually follows a normal distribution. So I plotted a simple density graph using matplotlib:

import matplotlib.pyplot as plt
hdf = df['Height'].plot(kind='kde', stacked=False)  # kernel density estimate
plt.show()

I got this for a result:

The data seems somewhat normal. Can I apply the Central Limit Theorem, taking the means of different random samples (say, 10000 times), to get a smooth bell curve?

Any help in Python is appreciated, thanks.

Solution

Something like:

import numpy as np

sampleMeans = []
for _ in range(100000):
    # Draw 100 random rows (without replacement) and record their mean
    samples = df['Height'].sample(n=100)
    sampleMean = np.mean(samples)
    sampleMeans.append(sampleMean)

# sampleMeans is now a list of sample means to plot; it should be approximately normally distributed

The mean of this distribution should be approximately equal to the mean of the original data, and its standard deviation should be about a factor of ten smaller than that of the original data, since sqrt(100) = 10. The general rule is that the standard deviation of the sample means (the standard error) is the data standard deviation divided by sqrt(n). If the result doesn't look close enough to a bell curve, increase n in .sample(n=100); this also decreases the standard deviation of the resulting curve. If the plot is merely jagged, increase the number of repetitions instead.
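The sqrt(n) rule can be checked numerically. The following is only a minimal sketch under stated assumptions: `heights` is synthetic data standing in for df['Height'] (its mean and spread are made up), and the samples are drawn with replacement via NumPy so that all of them can be generated in one call, which for a dataset this large behaves almost the same as the pandas .sample() loop.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical stand-in for df['Height']: 271116 synthetic values.
# The loc and scale here are assumptions, not the asker's data.
heights = rng.normal(loc=175.0, scale=10.0, size=271_116)

n = 100            # size of each random sample
repeats = 10_000   # number of sample means to collect

# Draw all samples at once (with replacement): one row per sample.
samples = rng.choice(heights, size=(repeats, n), replace=True)
sample_means = samples.mean(axis=1)

# CLT prediction for the spread of the sample means.
expected_se = heights.std() / np.sqrt(n)

print(f"data mean          : {heights.mean():.3f}")
print(f"mean of means      : {sample_means.mean():.3f}")
print(f"data std / sqrt(n) : {expected_se:.3f}")
print(f"std of means       : {sample_means.std():.3f}")
```

The two means should agree closely, and the last two printed values should be close to each other. Changing n to 400 should roughly halve the standard deviation of the means.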

It's important to note that the resulting distribution is not the distribution of the original data. It is the distribution of the sample means; the CLT does not merely smooth out the original curve.
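To illustrate that the distribution of sample means is a different object from the data's distribution, here is a small sketch using deliberately skewed synthetic data. The exponential data and the `skewness` helper are assumptions introduced purely for illustration; they are not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def skewness(x):
    """Third standardized moment; 0 for a symmetric distribution."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / x.std() ** 3

# Hypothetical, strongly right-skewed data (not the asker's heights).
data = rng.exponential(scale=2.0, size=200_000)

n = 100            # size of each random sample
repeats = 10_000   # number of sample means to collect

# Means of `repeats` random samples of size n (drawn with replacement).
sample_means = rng.choice(data, size=(repeats, n), replace=True).mean(axis=1)

print(f"skewness of raw data     : {skewness(data):.2f}")
print(f"skewness of sample means : {skewness(sample_means):.2f}")
```

The raw data stays heavily skewed (an exponential distribution has a theoretical skewness of 2), while the sample means are far more symmetric. The bell curve describes the means, so it says nothing about whether the underlying data is itself normal.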