python numpy statistics-bootstrap

Using bootstrapping random.choice


I am trying to use bootstrapping to make 1000 replications of the Son values (resampling with replacement via np.random.choice), so that I can calculate the mean of each replication. I would then compare the standard deviation of these means with the standard error of the mean (SEM).

However, I can't get the bootstrapping part right. How can I fix it?

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from scipy import stats

df = pd.read_csv('http://www.math.uah.edu/stat/data/Pearson.txt',
                 delim_whitespace=True)
df.head()
y = df['Son'].values

Replications = np.random.choice(y, 1000, replace = True)
print("Replications: " , Replications)
print("")
Mean = np.mean(Replications)

print("Mean: " , Mean)

sem = stats.sem(y)
print ("The SEM : ", sem)

Solution

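  • np.random.choice(y, 1000, replace=True) draws a single sample of 1000 values, not 1000 replications. Instead, create 1000 replications, each of length len(df), and take the mean of each row: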

    Replications = np.array([np.random.choice(df.Son, len(df), replace = True) for _ in range(1000)])
    Mean = np.mean(Replications, axis=1)
    print("Mean: " , Mean)
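To finish the comparison the question asks for, the standard deviation of the bootstrap means can be set against the SEM. The following is only a sketch under stated assumptions: `y` is a synthetic stand-in for `df['Son'].values` (the mean, spread, and size are arbitrary, not taken from the Pearson data), the names `n_reps`, `boot_means`, and `boot_se` are made up for illustration, and the SEM is computed by hand as `std(ddof=1) / sqrt(n)`, which is what `stats.sem(y)` returns with its default settings.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is repeatable

# ASSUMPTION: synthetic stand-in for df['Son'].values; the numbers are arbitrary
y = rng.normal(loc=68.0, scale=3.0, size=1000)

n_reps = 1000  # number of bootstrap replications

# One row per replication; each row resamples len(y) values with replacement
replications = rng.choice(y, size=(n_reps, len(y)), replace=True)

# Mean of each replication (one value per row)
boot_means = replications.mean(axis=1)

# Bootstrap estimate of the standard error: spread of the replication means
boot_se = boot_means.std(ddof=1)

# Analytic SEM, equivalent to scipy.stats.sem(y) with its default ddof=1
sem = y.std(ddof=1) / np.sqrt(len(y))

print("Bootstrap SE:", boot_se)
print("Analytic SEM:", sem)
```

The two numbers should come out close but not identical, since the bootstrap value is itself a random estimate. Passing a tuple as `size` to `rng.choice` builds all replications in one call, which avoids the Python-level loop.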
    
