python numpy machine-learning data-science sampling

A mathematical explanation for why variance of bootstrap estimates decreases

I am trying to grok bootstrapping and bagging (bootstrap aggregation), so I've been running some experiments. I loaded a sample dataset from Kaggle and used the bootstrap to estimate the median of one column:

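```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

X = pd.read_csv("dataset.csv")
true_median = np.median(X["Impressions"])  # median of the observed data
B = 500
errors = []
variances = []
for b in range(1, B):
    # Draw b fresh bootstrap resamples (same size as X, with replacement) and take each median
    sample_medians = [np.median(X.sample(len(X), replace=True)["Impressions"]) for _ in range(b)]
    # Error: mean of the b bootstrap medians minus the median of the data
    errors.append(np.mean(sample_medians) - true_median)
    # Spread of the b bootstrap medians (ddof=0, same as np.std(...) ** 2)
    variances.append(np.var(sample_medians))
```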

Then I visualized errors and variances:

```python
fig, ax1 = plt.subplots()
bs = range(1, B)  # x values: the number of bootstrap resamples behind each point

color = 'tab:red'
ax1.set_xlabel('Number of Bootstrap Samples (B)')
ax1.set_ylabel('Bootstrap Estimate Error', color=color)
ax1.plot(bs, errors, color=color, alpha=0.7)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()

color = 'tab:blue'
ax2.set_ylabel('Bootstrap Estimate Variance', color=color)
ax2.plot(bs, variances, color=color, alpha=0.7)
ax2.tick_params(axis='y', labelcolor=color)

plt.title("Relationship Between Bootstrap Error, Variance \nand Number of Bootstrap Iterations")
fig.tight_layout()  # called after the title so the layout accounts for it
plt.show()
```

This is the output of the plot:

You can see that both the error and the variance decrease as B increases. I'm looking for a mathematical justification: is there a way to derive or prove why the variance of bootstrap estimates decreases as B increases?

Solution

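I think what you are seeing is the Central Limit Theorem (together with the law of large numbers) at work. For each b, the b bootstrap medians are independent draws from the same bootstrap distribution (conditional on your data), so their mean is just a sample average. When b is small, that average is noisy and the error (mean of the bootstrap medians minus the median of the data) swings widely. As b grows, the average converges to the mean of the bootstrap distribution, and its fluctuations shrink like 1/√b, because the variance of an average of b independent draws with variance σ² is σ²/b. Note that the limit is the mean of the bootstrap distribution, which can differ slightly from the data's median (that difference is the bootstrap estimate of bias), so the error settles near that value rather than exactly at zero. The variance curve behaves differently: np.var(sample_medians) estimates the variance of the bootstrap distribution itself, so it does not go to zero. It converges to a fixed value (the bootstrap variance of the median), and what shrinks as b grows is how much that estimate jumps around from one b to the next.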
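A short derivation of the rate, assuming only that, given the data X, the B bootstrap medians $\hat{\theta}^*_1, \dots, \hat{\theta}^*_B$ are i.i.d. with bootstrap mean $\mu^*$ and bootstrap variance $\sigma_*^2$ (notation introduced here for illustration), and writing $\hat{\theta}$ for the median of the data (`true_median` in the code):

```latex
\begin{aligned}
\bar{\theta}^*_B &= \frac{1}{B}\sum_{b=1}^{B} \hat{\theta}^*_b, \\
\operatorname{Var}\!\left(\bar{\theta}^*_B \mid X\right)
  &= \frac{1}{B^2}\sum_{b=1}^{B}\operatorname{Var}\!\left(\hat{\theta}^*_b \mid X\right)
   = \frac{B\,\sigma_*^2}{B^2}
   = \frac{\sigma_*^2}{B}, \\
\mathbb{E}\!\left[\bar{\theta}^*_B - \hat{\theta} \mid X\right] &= \mu^* - \hat{\theta}.
\end{aligned}
```

The covariance terms drop out because the resamples are independent. So the error fluctuates around $\mu^* - \hat{\theta}$ with standard deviation $\sigma_*/\sqrt{B}$. The sample variance $s_B^2$ of the bootstrap medians, by contrast, is a consistent estimator of the constant $\sigma_*^2$; only its own sampling variability shrinks, at rate $O(1/B)$, which holds here because the bootstrap distribution of the median has finite support and therefore a finite fourth moment.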

Does that clarify things? If not, please elaborate on what you expect to see in the plots and we can discuss how to get there.
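A quick numerical check of the 1/B rate, as a sketch under stated assumptions: dataset.csv isn't available here, so a synthetic lognormal sample stands in for the Impressions column, and the helper `bootstrap_medians` is hypothetical (it vectorizes the resampling with NumPy indexing instead of calling `X.sample` in a loop). For each B it repeats "average B bootstrap medians" 200 times and compares the empirical variance of those averages with σ*²/B, where σ*² is the variance of the bootstrap median approximated from 10,000 resamples.

```python
import numpy as np

def bootstrap_medians(data, n_boot, rng):
    """Hypothetical helper: return n_boot bootstrap medians of `data` (resampled with replacement)."""
    idx = rng.integers(0, len(data), size=(n_boot, len(data)))
    return np.median(data[idx], axis=1)

rng = np.random.default_rng(0)
# Assumption: 200 skewed positive values as a stand-in for the "Impressions" column
data = rng.lognormal(mean=7.0, sigma=1.0, size=200)

# Approximate the bootstrap variance of the median with many resamples
sigma2_star = np.var(bootstrap_medians(data, 10_000, rng))

results = {}
for B in (10, 100, 1000):
    # Repeat the "mean of B bootstrap medians" experiment 200 times
    means = [bootstrap_medians(data, B, rng).mean() for _ in range(200)]
    results[B] = (np.var(means), sigma2_star / B)
    print(f"B={B:5d}  empirical Var={results[B][0]:.3g}  sigma*^2/B={results[B][1]:.3g}")
```

The two columns should roughly agree, and both should drop by about a factor of 10 each time B grows tenfold; the exact numbers depend on the seed and the synthetic data.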