python pandas numpy statistics simulation

KeyError: [....] not found in index


For a project I am working on, I created a linear regression model. After fitting the line, I wanted to resample my data repeatedly using np.random.choice to see how much the regression line would vary if the data were collected again. However, I keep getting a KeyError in my function and I am not sure how to fix it.

Here is a head of what the data looks like:

[image not preserved: head of the data]

I ran a linear regression model on the columns 'nsb' and 'r'. Here are my functions that repeatedly create linear regression models for 'bootstrapped' data:

[image not preserved: function definitions]

When I call this:

slope, int = draw_bs_pairs_linreg(big_df['nsb'], big_df['r'], size = 1000)

I get the error below. The length of the list and the values in it change each time I run it.

KeyError: '[2, 567, 459, 458, 355, 230, 353, 565, 231, 566, 117] not in index'

Any help would be appreciated.


Solution

  • You need to call DataFrame.reset_index before calling your function, so that the index labels match the positions 0..n-1:

    big_df = big_df.reset_index(drop=True) 
    

    Or index by position with .iloc inside the function:

    bs_x, bs_y = x.iloc[bs_inds], y.iloc[bs_inds]
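
To illustrate why both fixes work, here is a minimal sketch. The original function was posted as an image, so the body below is an assumed reconstruction of a pairs bootstrap using np.polyfit, not the asker's exact code. The small DataFrame and the `seed` parameter are hypothetical stand-ins. The likely cause of the error is that `x[bs_inds]` looks up index labels, and a DataFrame that has been filtered or concatenated no longer has every label from 0 to n-1. The randomly drawn positions that happen to be missing as labels are the ones listed in the KeyError, which is why the list changes on every run.

```python
import numpy as np
import pandas as pd


def draw_bs_pairs_linreg(x, y, size=1, seed=None):
    """Pairs bootstrap for a degree-1 fit (assumed reconstruction).

    Returns two arrays of length `size`: slopes and intercepts.
    """
    rng = np.random.default_rng(seed)
    inds = np.arange(len(x))  # positions 0..n-1, not index labels
    slopes = np.empty(size)
    intercepts = np.empty(size)
    for i in range(size):
        # Resample positions with replacement
        bs_inds = rng.choice(inds, size=len(inds))
        # .iloc selects by position, so gaps in the index labels do not matter
        bs_x, bs_y = x.iloc[bs_inds], y.iloc[bs_inds]
        slopes[i], intercepts[i] = np.polyfit(bs_x, bs_y, 1)
    return slopes, intercepts


# Hypothetical stand-in for big_df: filtering leaves gaps in the index labels
df = pd.DataFrame({"nsb": np.arange(20.0), "r": 2.0 * np.arange(20.0) + 1.0})
big_df = df[df["nsb"] % 2 == 0]  # remaining labels: 0, 2, 4, ..., 18

# Label-based lookup fails because the labels 1, 3 and 5 no longer exist
try:
    big_df["nsb"][[1, 3, 5]]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup inside the function works on the same data
slopes, intercepts = draw_bs_pairs_linreg(
    big_df["nsb"], big_df["r"], size=100, seed=0
)
```

With the reset_index approach the function can stay unchanged, because the labels become 0..n-1 again. The .iloc approach is more robust, since it does not depend on how the caller's index looks.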