python-3.x pandas index-error

Python pandas: strange IndexError from iloc[] when copying rows from one DataFrame to another


Environment: Python 3.5 in Spyder (Anaconda3) on a Windows 10 machine with 2 GPUs.

I am using a Sensei Japanese Karate Masters Dataset to check which masters were conferred a Master degree in the last 5 years, and I want to copy all their rows to another pandas DataFrame, sensei5yrs, for further processing.

I need to copy row by row from the pandas DataFrame sensei to another pandas DataFrame, sensei5yrs. I am using the code below, but it fails:

for i in range(0, len(sensei)-1):
    #print(sensei.iloc[i]['Year'], sensei.iloc[i]['Date']
    chk=sensei.iloc[i]['Year']
    print("Checker: ",chk, sensei)
    if str(chk) in ['2017','2016','2015','2014','2013']:
        print("Found one year")
        sensei5yrs.iloc[i]=sensei.iloc[i]

The error comes in the last line of the above code:

IndexError: single positional indexer is out-of-bounds

print(sensei5yrs)
Empty DataFrame
Columns: [Date, OpenScore, HighScore, LowScore, CloseScore, Adj CloseScore, 
VolumeFights, Year]

The sensei pandas dataframe has the following structure and data:

print(sensei)
Columns: [Date, OpenScore, HighScore, LowScore, CloseScore, Adj CloseScore,VolumeFights, Year]
0 2000-11-23  3837.110107  3871.340088  3826.419922  3852.399902  3852.399902   12800  2000
1 2017-11-24  3860.520020  3889.560059  3856.580078  3868.340088  3868.340088    12800  2017   

The error raised when the for loop encounters, for example, 2017 in the 'Year' column is:

IndexError: single positional indexer is out-of-bounds

PS: The above code runs fine until it encounters a row whose Year is one of ['2017','2016','2015','2014','2013']. As soon as it meets any of these years in the 'Year' column, it throws the IndexError above.

Many thanks to all of you who attempt to solve this puzzle.


Solution

  • There is no real need for you to do this by iterating over each row. You can create the new sensei5 dataframe by using boolean indexing on the original dataframe, as follows:

    import pandas as pd
    year = ['2017','2010','2015','2014','2013', '2013','2011','2012','2014','2010']
    master = ['foo', 'bar', 'foo1', 'bar1', 'foo2', 'bar2', 'foo3', 'bar3', 'foo4', 'bar4']
    
    sensei = pd.DataFrame({'year' : year, 'master' : master})
    sensei5 = sensei[2017 - sensei['year'].astype(int) <=5]
    

    which gives:

      master  year
    0    foo  2017
    2   foo1  2015
    3   bar1  2014
    4   foo2  2013
    5   bar2  2013
    7   bar3  2012
    8   foo4  2014
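
    Note that the condition <=5 with 2017 as the reference year also keeps 2012, so it spans six calendar years. If you want exactly the years listed in the question, a sketch along the following lines may fit better. It reuses the toy data above; the names wanted_years and sensei5yrs are only illustrative.

```python
import pandas as pd

# Same toy data as in the answer above; 'year' holds strings.
year = ['2017', '2010', '2015', '2014', '2013', '2013', '2011', '2012', '2014', '2010']
master = ['foo', 'bar', 'foo1', 'bar1', 'foo2', 'bar2', 'foo3', 'bar3', 'foo4', 'bar4']
sensei = pd.DataFrame({'year': year, 'master': master})

# Years to keep, taken from the list in the question.
wanted_years = ['2017', '2016', '2015', '2014', '2013']

# isin() builds a boolean mask; .copy() makes the result an independent frame.
sensei5yrs = sensei[sensei['year'].isin(wanted_years)].copy()
print(sensei5yrs)
```

    Because the comparison is done on strings, no astype(int) conversion is needed here. If the column held integers, the list would have to hold integers too.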
    

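    However, if you must do it by iteration, then the following, which uses loc, works fine. The IndexError itself comes from sensei5yrs being empty: iloc can only assign to positions that already exist, whereas loc adds a row when the label is new. Separately, you are subtracting 1 from the length of the dataframe, which skips the last row. I suspect you do not intend this, as Python's range already stops one before the stop value: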

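    # Start from an empty frame that has the same columns as the source.
    newSensei5 = pd.DataFrame(columns = sensei.columns)
    for row in range(len(sensei)):
        if int(sensei.loc[row, 'year']) >=2012:
            # loc assigns by label, so a new row is added when the label is missing.
            newSensei5.loc[row, :] = sensei.loc[row, :]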
    

    which gives:

    In [23]: newSensei5
    Out[23]:
      master  year
    0    foo  2017
    2   foo1  2015
    3   bar1  2014
    4   foo2  2013
    5   bar2  2013
    7   bar3  2012
    8   foo4  2014
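
    To see why the loc version succeeds where the iloc assignment in the question failed, here is a minimal sketch on a two-row frame. The names target and iloc_failed are made up for the illustration, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

sensei = pd.DataFrame({'year': ['2000', '2017'], 'master': ['bar', 'foo']})

# An empty frame with the same columns, like sensei5yrs in the question.
target = pd.DataFrame(columns=sensei.columns)

# iloc addresses existing positions only, and an empty frame has none.
try:
    target.iloc[1] = sensei.iloc[1]
    iloc_failed = False
except IndexError as exc:
    iloc_failed = True
    print("iloc raised IndexError:", exc)

# loc assigns by label and adds the row when the label is new.
target.loc[1] = sensei.loc[1]
print(target)
```

    This is also why the loop in the question only failed on matching years: the iloc assignment is reached only when the if condition is true.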
    

    EDIT: I had not noticed that you also provided data. I recreated it as follows:

    Columns = ["Date", "OpenScore", "HighScore", "LowScore", "CloseScore", "Adj CloseScore", "VolumeFights", "Year"]
    a = ["2000-11-23", 3837.110107,  3871.340088, 3826.419922,  3852.399902,  3852.399902 ,  12800 , "2000"]
    b = ["2017-11-24", 3860.520020,  3889.560059, 3856.580078,  3868.340088,  3868.340088 ,   12800, "2017"]
    
    df = pd.DataFrame([a, b], columns = Columns)
    df5 = df[2017 - df['Year'].astype(int) <=5]
    

    which worked perfectly fine and kept only the 2017 row.
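
    The reference year 2017 is hard-coded above. As a rough sketch, and assuming that "the last 5 years" means the five calendar years ending at the newest year present in the data, the cutoff could be derived from the Year column instead. The names rows, years and latest are illustrative only.

```python
import pandas as pd

columns = ["Date", "OpenScore", "HighScore", "LowScore", "CloseScore",
           "Adj CloseScore", "VolumeFights", "Year"]
# The two sample rows from the question; dates are written as quoted strings.
rows = [
    ["2000-11-23", 3837.110107, 3871.340088, 3826.419922, 3852.399902, 3852.399902, 12800, "2000"],
    ["2017-11-24", 3860.520020, 3889.560059, 3856.580078, 3868.340088, 3868.340088, 12800, "2017"],
]
df = pd.DataFrame(rows, columns=columns)

# Convert the string years once and find the newest one.
years = df["Year"].astype(int)
latest = years.max()

# Assumption: keep the five calendar years ending at `latest`
# (2013 through 2017 for this sample).
df5 = df[years > latest - 5]
print(df5[["Date", "Year"]])
```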