Using Python 3.5 in Spyder (Anaconda3) on a Windows 10 machine with 2 GPUs:
I am using a Sensei Japanese Karate Masters dataset to find the masters who were conferred a Master degree in the last 5 years and to copy all of their rows to another pandas DataFrame, sensei5yrs, for further processing.
I need to copy row by row from one pandas DataFrame, "sensei", to another pandas DataFrame, sensei5yrs. I am using the code below, but it fails:
for i in range(0, len(sensei)-1):
    #print(sensei.iloc[i]['Year'], sensei.iloc[i]['Date']
    chk=sensei.iloc[i]['Year']
    print("Checker: ",chk, sensei)
    if str(chk) in ['2017','2016','2015','2014','2013']:
        print("Found one year")
        sensei5yrs.iloc[i]=sensei.iloc[i]
The error comes in the last line of the above code:
IndexError: single positional indexer is out-of-bounds
print(sensei5yrs)
Empty DataFrame
Columns: [Date, OpenScore, HighScore, LowScore, CloseScore, Adj CloseScore,
VolumeFights, Year]
The sensei pandas dataframe has the following structure and data:
print(sensei)
Columns: [Date, OpenScore, HighScore, LowScore, CloseScore, Adj CloseScore,VolumeFights, Year]
0 2000-11-23 3837.110107 3871.340088 3826.419922 3852.399902 3852.399902 12800 2000
1 2017-11-24 3860.520020 3889.560059 3856.580078 3868.340088 3868.340088 12800 2017
The error raised when the for loop encounters, for example, 2017 in the 'Year' column is:
IndexError: single positional indexer is out-of-bounds
PS: The above code runs fine until it encounters a row whose Year is one of '2017', '2016', '2015', '2014' or '2013'. As soon as it hits any of those years in the 'Year' column, it throws the above IndexError.
Many thanks to all of you who attempt to solve this puzzle.
There is no real need to iterate over each row. You can create a new DataFrame, sensei5, by using boolean indexing on the original DataFrame as follows:
import pandas as pd
year = ['2017', '2010', '2015', '2014', '2013', '2013', '2011', '2012', '2014', '2010']
master = ['foo', 'bar', 'foo1', 'bar1', 'foo2', 'bar2', 'foo3', 'bar3', 'foo4', 'bar4']
sensei = pd.DataFrame({'year': year, 'master': master})
sensei5 = sensei[2017 - sensei['year'].astype(int) <= 5]
which gives:
  master  year
0    foo  2017
2   foo1  2015
3   bar1  2014
4   foo2  2013
5   bar2  2013
7   bar3  2012
8   foo4  2014
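Note that the `<= 5` comparison also keeps 2012, as the output shows. If you want exactly the years listed in the question, `Series.isin` is one way to express that. A minimal sketch, reusing the sample frame from this answer (the names `wanted_years` and `sensei_last5` are made up for illustration):

```python
import pandas as pd

year = ['2017', '2010', '2015', '2014', '2013', '2013', '2011', '2012', '2014', '2010']
master = ['foo', 'bar', 'foo1', 'bar1', 'foo2', 'bar2', 'foo3', 'bar3', 'foo4', 'bar4']
sensei = pd.DataFrame({'year': year, 'master': master})

# Hypothetical name: the exact years the question tests for
wanted_years = ['2017', '2016', '2015', '2014', '2013']

# isin builds a boolean mask; .copy() makes the result independent of sensei
sensei_last5 = sensei[sensei['year'].astype(str).isin(wanted_years)].copy()
print(sensei_last5)
```

The `astype(str)` call is there only so the comparison still works if the column holds integers rather than strings.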
However, if you must do it by iteration, using loc as follows seems to work fine. You might be getting problems because you are subtracting 1 from the length of the DataFrame. I suspect you do not intend this, as Python's range already stops one before the stop value:
newSensei5 = pd.DataFrame(columns=sensei.columns)
for row in range(len(sensei)):
    if int(sensei.loc[row, 'year']) >= 2012:
        newSensei5.loc[row, :] = sensei.loc[row, :]
which gives:
In [23]: newSensei5
Out[23]:
  master  year
0    foo  2017
2   foo1  2015
3   bar1  2014
4   foo2  2013
5   bar2  2013
7   bar3  2012
8   foo4  2014
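As for why the original loop raises the IndexError only when a matching year is found: that is the only time `sensei5yrs.iloc[i] = ...` runs, and `iloc` cannot add rows to an empty DataFrame, whereas `loc` creates the label on assignment. A small sketch of the difference, using made-up frames `source` and `target` (the exact error message may differ between pandas versions):

```python
import pandas as pd

source = pd.DataFrame({'master': ['foo', 'bar'], 'year': ['2017', '2010']})
target = pd.DataFrame(columns=source.columns)  # empty, same columns

# iloc addresses existing positions only, so writing to row 0 of an empty frame fails
try:
    target.iloc[0] = source.iloc[0]
    iloc_failed = False
except IndexError as exc:
    iloc_failed = True
    print("iloc:", exc)

# loc is label-based and enlarges the frame when the label does not exist yet
target.loc[0] = source.loc[0]
print(target)
```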
EDIT: I had not noticed that you also included sample data. I recreated it as follows:
Columns = ["Date", "OpenScore", "HighScore", "LowScore", "CloseScore", "Adj CloseScore", "VolumeFights", "Year"]
# Dates are quoted: unquoted, 2000-11-23 would be evaluated as integer subtraction (1966)
a = ["2000-11-23", 3837.110107, 3871.340088, 3826.419922, 3852.399902, 3852.399902, 12800, "2000"]
b = ["2017-11-24", 3860.520020, 3889.560059, 3856.580078, 3868.340088, 3868.340088, 12800, "2017"]
df = pd.DataFrame([a, b], columns=Columns)
df5 = df[2017 - df['Year'].astype(int) <= 5]
which worked perfectly fine.
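If you would rather not hard-code 2017, the same filter can be derived from the `Date` column. A rough sketch, assuming the dates are ISO-formatted strings as in the sample rows and that "last 5 years" is measured from the latest date in the data (`latest_year` and `recent` are illustrative names, not part of the original code):

```python
import pandas as pd

columns = ["Date", "OpenScore", "HighScore", "LowScore", "CloseScore",
           "Adj CloseScore", "VolumeFights", "Year"]
a = ["2000-11-23", 3837.110107, 3871.340088, 3826.419922, 3852.399902, 3852.399902, 12800, "2000"]
b = ["2017-11-24", 3860.520020, 3889.560059, 3856.580078, 3868.340088, 3868.340088, 12800, "2017"]
df = pd.DataFrame([a, b], columns=columns)

# Parse the strings into datetimes and pull out the calendar year
years = pd.to_datetime(df["Date"]).dt.year

# Assumption: the window ends at the most recent year present in the data
latest_year = years.max()

# Keep the five calendar years latest_year-4 .. latest_year (2013..2017 here)
recent = df[years > latest_year - 5].copy()
print(recent)
```

Unlike the `<= 5` comparison above, this keeps exactly five calendar years, which matches the list of years in the question.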