I'm running the code below. It creates a couple of dataframes that takes a column in another dataframe that has a list of Conference Names, as its index.
df_conf = pd.read_sql("select distinct Conference from publications where year>=1991 and length(conference)>1 order by conference", db)
for index, row in df_conf.iterrows():
row[0]=row[0].encode("utf-8")
df2= pd.DataFrame(index=df_conf['Conference'], columns=['Citation1991','Citation1992'])
df2 = df2.fillna(0)
df_if= pd.DataFrame(index=df_conf['Conference'], columns=['IF1994','IF1995'])
df_if = df_if.fillna(0)
df_pubs=pd.read_sql("select Conference, Year, count(*) as totalPubs from publications where year>=1991 group by conference, year", db)
for index, row in df_pubs.iterrows():
row[0]=row[0].encode("utf-8")
df_pubs= df_pubs.pivot(index='Conference', columns='Year', values='totalPubs')
df_pubs.fillna(0)
for index, row in df2.iterrows():
df_if.ix[index,'IF1994'] = df2.ix[index,'Citation1992'] / (df_pubs.ix[index,1992]+df_pubs.ix[index,1993])
The last line keeps giving me the following error:
KeyError: 'Analyse dynamischer Systeme in Medizin, Biologie und \xc3\x96kologie'
Not quite sure what I'm doing wrong. I tried encoding the indexes. It won't work. I even tried .at
still wont' work.
I know it has to do with encoding, as it always stops at indexes with non-ascii characters.
I'm using python 2.7
I think the problem with this:
for index, row in df_conf.iterrows():
row[0]=row[0].encode("utf-8")
is that it may or may not work, I'm surprised it didn't raise a warning.
Besides that it's much quicker to use the vectorised str
method to encode
the series:
df_conf['col_name'] = df_conf['col_name'].str.encode('utf-8')
If needed you can also encode the index in a similar fashion:
df.index = df.index.str.encode('utf-8')