Search code examples
pythonpandasquandl

question how to deal with KeyError: 0 or KeyError: 1 etc


I am new in python and this data science world and I am trying to play with different datasets.

In this case I am using the housing price index from quandl but unfortunately I get stuck when when I need to take the abbreviations names from the wiki page always getting the same Error KeyError.

import quandl
import pandas as pd
#pull every single housing price index from quandl

#quandl api key
api_key = 'xxxxxxxxxxxx'

#get stuff from quandl
df = quandl.get('FMAC/HPI_AK',authtoken = api_key) #alaska \
##print(df.head())

#get 50 states using pandas read html from wikipedia
fifty_states = pd.read_html('https://en.wikipedia.org       /wiki/List_of_states_and_territories_of_the_United_States')
##print(fifty_states[0][1]) #first data frame is index 0, #looking for     column 1,#from element 1 on

#get quandl frannymac query names for each 50 state
for abbv in fifty_states[0][1][2:]:
#print('FMAC/HPI_'+str(abbv))

So the problem I got in the following step:

#get 50 states using pandas read html from wikipedia
fifty_states = pd.read_html('https://en.wikipedia.org       /wiki/List_of_states_and_territories_of_the_United_States')
##print(fifty_states[0][1]) #first data frame is index 0, #looking for     column 1,#from element 1 on

I have tried different ways to get just the abbreviation but does not work

for abbv in fifty_states[0][1][2:]:
#print('FMAC/HPI_'+str(abbv)) 

for abbv in fifty_states[0][1][1:]:
#print('FMAC/HPI_'+str(abbv))

always Keyerror: 0

I just need this step to work, and to have the following output:

FMAC/HPI_AL,
FMAC/HPI_AK,
FMAC/HPI_AZ,
FMAC/HPI_AR,
FMAC/HPI_CA,
FMAC/HPI_CO,
FMAC/HPI_CT,
FMAC/HPI_DE,
FMAC/HPI_FL,
FMAC/HPI_GA,
FMAC/HPI_HI,
FMAC/HPI_ID,
FMAC/HPI_IL,
FMAC/HPI_IN,
FMAC/HPI_IA,
FMAC/HPI_KS,
FMAC/HPI_KY,
FMAC/HPI_LA,
FMAC/HPI_ME

for the 50 states from US and then proceed to make a data analysis from this data.

Can anybody tell me what am I doing wrong ? cheers


Solution

  • Note that fifty_states is a list of DataFrames, filled with content of tables from the source page.

    The first of them (at index 0 in fifty_states) is the table of US states.

    If you don't know column names in a DataFrame (e.g. df), to get column 1 from it (numeration form 0), run:

    df.iloc[:, 1]
    

    So, since we want this column from fifty_states[0], run:

    fifty_states[0].iloc[:, 1]
    

    Your code failed because you attempted to apply [1] to this DataFrame, but this DataFrame has no column named 1.

    Note that e.g. fifty_states[0][('Cities', 'Capital')] gives proper result, because:

    • this DataFrame has a MultiIndex on columns,
    • one of columns has Cities at the first MultiIndex level and Capital at the second level.

    And getting back to your code, run:

    for abbv in fifty_states[0].iloc[:, 1]:
        print('FMAC/HPI_' + str(abbv))
    

    Note that [2:] is not needed. You probably wanted to skip 2 initial rows of the <table> HTML tag, containing column names, but in Pandas they are actually kept in the MultiIndex on columns, so to get all values, you don't need to skip anything.

    If you want these strings as a list, for future use, the code can be:

    your_list = ('FMAC/HPI_' + fifty_states[0].iloc[:, 1]).tolist()