I am new in python and this data science world and I am trying to play with different datasets.
In this case I am using the housing price index from quandl but unfortunately I get stuck when when I need to take the abbreviations names from the wiki page always getting the same Error KeyError.
import quandl
import pandas as pd
#pull every single housing price index from quandl
#quandl api key
api_key = 'xxxxxxxxxxxx'
#get stuff from quandl
df = quandl.get('FMAC/HPI_AK',authtoken = api_key) #alaska \
##print(df.head())
#get 50 states using pandas read html from wikipedia
fifty_states = pd.read_html('https://en.wikipedia.org /wiki/List_of_states_and_territories_of_the_United_States')
##print(fifty_states[0][1]) #first data frame is index 0, #looking for column 1,#from element 1 on
#get quandl frannymac query names for each 50 state
for abbv in fifty_states[0][1][2:]:
#print('FMAC/HPI_'+str(abbv))
So the problem I got in the following step:
#get 50 states using pandas read html from wikipedia
fifty_states = pd.read_html('https://en.wikipedia.org /wiki/List_of_states_and_territories_of_the_United_States')
##print(fifty_states[0][1]) #first data frame is index 0, #looking for column 1,#from element 1 on
I have tried different ways to get just the abbreviation but does not work
for abbv in fifty_states[0][1][2:]:
#print('FMAC/HPI_'+str(abbv))
for abbv in fifty_states[0][1][1:]:
#print('FMAC/HPI_'+str(abbv))
always Keyerror: 0
I just need this step to work, and to have the following output:
FMAC/HPI_AL,
FMAC/HPI_AK,
FMAC/HPI_AZ,
FMAC/HPI_AR,
FMAC/HPI_CA,
FMAC/HPI_CO,
FMAC/HPI_CT,
FMAC/HPI_DE,
FMAC/HPI_FL,
FMAC/HPI_GA,
FMAC/HPI_HI,
FMAC/HPI_ID,
FMAC/HPI_IL,
FMAC/HPI_IN,
FMAC/HPI_IA,
FMAC/HPI_KS,
FMAC/HPI_KY,
FMAC/HPI_LA,
FMAC/HPI_ME
for the 50 states from US and then proceed to make a data analysis from this data.
Can anybody tell me what am I doing wrong ? cheers
Note that fifty_states
is a list of DataFrames, filled with
content of tables from the source page.
The first of them (at index 0 in fifty_states) is the table of US states.
If you don't know column names in a DataFrame (e.g. df), to get column 1 from it (numeration form 0), run:
df.iloc[:, 1]
So, since we want this column from fifty_states[0], run:
fifty_states[0].iloc[:, 1]
Your code failed because you attempted to apply [1] to this DataFrame, but this DataFrame has no column named 1.
Note that e.g. fifty_states[0][('Cities', 'Capital')]
gives proper result,
because:
And getting back to your code, run:
for abbv in fifty_states[0].iloc[:, 1]:
print('FMAC/HPI_' + str(abbv))
Note that [2:] is not needed. You probably wanted to skip 2 initial rows of the <table> HTML tag, containing column names, but in Pandas they are actually kept in the MultiIndex on columns, so to get all values, you don't need to skip anything.
If you want these strings as a list, for future use, the code can be:
your_list = ('FMAC/HPI_' + fifty_states[0].iloc[:, 1]).tolist()