Search code examples
python-2.7joinmergequandl

How to join/merge two data frames from Quandl?


Specificly, I use Python2.7. I read and print the two data frames from Quandl: 'FMAC/HPI_AK' and 'FMAC/HPI_CA' individually with no problem. I used merged = pd.merge(df1, df2, on = 'Date', how = 'outer') to merge the two data frames. But when I tried to merge the two data frames, I get a traceback saying keyerror: 'Date' where 'Date' is the attribute in the first/index column in both data frames.

import quandl
import pandas as pd

api_key = open('quandlapikey.txt', 'r').read()
df1 = quandl.get('FMAC/HPI_ak', authtoken=api_key)
df2 = quandl.get('FMAC/HPI_ca', authtoken=api_key)
print(df1.head())
print(df2.head())
merged = pd.merge(df1, df2, on = 'Date', how = 'outer')
merged.set_index('Date', inplace = True)
print(merged)

Date Value
1975-01-31 15.671711

1975-02-28 15.726897

1975-03-31 15.919058

1975-04-30 16.233030

1975-05-31 16.494823

Date Value
1975-01-31 34.447924

1975-02-28 34.958144

1975-03-31 35.480144

1975-04-30 36.024334

1975-05-31 36.617578

Traceback (most recent call last):

File "", line 1, in runfile('/Users/hans/Desktop/sentdex/buildingdataset.py', wdir='/Users/hans/Desktop/sentdex')

File "/Users/hans/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile execfile(filename, namespace)

File "/Users/hans/anaconda2/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile builtins.execfile(filename, *where)

File "/Users/hans/Desktop/sentdex/buildingdataset.py", line 22, in merged = pd.merge(df1, df2, on = 'Date', how = 'outer')

File "/Users/hans/anaconda2/lib/python2.7/site-packages/pandas/tools/merge.py", line 61, in merge copy=copy, indicator=indicator)

File "/Users/hans/anaconda2/lib/python2.7/site-packages/pandas/tools/merge.py", line 543, in init self.join_names) = self._get_merge_keys()

File "/Users/hans/anaconda2/lib/python2.7/site-packages/pandas/tools/merge.py", line 810, in _get_merge_keys right_keys.append(right[rk]._values)

File "/Users/hans/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 2059, in getitem return self._getitem_column(key)

File "/Users/hans/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 2066, in _getitem_column return self._get_item_cache(key)

File "/Users/hans/anaconda2/lib/python2.7/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache values = self._data.get(item)

File "/Users/hans/anaconda2/lib/python2.7/site-packages/pandas/core/internals.py", line 3543, in get loc = self.items.get_loc(item)

File "/Users/hans/anaconda2/lib/python2.7/site-packages/pandas/indexes/base.py", line 2136, in get_loc return self._engine.get_loc(self._maybe_cast_indexer(key))

File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)

File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)

File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)

File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)

KeyError: 'Date'


Solution

  • You're getting that error because Date is an index in those DataFrames not a column.

    You can instead do (tested):

    merged = pd.merge(df1, df2, how='outer', left_index=True, right_index=True)