How to combine and do group computations on Pandas datasets?

I'm working on an economics paper and need some help with combining and transforming two datasets.

I have two pandas dataframes, one with a list of countries and their neighbors (borderdf) such as

borderdf
country    neighbor
sweden     norway
sweden     denmark
denmark    germany
denmark    sweden

and one with data (datadf) for each country and year such as

datadf
country    gdp    year
sweden     5454   2004
sweden     5676   2005
norway     3433   2004
norway     3433   2005
denmark    2132   2004
denmark    2342   2005

I need to create a column in the datadf for neighbormeangdp that would contain the mean of the gdp of all the neighbors, as given by neighbordf. I would like my result to look like this:

datadf
country    year    gdp    neighborsmeangdp
sweden     2004    5454   5565
sweden     2005    5676   5775

How should I go about doing this?

Solution

You can directly merge the two using pandas merge function. The trick here is that you actually want to merge the country column in your datadf with the neighbor column in your borderdf. Then use groupby and mean to get the average neighbor gdp. Finally, merge back with the data to get the country's own GDP. For example:

import pandas as pd
from StringIO import StringIO

border_csv = '''
country, neighbor
sweden, norway
sweden, denmark
denmark, germany
denmark, sweden
'''

data_csv = '''
country, gdp, year
sweden, 5454, 2004
sweden, 5676, 2005
norway, 3433, 2004
norway, 3433, 2005
denmark, 2132, 2004
denmark, 2342, 2005
'''

borders = pd.read_csv(StringIO(border_csv), sep=',\s*', header=1)
data = pd.read_csv(StringIO(data_csv), sep=',\s*', header=1)

merged = pd.merge(borders,data,left_on='neighbor',right_on='country')
merged = merged.drop('country_y', axis=1)
merged.columns = ['country','neighbor','gdp','year']


grouped = merged.groupby(['country','year'])
neighbor_means = grouped.mean()
neighbor_means.columns = ['neighbor_gdp']
neighbor_means.reset_index(inplace=True)

results_df = pd.merge(neighbor_means,data, on=['country','year'])