Search code examples
pythonpandasdataframemathgroup-by

From a pandas dataframe containing net exports between any two countries, how can I get second dataframe containing net exports for each country?


I have a dataframe df containing net exports between any two countries in column From and To respectively.

df.to_dict() returns

{'From': {0: 'A', 1: 'A', 2: 'B', 3: 'C', 4: 'D'},
 'To': {0: 'B', 1: 'C', 2: 'C', 3: 'D', 4: 'A'},
 'Net exports': {0: 3, 1: 6, 2: 2, 3: 2, 4: 5}}

It looks as follows: [1]: https://i.sstatic.net/y6BRe.png

I want to get a second dataframe df2 which shows net trade per country. It means, if a country is in From column in df, its value needs to be added. If the country is in To column in df, its value needs to be subtracted.

It should look something as shown: [2]: https://i.sstatic.net/I1xrE.png

For example, in df, A to B is 3, A to C is 6 and D to A is 5. Hence, the value of A in df2 is 3+6-5 = 4.

Note, sometimes some countries may not appear in both From and To columns in df. To get list of all countries in both columns, I could use

all_countries = list(set(df["From"]).union(set(df["To"])))
all_countries

However, what would be the process to proceed to the next step to get df2 from df?


Solution

  • Compute a groupby.sum on both From and To, subtract and reset_index:

    out = (df
       .groupby('From')['Net exports'].sum()
       .sub(df.groupby('To')['Net exports'].sum())
       .rename_axis('Countries').reset_index()
    )
    

    Output:

      Countries  Net exports
    0         A            4
    1         B           -1
    2         C           -6
    3         D            3