Search code examples
pythonpandassum

python sum values in columns taken from another dataframe


I have a dataframe ("MUNg") like this:

MUN_id    Col1
1-2       a
3         b
4-5-6     c
...

And another dataframe ("ppc") like this:

     id     population
0     1     20
1     2     25
2     3     4
3     4     45
4     5     100
5     6     50
...

I need to create a column in "MUNg" that contains the total population obtained by summing the population corresponding to the ids from "pcc", that are present in MUN_id

Expected result:

MUN_id    Col1    total_population
1-2       a       45
3         b       4
4-5-6     c       195
...

I don't write how I tried to achieve this, because I am new to python and I don't know how to do it.

MUNg['total_population']=?

Many thanks!


Solution

  • You can split and explode your string into new rows, map the population data and GroupBy.agg to get the sum:

    MUNg['total_population'] = (MUNg['MUN_id']
     .str.split('-')
     .explode()
     .astype(int) # required if "id" in "ppc" is an integer, comment if string
     .map(ppc.set_index('id')['population'])
     .groupby(level=0).sum()
    )
    

    output:

      MUN_id Col1  total_population
    0    1-2    a                45
    1      3    b                 4
    2  4-5-6    c               195