Search code examples
pandasdataframemultiple-columnscalculated-columns

How to calculate multiple columns from multiple columns in pandas


I am trying to calculate multiple colums from multiple columns in a pandas dataframe using a function. The function takes three arguments -a-, -b-, and -c- and and returns three calculated values -sum-, -prod- and -quot-. In my pandas data frame I have three coumns -a-, -b- and and -c- from which I want to calculate the columns -sum-, -prod- and -quot-.

The mapping that I do works only when I have exactly three rows. I do not know what is going wrong, although I expect that it has to do something with selecting the correct axis. Could someone explain what is happening and how I can calculate the values that I would like to have. Below are the situations that I have tested.

INITIAL VALUES

def sum_prod_quot(a,b,c):
    sum  = a + b + c
    prod = a * b * c
    quot = a / b / c
    return (sum, prod, quot)

df = pd.DataFrame({ 'a': [20, 100, 18],
                    'b': [ 5,  10,  3],
                    'c': [ 2,  10,  6],
                    'd': [ 1,   2,  3]
                 })

df
    a   b   c  d
0   20   5   2  1
1  100  10  10  2
2   18   3   6  3

CALCULATION STEPS

Using exactly three rows

When I calculate three columns from this dataframe and using the function function I get:

df['sum'], df['prod'], df['quot'] = \
        list( map(sum_prod_quot, df['a'], df['b'], df['c']))

df
     a   b   c  d    sum     prod   quot
0   20   5   2  1   27.0    120.0   27.0
1  100  10  10  2  200.0  10000.0  324.0
2   18   3   6  3    2.0      1.0    1.0

This is exactly the result that I want to have: The sum-column has the sum of the elements in the columns a,b,c; the prod-column has the product of the elements in the columns a,b,c and the quot-column has the quotients of the elements in the columns a,b,c.

Using more than three rows

When I expand the dataframe with one row, I get an error!

The data frame is defined as:

df = pd.DataFrame({ 'a': [20, 100, 18, 40],
                    'b': [ 5,  10,  3, 10],
                    'c': [ 2,  10,  6,  4],
                    'd': [ 1,   2,  3,  4]
                 })
df
     a   b   c  d
0   20   5   2  1
1  100  10  10  2
2   18   3   6  3
3   40  10   4  4

The call is

df['sum'], df['prod'], df['quot'] = \
        list( map(sum_prod_quot, df['a'], df['b'], df['c']))

The result is

...
    list( map(sum_prod_quot, df['a'], df['b'], df['c']))
ValueError: too many values to unpack (expected 3) 

while I would expect an extra row:

df
     a   b   c  d    sum     prod   quot
0   20   5   2  1   27.0    120.0   27.0
1  100  10  10  2  200.0  10000.0  324.0
2   18   3   6  3    2.0      1.0    1.0
3   40  10   4  4   54.0   1600.0    1.0

Using less than three rows

When I reduce tthe dataframe with one row I get also an error. The dataframe is defined as:

df = pd.DataFrame({ 'a': [20, 100],
                    'b': [ 5,  10],
                    'c': [ 2,  10],
                    'd': [ 1,   2]
                 })
df
     a   b   c  d
0   20   5   2  1
1  100  10  10  2

The call is

df['sum'], df['prod'], df['quot'] = \
        list( map(sum_prod_quot, df['a'], df['b'], df['c']))

The result is

...
    list( map(sum_prod_quot, df['a'], df['b'], df['c']))
ValueError: need more than 2 values to unpack

while I would expect a row less:

df
     a   b   c  d    sum     prod   quot
0   20   5   2  1   27.0    120.0   27.0
1  100  10  10  2  200.0  10000.0  324.0

QUESTIONS

The questions I have:

1) Why do I get these errors?

2) How do I have to modify the call such that I get the desired data frame?

NOTE

In this link a similar question is asked, but the given answer did not work for me.


Solution

  • The answer doesn't seem correct for 3 rows as well. Can you check other values except first row and first column. Looking at the results, product of 20*5*2 is NOT 120, it's 200 and is placed below in sum column. You need to form list in correct way before assigning to new columns. You can try use following to set the new columns:

    df['sum'], df['prod'], df['quot'] = zip(*map(sum_prod_quot, df['a'], df['b'], df['c']))
    

    For details follow the link