Search code examples
pythonpandasfunctioncalculated-columnsstring-concatenation

How can I calculate a new column depending of the value from my column? I need to create string chains to create my new column in pandas python


I have a column that has a number value (15, ..., 28, etc) called 'big', but depending on this number it should sum the columns with 5 previous numbers columns names, I mean something like...

big c15 c16 c17 ... c27 c28
23 1 0 1 ... 1 0
21 1 1 0 ... 1 1
... 0 0 1 ... 1 0
25 1 0 1 ... 1 1

So, depending on the "big" column, for example, 25, my new column should sum 'c24'+'c23'+'c22'+'c21'+'c20' and the result must be calculated in the new column name.

I have tried several movements but it doesn't works. I show my code below:

def test_fun(df):
    if (df['big'] > 19).all():
        pc = []
        for i in range(1,6):
            x = 'c' + (df['big'] - i).apply(str)
            pc.append(x)
        y = df[pc].sum(axis = 1)
        return y
    elif (df['big'] == 19).all():
        pc = []
        for i in range(1,5):
            x = 'c' + (df['big'] - i).apply(str)
            pc.append(x)
        y = df[pc].sum(axis = 1)
        return y
    elif (df['big'] == 18).all():
        pc = []
        for i in range(1,4):
            x = 'c' + (df['big'] - i).apply(str)
            pc.append(x)
        y = df[pc].sum(axis = 1)
        return y
    else:
        pc = []
        for i in range(1,3):
            x = 'c' + (df['big'] - i).apply(str)
            pc.append(x)
        y = df[pc].sum(axis = 1)
        return y

df['new_column'] = df.apply(lambda row: test_fun(df), axis = 1)

I added several conditions due to actually my table is beginning from c15 to c28 column, but it will be increasing during time.

Finally, when I use the function df.apply() to apply my function by row I had been having several errors during my trials. Some of them like:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

that's what I added .all() in my if, elif, else conditions. Even...

raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index  'c27', 'c27', ...,\n   ('c26', 'c26',...  dtype='object')] are in the [columns]"

Do you know what I should be probably doing wrong?


Solution

  • One way using pandas.DataFrame.apply:

    def get_big(series):
        n = series["big"]
        indices = ["c%s" % i for i in range(n-1, n-6, -1)]
        indices = series.index.intersection(indices)
        return series[indices].sum()
    
    df.apply(get_big, axis=1)
    

    Sample data

       c20  c21  c22  c23  c24  c25  c26  c27  c28  c29  big
    0    0    1    1    0    1    0    1    1    1    0   21
    1    1    0    1    0    0    0    1    0    1    0   28
    2    1    1    0    1    0    1    0    1    0    0   20
    3    0    0    0    0    1    0    0    1    0    1   20
    4    1    1    0    1    0    0    0    0    0    0   23
    5    1    0    0    1    0    0    0    1    0    0   25
    6    0    1    0    0    1    1    1    0    1    0   23
    7    1    0    1    0    0    0    0    1    0    1   20
    8    1    0    1    0    1    1    0    0    0    1   26
    9    0    0    0    1    1    0    1    1    0    1   25
    

    Output:

    0    0
    1    1
    2    0
    3    0
    4    2
    5    2
    6    1
    7    0
    8    3
    9    2
    dtype: int64