
Apply function to all columns of data frame python


I have two DataFrames:

xx
AVERAGE_CALL_DURATION AVERAGE_DURATION CHANGE_OF_DETAILS
267 298 0 0
421 609.33 0.33
330 334 0 0
240.5 666.5 0
628 713 0 0

and

NoC_c
AVERAGE_CALL_DURATION AVERAGE_DURATION CHANGE_OF_DETAILS
-5.93 -4.95 0.90
593.50 595.70 1.00

I want to return 1 wherever a value in xx falls within the range given by NoC_c for the column of the same name (the first row of NoC_c is the lower bound, the second row the upper bound).

I can do this for one column:

def check_between_ranges(xx, NoC_c):
    ranges = NoC_c['AVERAGE_CALL_DURATION']
    
    if (xx['AVERAGE_CALL_DURATION'] >= ranges.iloc[0]) and (xx['AVERAGE_CALL_DURATION'] <= ranges.iloc[1]):
        return 1
    return xx['AVERAGE_CALL_DURATION']

xx['AVERAGE_CALL_DURATION2'] = xx.apply(lambda x: check_between_ranges(x, NoC_c), axis=1)

However, I need to avoid specifying the column name manually, as the actual dfs contain many more columns.

I have tried:

a = NoC_c.columns

def check_between_ranges(xx, NoC_c):
    ranges = NoC_c[a]
    
    if (xx[a] >= ranges.iloc[0]) & (xx[a] <= ranges.iloc[1]):
        return 1

xx.apply(lambda x: check_between_ranges(x, NoC_c[a]), axis=1)

However, I get the error ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried the solutions listed here, but they were unsuccessful.

I also read this to address the specific error, but it didn't help with my issue.

Any help would be appreciated.

Traceback (most recent call last):

  File "<ipython-input-11-2affca771555>", line 10, in <module>
    xx.apply(lambda x: check_between_ranges(x, NoC_c[a]), axis=1)

  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\frame.py", line 7552, in apply
    return op.get_result()

  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\apply.py", line 185, in get_result
    return self.apply_standard()

  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\apply.py", line 276, in apply_standard
    results, res_index = self.apply_series_generator()

  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\apply.py", line 305, in apply_series_generator
    results[i] = self.f(v)

  File "<ipython-input-11-2affca771555>", line 10, in <lambda>
    xx.apply(lambda x: check_between_ranges(x, NoC_c[a]), axis=1)

  File "<ipython-input-11-2affca771555>", line 6, in check_between_ranges
    if (xx[a] >= ranges.iloc[0]) & (xx[a] <= ranges.iloc[1]):

  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1330, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
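
For reference, the error can be reproduced in isolation. This is a minimal sketch with made-up two-column Series (the names `row`, `lower`, `comparison` and `all_in` are illustrative, not from the original code): indexing a row with several column names gives a Series, so the comparison is element-wise and `if` cannot reduce it to a single True or False.

```python
import pandas as pd

# Hypothetical stand-ins for one row of xx and the lower-bound row of NoC_c.
row = pd.Series({"AVERAGE_CALL_DURATION": 267.0, "AVERAGE_DURATION": 298.0})
lower = pd.Series({"AVERAGE_CALL_DURATION": -5.93, "AVERAGE_DURATION": -4.95})

# Element-wise comparison: the result is a boolean Series, one entry per column.
comparison = row >= lower

message = None
try:
    if comparison:  # asks for a single truth value of a whole Series
        pass
except ValueError as exc:
    message = str(exc)

print(message)

# Reducing explicitly with .all() (or .any()) gives one boolean, but that
# answers "are all columns in range?", not the per-column question asked here.
all_in = bool(comparison.all())
```

So `.all()` or `.any()` would silence the error without giving a per-column result, which is probably why the linked fixes did not help.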

Edit: Many thanks to @jch for the solution. I'm re-posting it here as I had to modify some of the syntax for it to work with my datasets.

def check_between_ranges(x):
    v = []
    
    for c in x.index:
        if (x[c] >= NoC_c.iloc[0][c]) & (x[c] <= NoC_c.iloc[1][c]):
            v += [1]
        else:
            v += [x[c]]
            
    return pd.Series(v, index=x.index)


xx.apply(check_between_ranges, axis=1)

Solution

  • Would this work for you?

    Comparison Function

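    import pandas as pd

    def check_between_ranges(x):
        # x is one row of xx; x.index holds the column names.
        v = []

        for c in x.index:
            # Row 0 of NoC_c is the lower bound, row 1 the upper bound.
            # .at is label-based, so this assumes NoC_c has the default 0, 1 index.
            if NoC_c.at[0, c] <= x[c] <= NoC_c.at[1, c]:
                v.append(1)
            else:
                v.append(x[c])

        return pd.Series(v, index=x.index)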
    

    Execution

    xx.apply(check_between_ranges, axis=1)
    

    Result

       AVERAGE_CALL_DURATION  AVERAGE_DURATION  CHANGE_OF_DETAILS
    0                    1.0              1.00               0.00
    1                    1.0            609.33               0.33
    2                    1.0              1.00               0.00
    3                    1.0            666.50               0.00
    4                  628.0            713.00               0.00
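
A possible vectorised alternative, in case the row-wise apply gets slow on wider frames. This is only a sketch: the sample frames are rebuilt from the three columns shown in the question and the Result above, the names `lower`, `upper`, `in_range` and `result` are illustrative, and it assumes both frames share the same column names.

```python
import pandas as pd

# Sample data rebuilt from the question (three columns, as in the Result table).
xx = pd.DataFrame({
    "AVERAGE_CALL_DURATION": [267, 421, 330, 240.5, 628],
    "AVERAGE_DURATION": [298, 609.33, 334, 666.5, 713],
    "CHANGE_OF_DETAILS": [0, 0.33, 0, 0, 0],
})
NoC_c = pd.DataFrame({
    "AVERAGE_CALL_DURATION": [-5.93, 593.50],
    "AVERAGE_DURATION": [-4.95, 595.70],
    "CHANGE_OF_DETAILS": [0.90, 1.00],
})

# First row = lower bounds, second row = upper bounds, indexed by column name.
lower = NoC_c.iloc[0]
upper = NoC_c.iloc[1]

# Compare every column of xx against the bound with the same name.
in_range = xx.ge(lower, axis="columns") & xx.le(upper, axis="columns")

# Replace in-range values with 1 and keep everything else unchanged.
result = xx.mask(in_range, 1)
print(result)
```

On this sample data it should give the same table as the Result above, without looping over rows in Python.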