
Apply function on pandas using the index


I have a dataframe like this:

import pandas as pd

col1 = [i for i in range(10)]
col2 = [i**2 for i in range(10)]
df = pd.DataFrame(list(zip(col1, col2)), columns=['col1', 'col2'])

I want to create a new column using apply that adds the numbers in each row and then adds the index. Something like

df['col3']=df.apply(lambda x:x['col1']+x['col2']+index(x))

But of course index(x) does not work.

How can I do it in this setting?


Solution

  • Your solution is possible with axis=1 and x.name, but it is slow because it loops over the rows:

    df['col3'] = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)
    

    The vectorized solution is to add df.index directly:

    df['col3'] = df['col1'] + df['col2'] + df.index
    
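    As a quick sanity check (a minimal sketch, assuming pandas is imported as pd), both approaches produce the same column on the sample frame from the question:

    ```python
    import pandas as pd

    # Rebuild the sample frame from the question.
    col1 = [i for i in range(10)]
    col2 = [i**2 for i in range(10)]
    df = pd.DataFrame(list(zip(col1, col2)), columns=['col1', 'col2'])

    # Row-wise apply: x.name is the row's index label.
    slow = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)

    # Vectorized: add the whole index at once.
    fast = df['col1'] + df['col2'] + df.index

    assert slow.equals(fast)
    print(fast.tolist())  # i + i**2 + i = i**2 + 2*i for each row
    ```

    For row i the result is i**2 + 2*i, i.e. [0, 3, 8, 15, 24, 35, 48, 63, 80, 99].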

    Performance on 10k rows of sample data:

    import numpy as np

    N = 10000
    df = pd.DataFrame({'col1': np.arange(N),
                       'col2': np.arange(N) ** 2})
    
    
    
    In [234]: %timeit df['col3'] = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)
    131 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    In [235]: %timeit df['col3'] = df['col1'] + df['col2'] + df.index
    654 µs ± 90.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
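    One caveat worth noting (my addition, not part of the original answer): adding df.index uses the index *labels*, which only equal the row positions for a default RangeIndex. If the frame has a non-default index and you want positional numbers, np.arange(len(df)) can stand in, as this sketch shows:

    ```python
    import numpy as np
    import pandas as pd

    # Frame with a non-default (shifted) integer index.
    df = pd.DataFrame({'col1': range(5),
                       'col2': [i**2 for i in range(5)]},
                      index=range(100, 105))

    # Adding df.index uses the labels 100..104.
    by_label = df['col1'] + df['col2'] + df.index

    # np.arange(len(df)) uses the positions 0..4 instead.
    by_position = df['col1'] + df['col2'] + np.arange(len(df))

    print(by_label.tolist())     # [100, 103, 108, 115, 124]
    print(by_position.tolist())  # [0, 3, 8, 15, 24]
    ```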