I have a dataframe like this:
col1=[i for i in range(10)]
col2=[i**2 for i in range(10)]
df=pd.DataFrame(list(zip(col1,col2)),columns=['col1','col2'])
I want to create a new column using apply that adds the numbers in each row and then adds the index. Something like
df['col3']=df.apply(lambda x:x['col1']+x['col2']+index(x))
But of course index(x) does not work.
How can I do it in this setting?
Your approach works with axis=1 and x.name, but it is slow because apply loops over the rows in Python:
df['col3'] = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)
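To see why x.name works here, the following minimal sketch (using the same sample data as the question, with the variable name `labels` introduced only for illustration) shows that with axis=1 each row arrives as a Series whose .name is that row's index label:

```python
import pandas as pd

df = pd.DataFrame({"col1": list(range(10)),
                   "col2": [i**2 for i in range(10)]})

# With axis=1, apply passes each row as a Series; its .name attribute
# holds the index label of that row.
labels = df.apply(lambda row: row.name, axis=1)

# Row-wise sum of both columns plus the index label.
df["col3"] = df.apply(lambda row: row["col1"] + row["col2"] + row.name, axis=1)
print(df.head())
```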
The vectorized solution is to add df.index:
df['col3'] = df['col1'] + df['col2'] + df.index
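As a quick sanity check, here is a small sketch with a non-default index (the values 10, 20, 30 are made up for illustration) suggesting that both variants add the index labels rather than row positions, and so give the same result:

```python
import pandas as pd

# Hypothetical data with a custom integer index, for illustration only.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [1, 4, 9]},
                  index=[10, 20, 30])

# Row-wise version: x.name is the index label of each row.
row_wise = df.apply(lambda x: x["col1"] + x["col2"] + x.name, axis=1)

# Vectorized version: df.index is added element by element.
vectorized = df["col1"] + df["col2"] + df.index

print((row_wise == vectorized).all())
```

Note that this only makes sense for a numeric index; with string or datetime labels the addition would need a different treatment.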
Performance on 10k rows of sample data:
import numpy as np
import pandas as pd
N = 10000
df = pd.DataFrame({'col1': np.arange(N),
                   'col2': np.arange(N) ** 2})
In [234]: %timeit df['col3'] = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)
131 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [235]: %timeit df['col3'] = df['col1'] + df['col2'] + df.index
654 µs ± 90.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
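If you are not in IPython, a rough equivalent of the %timeit comparison can be sketched with the standard timeit module. The helper names `with_apply` and `vectorized` and the repeat counts are my own choices for illustration, and the absolute numbers will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

N = 10_000
df = pd.DataFrame({"col1": np.arange(N), "col2": np.arange(N) ** 2})

def with_apply():
    # Row-wise: calls the lambda once per row.
    return df.apply(lambda x: x["col1"] + x["col2"] + x.name, axis=1)

def vectorized():
    # Column-wise: whole-array additions.
    return df["col1"] + df["col2"] + df.index

# Average seconds per call over a few repetitions.
t_apply = timeit.timeit(with_apply, number=3) / 3
t_vec = timeit.timeit(vectorized, number=100) / 100

print(f"apply:      {t_apply * 1e3:.1f} ms per call")
print(f"vectorized: {t_vec * 1e6:.1f} µs per call")
```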