I would like to assign to a new column a variable-length slice of another column, but it does not work as I expect, and I do not understand why:
import numpy as np
import pandas as pd
m = np.array([[1, 'AAAAA'],
              [2, 'BBBB'],
              [3, 'CCC']])
df = (pd.DataFrame(m, columns=['id', 's1'])
        .assign(
            s2=lambda x: x['s1'].str.slice(start=0, stop=x['s1'].str.len()-1))
     )
print(df)
which leads to
  id     s1   s2
0  1  AAAAA  NaN
1  2   BBBB  NaN
2  3    CCC  NaN
However, I would expect the following:
  id     s1    s2
0  1  AAAAA  AAAA
1  2   BBBB   BBB
2  3    CCC    CC
Any idea what happens here?
You need str[:-1] to select every character of each value in the column except the last one:
df = (pd.DataFrame(m, columns=['id', 's1'])
        .assign(
            s2=lambda x: x['s1'].str[:-1])
     )
print(df)
  id     s1    s2
0  1  AAAAA  AAAA
1  2   BBBB   BBB
2  3    CCC    CC
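As a small sketch of the same idea, a negative scalar stop can also be passed to str.slice, which should give the same result as str[:-1]. The DataFrame here is built from a dict rather than the NumPy array in the question, purely for illustration:

```python
import pandas as pd

# Illustrative data mirroring the question (built from a dict, not the NumPy array)
df = pd.DataFrame({"id": [1, 2, 3], "s1": ["AAAAA", "BBBB", "CCC"]})

# Indexing through the .str accessor: drop the last character of every value
via_index = df["s1"].str[:-1]

# Same operation via str.slice with a scalar (negative) stop
via_slice = df["s1"].str.slice(stop=-1)

print(via_index.tolist())
print(via_slice.tolist())
```

Both forms apply one scalar bound to every row, which is why a negative index works regardless of each string's length.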
Your original approach works only if you use apply to process each row separately, because str.slice expects scalar integer bounds rather than a Series:
df = (pd.DataFrame(m, columns=['id', 's1'])
        .assign(
            s2=lambda x: x.apply(lambda y: y['s1'][0:len(y['s1'])-1], axis=1))
     )
print(df)
  id     s1    s2
0  1  AAAAA  AAAA
1  2   BBBB   BBB
2  3    CCC    CC
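If the stop position really does differ per row, one possible alternative to a row-wise apply is a list comprehension that pairs each string with its own stop value. This is only a sketch: the stops Series below is a hypothetical stand-in for whatever per-row bounds you need, and the DataFrame is built from a dict for brevity:

```python
import pandas as pd

# Illustrative data mirroring the question
df = pd.DataFrame({"id": [1, 2, 3], "s1": ["AAAAA", "BBBB", "CCC"]})

# Hypothetical per-row stop positions (here: length minus one, as in the question)
stops = df["s1"].str.len() - 1

# Slice each string with its own stop value
df["s2"] = [s[:n] for s, n in zip(df["s1"], stops)]

print(df)
```

The comprehension uses plain Python slicing on each string, so any integer stop per row can be used, not just length minus one.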