python pandas dataframe lambda

Why does assign (lambda) sometimes read the entire column rather than each individual row when assigning values?


I have a DataFrame with a code in a column. I want to extract the first digit from said code and add it to a different column so I can use it to merge with a different DF.

My code is:

df_a = df_a.assign(index_a = lambda x: int(str(x.code)[0]))

When I use:

df_a = df_a.assign(index_a = lambda x: x.code)

This works and I get a new DF with the extra "index_a" column containing the entire code. If I do any operations here like x.code + 1 or x.code * 5, it works.

Then, when I try to convert each code to a string by doing:

df_a = df_a.assign(index_a = lambda x: str(x.code))

Instead of getting each row with a string code, all rows receive the value of the entire column converted into a massive string.

I had a similar problem in the past trying to navigate lambda functions and learned that as long as I did x + 0 before converting, everything worked okay, but this time it's not working.

I'm obviously doing something wrong, but I can't figure it out.


Solution

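  • This happens because assign calls the lambda once with the whole DataFrame, not once per row. x.code is therefore the entire column (a Series), and str(x.code) turns that whole Series into one string, which is then assigned to every row. Operations like x.code + 1 or x.code * 5 work because they are applied element-wise. To convert each value separately, try this instead: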

    df_a['code'] = df_a['code'].apply(str)
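
To get the first digit into index_a as the question asks, something along these lines should work. This is only a sketch: the sample values in df_a are made up (the real data isn't shown in the question), and it assumes every code is a non-negative integer.

```python
import pandas as pd

# Hypothetical sample data; the real df_a is not shown in the question.
df_a = pd.DataFrame({"code": [1234, 5678, 9012]})

# The lambda passed to assign() receives the whole DataFrame once,
# so str(x.code) is the printed form of the entire Series.
whole_column_as_text = str(df_a.code)

# Element-wise version: cast each value to str, take the first
# character with the .str accessor, then cast back to int.
df_a = df_a.assign(index_a=lambda x: x.code.astype(str).str[0].astype(int))

# Same result with a per-element Python function via Series.apply.
first_digit_via_apply = df_a.code.apply(lambda c: int(str(c)[0]))

print(df_a)
```

The astype/.str version stays vectorized. The apply version is closer to the original int(str(...)[0]) idea, but runs the function on each value separately.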