Modifying a new column in dataframe


I'm having trouble modifying a new column.

import pandas as pd
import numpy as np

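# dtype=object is required on recent NumPy versions because each row mixes a list with scalars
da = np.array([[[1,2], 1,2,100],[[1,2], 1,3,100],[[1,2], 4,1,100], [[1,2], 5,6,101], [[1,2], 7,9,102], [[1,2], 8,7,102]], dtype=object)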

col = ['N', 'NRW', 'NRW_1', 'NRW_2']

ramka = pd.DataFrame(columns = col, data = da)
print (ramka)

ramka['TEST'] = ramka['NRW'].apply(lambda x: x+3)
print (ramka)

This works as expected: I get a new column with NRW+3, and NRW is unchanged.

ramka['TEST2'] = ramka['N'].apply(lambda x: x.append(555))
print (ramka)

Here I get a new column filled with None, and column N has been changed. Why?

If I copy column N into a new column

ramka['TEST4'] = ramka['N']

and perform the same operation on the new column, the changes are applied to both N and TEST4:

ramka['TEST4'].apply(lambda x: x.append(666))
print (ramka)

I don't understand. Please help.


Solution

  • list.append works in place: it modifies the lists stored in N and returns None, so apply fills the new column with None. Build a new list instead:

    ramka['TEST2'] = ramka['N'].apply(lambda x: x+[555])
    ramka['TEST4'] = ramka['N'].apply(lambda x: x+[666])
    

    Similarly, ramka['TEST4'] = ramka['N'] does not copy the lists; both columns end up referencing the same list objects, which is why a change made through one column shows up in the other. To get independent lists, copy each one:

    ramka['TEST4'] = ramka['N'].apply(lambda x: x.copy())
    

    Note that to add a scalar to a numeric column you do not need apply; plain vectorized arithmetic does it:

    ramka['TEST'] = ramka['NRW']+3
    

    Output:

            N  NRW  NRW_1  NRW_2  TEST        TEST2        TEST4
    0  [1, 2]    1      2    100     4  [1, 2, 555]  [1, 2, 666]
    1  [1, 2]    1      3    100     4  [1, 2, 555]  [1, 2, 666]
    2  [1, 2]    4      1    100     7  [1, 2, 555]  [1, 2, 666]
    3  [1, 2]    5      6    101     8  [1, 2, 555]  [1, 2, 666]
    4  [1, 2]    7      9    102    10  [1, 2, 555]  [1, 2, 666]
    5  [1, 2]    8      7    102    11  [1, 2, 555]  [1, 2, 666]
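
The two effects described above (append returning None, and a column assignment sharing the same list objects) can be reproduced without pandas. The following is only a minimal sketch in plain Python: the names column_n, alias and independent are made up for illustration, and the list comprehensions are a stand-in for what apply does when it collects the lambda's return values.

```python
# A "column" of list objects, standing in for ramka['N'] (plain Python, no pandas).
column_n = [[1, 2], [1, 2]]

# list.append mutates the list and returns None, so collecting the return
# values (as apply does with the lambda's result) yields only None.
appended = [row.append(555) for row in column_n]

# Building a new list with + leaves the original list untouched.
fresh = [[1, 2]]
extended = [row + [555] for row in fresh]

# Copying only the outer container keeps references to the same inner lists,
# much like ramka['TEST4'] = ramka['N'] ...
alias = list(column_n)
# ... while copying each element gives independent lists.
independent = [row.copy() for row in column_n]

# A change made through the alias is visible in the original column.
alias[0].append(666)

print(appended)                 # [None, None]
print(column_n)                 # [[1, 2, 555, 666], [1, 2, 555]]
print(extended, fresh)          # [[1, 2, 555]] [[1, 2]]
print(alias[0] is column_n[0])  # True
print(independent)              # [[1, 2, 555], [1, 2, 555]]
```

The identity check with `is` is a quick way to see whether two cells hold the same list object or merely equal ones.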