Tags: python, pandas, reference, iteration, in-place

How to modify a pandas Series (or DataFrame) in place while iterating?


I need to revise the values in a pandas Series (a column) by passing each one through a function.

While iterating, once I have the result, I don't want to look up the Series a second time to store it, because I suspect that wastes time and shouldn't be necessary.

For example:

import pandas as pd
s = pd.Series(['A', 'B', 'C'])
for index, value in s.items():
    s[index] = func_hard_to_vectorized(value)    # lookup again!!!

In C++ terms: "How do I get a reference to that cell?"

What I want looks like:

import pandas as pd
s = pd.Series(['A', 'B', 'C'])
for index, value in s.items():
    value = func_hard_to_vectorized(value)    # change in place
    assert_equal(s[index], value)

The same problem exists for DataFrames, where it probably hurts performance even more.

How can I get a reference to a row of a pandas DataFrame?


Solution

  • You can assign your data only once, rather than at each step:

    s[:] = [func_hard_to_vectorized(v) for v in s]
    

    Or:

    s[:] = s.apply(func_hard_to_vectorized)
    

    This way the assignment happens only once, with all items at once.

    If you don't mind getting a new Series (i.e. no other name points to the original Series), you can also use:

    s = s.apply(func_hard_to_vectorized)

    Example using both the index and the value:

    s = pd.Series(['A', 'B', 'C'])
    
    def f(idx, v):
        return f'{v}_{idx}'
    
    s[:] = [f(idx, v) for idx, v in s.items()]
    

    Modified s:

    0    A_0
    1    B_1
    2    C_2
    dtype: object
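
The difference between `s[:] = ...` and `s = ...` only matters when a second name refers to the same Series. The sketch below illustrates this; the body of `func_hard_to_vectorized` is a hypothetical stand-in (lowercasing), and `alias`, `t` and `other` are names made up for the illustration.

```python
import pandas as pd


def func_hard_to_vectorized(value):
    # Hypothetical stand-in for the real per-element function.
    return value.lower()


# Case 1: slice assignment writes into the existing Series object.
s = pd.Series(['A', 'B', 'C'])
alias = s                      # second name for the same object
s[:] = [func_hard_to_vectorized(v) for v in s]
same_object = alias is s       # no new Series was created
alias_values = alias.tolist()  # the alias sees the new values

# Case 2: plain assignment rebinds the name to a new Series.
t = pd.Series(['A', 'B', 'C'])
other = t
t = t.apply(func_hard_to_vectorized)
rebound = other is not t       # t now names a different object
other_values = other.tolist()  # the old object keeps its old values
```

In the first case every name that pointed to the Series observes the update; in the second, only the rebound name does, so the slice form is the safer choice when the Series is shared.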