Tags: python, pandas, for-loop, series

For loop on pandas Series


I'm trying to implement code that includes a for loop on a list of pandas Series:

import pandas as pd

a = pd.Series(dtype='float64')
b = pd.Series(dtype='float64')
c = pd.Series(dtype='float64')


data = [a,b,c]

for k in data:
    if len(k) < 1:
        k = k.append(pd.Series([1,2]))
        
print(a)


# returns: Series([], dtype: float64) 

I thought print(a) would return the appended Series containing [1,2]. But it just returned the original empty Series I initially defined. Of course, printing b and c also returned an empty Series. Meanwhile, if I directly do the append instead of using the for loop:

a = pd.Series(dtype='float64')
a = a.append(pd.Series([1,2]))

print(a)

# returns: 

0    1
1    2
dtype: int64

So the second approach returns the appended Series just fine. Why doesn't the first one work? I expected it to give the same result as the second. I thought local scope only applied inside function definitions, not for loops. Am I missing something here?


Solution

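  • append, like almost all pandas methods, does not modify the Series in place; it returns a new Series. The assignment k = k.append(...) therefore only rebinds the loop variable k to that new object, while a and data[0] still refer to the original empty Series. (Series.append was deprecated in pandas 1.4 and removed in 2.0; its replacement, pd.concat, also returns a new object.)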

    Debug with id:

    print(f"a: {id(a)}, {id(data[0])}")
    
    for k in data:
        print(f"k before: {id(k)}")
        if len(k) < 1:
            k = k.append(pd.Series([1,2]))
            print(f"k after: {id(k)}")
        break
    

    Output:

    a: 140711220977520, 140711220977520
    k before: 140711220977520
    k after: 140710157071840  # <- not data[0], not 'a'
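
One way to make the loop take effect is to write the new object back into the list slot instead of rebinding k. The following is only a sketch, assuming pandas 2.x, where pd.concat replaces the removed Series.append. The name filler and the non-empty c are illustrative and not from the question.

```python
import pandas as pd

a = pd.Series(dtype="float64")
b = pd.Series(dtype="float64")
c = pd.Series([9.0])                  # non-empty, so the loop leaves it alone

data = [a, b, c]
filler = pd.Series([1.0, 2.0])        # hypothetical values to add

for i, k in enumerate(data):
    if len(k) < 1:
        # pd.concat returns a new Series; store it in the list slot,
        # because assigning to k would only rebind the loop variable.
        data[i] = pd.concat([k, filler])

print(data[0].tolist())   # values now held in the first list slot
print(len(a))             # the name a still refers to the original empty Series
```

Note that a, b and c keep pointing at the original objects. If the named variables themselves must change, read the results back from data (for example a, b, c = data) or keep the Series in a dict keyed by name.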
    

    Update

    One last question: would the for loop then work as expected for other objects (besides pandas objects such as Series) whose methods don't create a new copy of the original?

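    If data contains objects whose methods modify them in place, like list (list.append mutates the list and returns None), it works, because k and the matching element of data are still the same object: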

    a = []
    b = []
    c = []
    data = [a, b, c]
    
    for k in data:
        if len(k) < 1:
            k.append([1, 2])
    

    Output:

    >>> a
    [[1, 2]]
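
The difference is not specific to pandas. It can be reproduced with plain lists, as in the sketch below, which contrasts rebinding the loop variable with mutating the object it refers to. The names first, second and after_rebind are made up for illustration, and extend is used instead of append so the result is a flat [1, 2].

```python
first, second = [], []

# Rebinding: k + [1, 2] builds a new list and only the name k points to it.
for k in [first, second]:
    k = k + [1, 2]

after_rebind = list(first)   # snapshot: first is still empty here

# Mutation: extend changes the list object that k and first both refer to.
for k in [first, second]:
    k.extend([1, 2])

print(after_rebind)  # []
print(first)         # [1, 2]
```

The first loop behaves like the Series case in the question: an expression that returns a new object, followed by assignment to k, never touches the original.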