Python/Pandas. For loop on multiple dataFrames not working correctly


I am trying to process a list of DataFrames (the example shows 2; in reality there are many more) in several ways using a for loop. Dropping columns from the DataFrame referenced in the loop works fine; however, concat does not seem to do anything inside the loop. I expected it to update the original DataFrame referenced in dfs.

UPDATED PROBLEM STATEMENT

Previous examples do not cover this case or do not seem to work. Example adapted from: pandas dataframe concat using for loop not working

Minifying the example leads to the following (code partially borrowed from another question):

import numpy as np
import pandas as pd


data = [['Alex',10],['Bob',12],['Clarke',13]]
data2 = ['m','m','x']
A = pd.DataFrame(data, columns=['Name','Age'])
B = pd.DataFrame(data, columns=['Name','Age'])
C = pd.DataFrame(data2, columns=['Gender'])

#expected result for A:
Anew=pd.DataFrame([['Alex','m'],['Bob','m'],['Clarke','x']], columns=['Name', 'Gender'])

dfs = [A,B]

for k, v in enumerate(dfs):
    # The following line works as expected on A and B respectively; inplace is required to actually modify A and B as defined above
    dfs[k]=v.drop('Age',axis=1, inplace=True)
    # The following line doesn't do anything, I was expecting Anew (see above) 
    dfs[k] = pd.concat([v, C], axis=1)
    # The following line prints the expected result within the loop
    print(dfs[k])

# This just shows A, not Anew. To me that means A was never updated with dfs[k] as I thought it would be.
print(A)

Solution

  • Update

    Try:

    import pandas as pd

    data = [['Alex',10],['Bob',12],['Clarke',13]]
    data2 = ['m','m','x']
    A = pd.DataFrame(data, columns=['Name','Age'])
    B = pd.DataFrame(data, columns=['Name','Age'])
    C = pd.DataFrame(data2, columns=['Gender'])
    Anew = pd.DataFrame([['Alex','m'],['Bob','m'],['Clarke','x']], columns=['Name', 'Gender'])
    
    dfs = [A, B]
    for v in dfs:
        v.drop('Age', axis=1, inplace=True)
        v['Gender'] = C
    print(A)
    print(Anew)
    

    Output:

    >>> A
         Name Gender
    0    Alex      m
    1     Bob      m
    2  Clarke      x
    
    >>> Anew
         Name Gender
    0    Alex      m
    1     Bob      m
    2  Clarke      x
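
Why the in-place version updates A while `dfs[k] = pd.concat(...)` does not can be checked with identity tests. The following is a minimal sketch rather than part of the original answer: the frames mirror the question's data, and the helper variables (`rebound_is_A`, `cols_after_rebind`, `cols_after_mutation`) are names introduced here for illustration. It suggests that assigning to a list slot only rebinds that slot, whereas mutating the object is visible through every name that refers to it.

```python
import pandas as pd

A = pd.DataFrame([['Alex', 10], ['Bob', 12], ['Clarke', 13]], columns=['Name', 'Age'])
C = pd.DataFrame(['m', 'm', 'x'], columns=['Gender'])

dfs = [A]
assert dfs[0] is A  # the list slot and the name A refer to the same object

# Rebinding: pd.concat builds a new DataFrame and the list slot now points to it.
# The name A still refers to the old, unchanged object.
dfs[0] = pd.concat([A, C], axis=1)
rebound_is_A = dfs[0] is A
cols_after_rebind = list(A.columns)

# Mutation: adding a column changes the object itself, so A sees the change.
A['Gender'] = C['Gender']
cols_after_mutation = list(A.columns)

print(rebound_is_A, cols_after_rebind, cols_after_mutation)
```

This is why the loop above mutates `v` directly instead of assigning a new object to the list.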
    

    If you use inplace=True, drop modifies the DataFrame in place and returns None instead of a DataFrame, so dfs[k] is set to None:

    dfs[k]=v.drop('Age', axis=1, inplace=True)  # <- Remove inplace=True
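
The return value can be inspected directly. This is a small sketch under the assumption that `drop` behaves as described above (it returns None when `inplace=True`). The frame `df` and the variable names are hypothetical and chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alex', 'Bob'], 'Age': [10, 12]})

# Without inplace, drop returns a new DataFrame and leaves df untouched.
returned_copy = df.drop('Age', axis=1)
cols_before = list(df.columns)

# With inplace=True, df itself is modified and the call returns None.
returned_inplace = df.drop('Age', axis=1, inplace=True)
cols_after = list(df.columns)

print(list(returned_copy.columns), cols_before, returned_inplace, cols_after)
```

Assigning the result of an in-place call back to a variable or list slot therefore discards the reference to the DataFrame.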
    

    Try:

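    dfs = [A, B]
    for k, v in enumerate(dfs):
        # drop returns a new DataFrame; A and B themselves are left untouched
        dfs[k] = v.drop('Age', axis=1)
        # concatenate onto the reduced frame (dfs[k]), not the original v
        dfs[k] = pd.concat([dfs[k], C], axis=1)
    out = dfs[0]
    

    Output:

    >>> out
         Name Gender
    0    Alex      m
    1     Bob      m
    2  Clarke      x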