python pandas glob

Iterate through columns and rename according to a rule


In the following snippet, I am trying to rename every column whose name contains "Hosted Meetings" to "Hosted Meetings [date]", where the date is taken from the file name. Printing i inside the loop shows the new names, but the change is never saved to df.

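import glob
import re

import pandas as pd

all_users_sheets_hosts = []

for f in glob.glob("./users-export-*.xlsx"):
    df = pd.read_excel(f)
    all_users_sheets_hosts.append(df)
    j = re.search(r'(\d+)', f)  # first run of digits in the file name
    for i in df.columns.values:
        if 'Hosted Meetings' in i:
            # i holds the new name here, but df still has the old column names
            i = ('Hosted Meetings' + ' ' + j.group(1))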

Solution

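  • The loop variable i is just a name bound to each column label in turn, not a reference into df.columns, so assigning to i rebinds the name and leaves the DataFrame untouched. One way to fix this is to enumerate a list copy of the labels, update that list by position, and assign it back (a pandas Index is immutable, so df.columns[i] = ... raises a TypeError):

    cols = list(df.columns)
    for i, val in enumerate(cols):
        if 'Hosted Meetings' in val:
            cols[i] = 'Hosted Meetings' + ' ' + j.group(1)
    df.columns = cols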
    

    However, this is also a good illustration of the advantages of a more functional style. df.columns is a pandas Index of strings, so you can use its vectorised .str.replace method to rename the matching columns in one statement:

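    # regex=True is needed on pandas 2.0+, where str.replace matches literally by default
    df.columns = df.columns.str.replace(r'.*Hosted Meetings.*',
                                        'Hosted Meetings' + ' ' + j.group(1),
                                        regex=True)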
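
To see the difference without any Excel files, here is a minimal sketch that runs the question's loop and the one-statement rename on a small in-memory frame. The column names, values, and the file name "./users-export-20200131.xlsx" are made up for illustration; only the rename logic comes from the code above.

```python
import re

import pandas as pd

# Made-up data standing in for one exported workbook (an assumption).
df = pd.DataFrame({"User": ["a", "b"], "Hosted Meetings (total)": [3, 5]})
filename = "./users-export-20200131.xlsx"  # hypothetical file name
j = re.search(r"(\d+)", filename)  # first run of digits in the file name

# The question's loop: rebinding i leaves the column labels untouched.
for i in df.columns.values:
    if "Hosted Meetings" in i:
        i = "Hosted Meetings " + j.group(1)
before = list(df.columns)

# Assigning a new set of labels to df.columns is what changes the frame.
df.columns = df.columns.str.replace(
    r".*Hosted Meetings.*", "Hosted Meetings " + j.group(1), regex=True
)
after = list(df.columns)

print(before)  # ['User', 'Hosted Meetings (total)']
print(after)   # ['User', 'Hosted Meetings 20200131']
```

Because the pattern starts and ends with .*, the whole label is replaced, so any extra text around "Hosted Meetings" in the original column name is dropped.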