Creating custom names for columns based on other column names in pandas dataframe


I have a dataframe like below:

[image: the example dataframe]

I want to create a new column from the difference between columns, or from any other calculation on them. However, I want the column name to reflect the operation performed. For example, below I compute the difference between Origin 1 and Dest 1:

[image: the dataframe with the new difference column highlighted]

How do I create such custom column names, as highlighted, especially when I have to create many such columns?


Solution

  • Just iterate over the columns; for the names you can use an f-string:

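    # Snapshot the original columns so that columns added inside the
    # loop are not picked up and paired again on later iterations.
    cols = list(df.columns)
    for col_a in cols:
        for col_b in cols:
            if col_a != col_b:
                df[f'{col_a} - {col_b}'] = df[col_a] - df[col_b]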
    

    If you use itertools (part of the Python standard library), this becomes easier to read (as proposed by @MustafaAydın). itertools.permutations(df, 2) yields every ordered pair of distinct column names:

    import itertools
    for col_a, col_b in itertools.permutations(df, 2):
        df[f'{col_a} - {col_b}'] = df[col_a] - df[col_b]
    

    If you want to do multiple operations, just add a line per operation:

    import itertools
    for col_a, col_b in itertools.permutations(df, 2):
        df[f'{col_a} - {col_b}'] = df[col_a] - df[col_b]
        df[f'{col_a} + {col_b}'] = df[col_a] + df[col_b]
    

    If you only want to use subsets of columns, e.g. only from origins to destinations, you can do:

    import itertools
    origins = [col for col in df.columns if col.startswith('Origin')]
    destinations = [col for col in df.columns if col.startswith('Dest')]
    for col_a, col_b in itertools.product(origins, destinations):
        df[f'{col_a} - {col_b}'] = df[col_a] - df[col_b]
        df[f'{col_a} + {col_b}'] = df[col_a] + df[col_b]
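
If the list of operations grows, one possible variation on the snippets above is to keep the operations in a dictionary that maps the symbol used in the column name to the function that computes it. The following is only a minimal sketch: the sample data, the `operations` dictionary and the `new_columns` name are assumptions made for illustration, not part of the original question. It also builds all new columns first and attaches them in one step with `pd.concat`, rather than assigning them one by one.

```python
import itertools
import operator

import pandas as pd

# Hypothetical sample data; the column names mimic the question's layout.
df = pd.DataFrame({
    'Origin 1': [10, 20, 30],
    'Origin 2': [1, 2, 3],
    'Dest 1': [4, 5, 6],
    'Dest 2': [7, 8, 9],
})

# Symbol that appears in the new column name -> function that computes it.
operations = {'-': operator.sub, '+': operator.add}

origins = [col for col in df.columns if col.startswith('Origin')]
destinations = [col for col in df.columns if col.startswith('Dest')]

# Build every new column first, keyed by its descriptive name.
new_columns = {
    f'{col_a} {symbol} {col_b}': func(df[col_a], df[col_b])
    for col_a, col_b in itertools.product(origins, destinations)
    for symbol, func in operations.items()
}

# Attach all new columns to the original dataframe in one step.
df = pd.concat([df, pd.DataFrame(new_columns)], axis=1)

print(df.columns.tolist())
```

Supporting another operation then only needs one more dictionary entry, for example `'*': operator.mul`.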