Search code examples
pythonpandas

How to rename row string based on another row string?


Imagine I have a dataframe like this:

import pandas as pd
df = pd.DataFrame({"a":["","DATE","01-01-2012"],
                   "b":["","ID",18],
                   "c":["CLASS A","GOLF",3],
                   "d":["","HOCKEY",4],
                   "e":["","BASEBALL",2],
                   "f":["CLASS B","GOLF",15],
                   "g":["","HOCKEY",2],
                   "h":["","BASEBALL",3]
                   })

Out[33]: 
            a   b        c       d         e        f       g         h
0                  CLASS A                    CLASS B                  
1        DATE  ID     GOLF  HOCKEY  BASEBALL     GOLF  HOCKEY  BASEBALL
2  01-01-2012  18        3       4         2       15       2         3

I would like to add the strings in the first row to the names of those sports on the row below, but only before the beginning of the next "Class". Does anyone know how can I do that?

So the result should be like this:

            a   b             c  ...             f               g                 h
0                       CLASS A  ...       CLASS B                                  
1        DATE  ID  CLASS A GOLF  ...  CLASS B GOLF  CLASS B HOCKEY  CLASS B BASEBALL
2  01-01-2012  18             3  ...            15               2                 3

Later I will make the row 1 to be my header names, but this part I know how to do. I already tried to use df.iterrows but I got confused with the workflow.


Solution

  • Using replace+ffill to forward the CLASS, and a boolean mask to change the strings by boolean indexing:

    s = df.loc[0].replace('', np.nan).ffill()
    m = s.notna()
    
    df.loc[1, m] = s[m]+' '+df.loc[1, m]
    

    Output:

                a   b             c               d                 e             f               g                 h
    0                       CLASS A                                         CLASS B                                  
    1        DATE  ID  CLASS A GOLF  CLASS A HOCKEY  CLASS A BASEBALL  CLASS B GOLF  CLASS B HOCKEY  CLASS B BASEBALL
    2  01-01-2012  18             3               4                 2            15               2                 3
    

    Side note: it might be better to have those two rows as a MultiIndex rather than rows with strings. This would enable you to benefit from vectorized operations on your numeric data.