Search code examples
pandasdataframerename

Moving values in pandas df to new column within df, rename column based on str in column


I have a pandas df of data stripped from a pdf. I have already done some considerable wrangling to get it to this point. There are about 100 chemical concentrations at about 10 sites, and in separate rows, a lab note for each of them denoting info from the tests corresponding to each site. I would like to add the unit (various units) into the column name and move the lab note (a,b,c,d,e in example, vary in my data) into the row for its respective location. Please see image below.

current df vs desired df layout

Previously the lab note was in the row directly under its respective lab test - after an outer join with a df of the coordinates of the test sites, it was rearranged as seen in the example I posted.

I am inexperienced with loops, and certainly need a loop solution due to the number of columns in my data.

Thank you.

Attempted to add unit to column name using

df.rename(columns={'chem1': 'chem1'+df.chem1.iloc(index of units row),repeat ad infinitum}, inplace=True)

Do not know where to start with loops for either problem


Solution

  • You could use iloc to separate the rows, then

    StartColumns = df.columns
    newColumns = df.iloc[5:10, 3:]
    NewColumnNames = [StartColumns[i] + "_LQ" for i in range(3, len(StartColumns))]
    newColumns.columns = NewColumnNames
    # Don't change the first 3 column names
    ColNames = [StartColumns[i] if i < 3 else "_".join(StartColumns[i], df.iat[10, i]) for i in range(len(StartColumns))]
    df.columns = ColNames
    FinalDf = pd.concat(df, newColumns, ignore_index = True, axis = 1)
    

    Then to re-order the columns, you'll need to use something like

    AllCols = ColNames[:3]
    for i in range(2 * len(NewColumnNames)):
        # Integer division truncates, so we can take the first relevant index 
        # from our renamed columns, then the first from our new columns
        if i % 2 == 0:
            AllCols.append(ColNames[3 + int(i/2)])
        else:
            AllCols.append(NewColumnNames[int(i/2)])
    FinalDf = FinalDf[AllCols]
    

    This code will not be the fastest, but it should do what you're looking for.