Search code examples
pythonpandastranspose

Transpose row values into specific columns in Pandas


I have a df like this:

    MemberID FirstName LastName ClaimID  Amount    
0   1          John      Doe     001A    100
1   1          John      Doe     001B    150
2   2          Andy      Right   004C    170
3   2          Andy      Right   005A    200
4   2          Andy      Right   002B    100   

I need to transpose the values in the 'ClaimID' column for each member into one row, so each member will have each Claim as a value in a separate column called 'Claim(1-MaxNumofClaims), and the same logic goes for the Amount columns, the output needs to look like this:

    MemberID FirstName LastName  Claim1 Claim2  Claim3 Amount1 Amount2 Amount3     
0   1          John      Doe     001A   001B    NaN    100     150     NaN
1   2          Andy      Right   004C   005A    002B   170     200     100    

I am new to Pandas and got myself stuck on this, any help would be greatly appreciated.


Solution

    • the operation you need is not transpose, this swaps row and column indexes
    • this approach groupby() identifying columns and constructs dict of columns that you want values to become columns 1..n
    • part two is expand out these dict. pd.Series expands a series of dict to columns
    df = pd.read_csv(io.StringIO("""    MemberID FirstName LastName ClaimID  Amount    
    0   1          John      Doe     001A    100
    1   1          John      Doe     001B    150
    2   2          Andy      Right   004C    170
    3   2          Andy      Right   005A    200
    4   2          Andy      Right   002B    100   """), sep="\s+")
    
    cols = ["ClaimID","Amount"]
    
    # aggregate to columns that define rows,  generate a dict for other columns
    df = df.groupby(
        ["MemberID","FirstName","LastName"], as_index=False).agg(
        {c:lambda s: {f"{s.name}{i+1}":v for i,v in enumerate(s)} for c in cols})
    
    # expand out the dicts and drop the now redundant columns
    df = df.join(df["ClaimID"].apply(pd.Series)).join(df["Amount"].apply(pd.Series)).drop(columns=cols)
    
    
    MemberID FirstName LastName ClaimID1 ClaimID2 ClaimID3 Amount1 Amount2 Amount3
    0 1 John Doe 001A 001B nan 100 150 nan
    1 2 Andy Right 004C 005A 002B 170 200 100