I have a df like this:
MemberID FirstName LastName ClaimID Amount
0 1 John Doe 001A 100
1 1 John Doe 001B 150
2 2 Andy Right 004C 170
3 2 Andy Right 005A 200
4 2 Andy Right 002B 100
I need to transpose the values in the 'ClaimID' column for each member into one row, so each member will have each Claim as a value in a separate column called 'Claim(1-MaxNumofClaims), and the same logic goes for the Amount columns, the output needs to look like this:
MemberID FirstName LastName Claim1 Claim2 Claim3 Amount1 Amount2 Amount3
0 1 John Doe 001A 001B NaN 100 150 NaN
1 2 Andy Right 004C 005A 002B 170 200 100
I am new to Pandas and got myself stuck on this, any help would be greatly appreciated.
groupby()
identifying columns and constructs dict
of columns that you want values to become columns 1..ndict
. pd.Series
expands a series of dict
to columnsdf = pd.read_csv(io.StringIO(""" MemberID FirstName LastName ClaimID Amount
0 1 John Doe 001A 100
1 1 John Doe 001B 150
2 2 Andy Right 004C 170
3 2 Andy Right 005A 200
4 2 Andy Right 002B 100 """), sep="\s+")
cols = ["ClaimID","Amount"]
# aggregate to columns that define rows, generate a dict for other columns
df = df.groupby(
["MemberID","FirstName","LastName"], as_index=False).agg(
{c:lambda s: {f"{s.name}{i+1}":v for i,v in enumerate(s)} for c in cols})
# expand out the dicts and drop the now redundant columns
df = df.join(df["ClaimID"].apply(pd.Series)).join(df["Amount"].apply(pd.Series)).drop(columns=cols)
MemberID | FirstName | LastName | ClaimID1 | ClaimID2 | ClaimID3 | Amount1 | Amount2 | Amount3 | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | John | Doe | 001A | 001B | nan | 100 | 150 | nan |
1 | 2 | Andy | Right | 004C | 005A | 002B | 170 | 200 | 100 |