I have .xlsx sheet with multiple entries: Entries
what I try to achieve: result
Therefore I am really stuck. I have tried df.drop_duplicates()
but it does not help.
If you could share some ideas or where to look it would be awesome!
TIA
P.S. sorry for the pictures, but it does not render correctly for some reason.
My strategy was to create and then join 2 dataframes with the index set to the Trigger
column. One dataframe contains only the Trigger
, To
, and From
columns, while the other dataframe has the Trigger
, Category
, and Description
columns. The "trick" here is to use DataFrame.pivot to get the corresponding Category
and Description
values into their own columns.
import pandas as pd
import os
filename = r'logfile.xlsx'
new_filename = r'consolidated_logfile.xlsx'
df = pd.read_excel(filename)
# Create a dataframe with the actual records we want to be left with at the end
# You might think of this as removing duplicates, in a way
records = df.loc[df['String Value'].isna()]
# Remove the 'String Value' and 'Remarks' columns
records = records.drop(columns=['String Value', 'Remarks'])
# Set the index to the 'Trigger' column, so another dataframe with the extra
# details ('Category' and 'Description' columns) can be joined/merged later on
records = records.set_index('Trigger')
# Create a dataframe with just the extra details ('Category' and 'Description' columns)
details = df.loc[df['String Value'].notna()]
# Transform the dataframe using DataFrame.pivot so that:
# - the indexes are the values from the 'Trigger' column
# - the columns are the values from the 'String Value' column
# - the values are taken from the 'Remarks' column
details = details.pivot(index='Trigger', columns='String Value', values='Remarks')
# Combine the 2 dataframes with DataFrame.join
combined = records.join(details)
# Save to a new spreadsheet
combined.to_excel(new_filename)
# Launch the new spreadsheet
os.startfile(new_filename)
To make it easier to understand, here are the dataframes after each step:
df = pd.read_excel(filename)
Trigger From To String Value Remarks
0 incident: INC111111111 Tom Peter NaN NaN
1 request: REQ123464 John Alex NaN NaN
2 incident: INC22222222 David Maria NaN NaN
3 incident: INC111111111 NaN NaN Category FYI
4 request: REQ123464 NaN NaN Category Wrong Route
5 incident: INC22222222 NaN NaN Category FYI
6 incident: INC111111111 NaN NaN Description Wrongly routed
7 request: REQ123464 NaN NaN Description Dispatch to another team
8 incident: INC22222222 NaN NaN Description Try more troubleshoot
# Create a dataframe with the actual records we want to be left with at the end
# You might think of this as removing duplicates, in a way
records = df.loc[df['String Value'].isna()]
Trigger From To String Value Remarks
0 incident: INC111111111 Tom Peter NaN NaN
1 request: REQ123464 John Alex NaN NaN
2 incident: INC22222222 David Maria NaN NaN
# Remove the 'String Value' and 'Remarks' columns
records = records.drop(columns=['String Value', 'Remarks'])
Trigger From To
0 incident: INC111111111 Tom Peter
1 request: REQ123464 John Alex
2 incident: INC22222222 David Maria
# Set the index to the 'Trigger' column, so another dataframe with the extra
# details ('Category' and 'Description' columns) can be joined/merged later on
records = records.set_index('Trigger')
From To
Trigger
incident: INC111111111 Tom Peter
request: REQ123464 John Alex
incident: INC22222222 David Maria
# Create a dataframe with just the extra details ('Category' and 'Description' columns)
details = df.loc[df['String Value'].notna()]
Trigger From To String Value Remarks
3 incident: INC111111111 NaN NaN Category FYI
4 request: REQ123464 NaN NaN Category Wrong Route
5 incident: INC22222222 NaN NaN Category FYI
6 incident: INC111111111 NaN NaN Description Wrongly routed
7 request: REQ123464 NaN NaN Description Dispatch to another team
8 incident: INC22222222 NaN NaN Description Try more troubleshoot
# Transform the dataframe using DataFrame.pivot so that:
# - the indexes are the values from the 'Trigger' column
# - the columns are the values from the 'String Value' column
# - the values are taken from the 'Remarks' column
details = details.pivot(index='Trigger', columns='String Value', values='Remarks')
String Value Category Description
Trigger
incident: INC111111111 FYI Wrongly routed
incident: INC22222222 FYI Try more troubleshoot
request: REQ123464 Wrong Route Dispatch to another team
# Combine the 2 dataframes with DataFrame.join
combined = records.join(details)
From To Category Description
Trigger
incident: INC111111111 Tom Peter FYI Wrongly routed
request: REQ123464 John Alex Wrong Route Dispatch to another team
incident: INC22222222 David Maria FYI Try more troubleshoot