When System appears in column "Type" I want to delete all values from that row, except the value from the column "Name". When Hardware appears in column "Type" I want to delete all values from that row except the value from column "Color". After that, I want to split all cells from the column "Text" which are not empty into multiple rows, and to keep rows which are empty from that column.
Here is data frame that I have:
df
Type Text Name ID Color
System aca\nmaca\nstream\nphase\n Gary 123 Red
System aca\nmaca\nstream\nphase\n Mary 3254 Yellow
Hardware a\nmaca\nstream\nphase\n Jerry 158 White
Software ca\nmaca\nstream\nphase\n Perry 56414 Green
Software aca\nmac\nstream\nphase\n Jimmy 548 Blue
System aca\nmaca\nstream\nphase\n Marc 5658 Black
System aca\nmaca\nstram\npha\n John 867 Pink
Hardware aca\nma\nstream\nphase\n Sam 665 Gray
Hardware aca\nmaca\nstream\nphase\n Jury 5784 Azure
System aca\nmaca\nstream\nphase\n Larry 5589 Fawn
Software aca\nmaca\nst\nphase\n James 6568 Magenta
System aca\nmaca\nstream\nph\n Kevin 568 Cyan
And here is the desired result:
Type Text Name ID Color
System Gary
System Mary
Hardware White
Software ca Perry 56414 Green
Software maca Perry 56414 Green
Software stream Perry 56414 Green
Software phase Perry 56414 Green
Software aca Jimmy 548 Blue
Software mac Jimmy 548 Blue
Software stream Jimmy 548 Blue
Software phase Jimmy 548 Blue
System Marc
System John
Hardware Gray
Hardware Azure
System Larry
Software aca James 6568 Magenta
Software maca James 6568 Magenta
Software st James 6568 Magenta
Software phase James 6568 Magenta
System Kevin
For split cells to multiple rows I tried this function:
def SepInRows(df, c):
s = df[c].str.split('\n', expand=True).stack()
i = s.index.get_level_values(0)
df2 = df.loc[i].copy()
df2[c] = s.values
return df2
But it drops rows with empty values in the column "Text" which is not that I want.
How to solve this?
You can use mask
with difference
in preprocessing and then this solution:
c1 = df.columns.difference(['Type','Name'])
c2 = df.columns.difference(['Type','Color'])
df[c1] = df[c1].mask(df['Type'] == 'System', np.nan)
df[c2] = df[c2].mask(df['Type'] == 'Hardware', np.nan)
cols = df.columns
df1 = (df.join(df.pop('Text').str.split('\n', expand=True)
.stack()
.reset_index(level=1, drop=True)
.rename('Text'))
).reset_index(drop=True).reindex(columns=cols)
print (df1)
Type Text Name ID Color
0 System NaN Gary NaN NaN
1 System NaN Mary NaN NaN
2 Hardware NaN NaN NaN White
3 Software ca Perry 56414.0 Green
4 Software maca Perry 56414.0 Green
5 Software stream Perry 56414.0 Green
6 Software phase Perry 56414.0 Green
7 Software Perry 56414.0 Green
8 Software aca Jimmy 548.0 Blue
9 Software mac Jimmy 548.0 Blue
10 Software stream Jimmy 548.0 Blue
11 Software phase Jimmy 548.0 Blue
12 Software Jimmy 548.0 Blue
13 System NaN Marc NaN NaN
14 System NaN John NaN NaN
15 Hardware NaN NaN NaN Gray
16 Hardware NaN NaN NaN Azure
17 System NaN Larry NaN NaN
18 Software aca James 6568.0 Magenta
19 Software maca James 6568.0 Magenta
20 Software st James 6568.0 Magenta
21 Software phase James 6568.0 Magenta
22 Software James 6568.0 Magenta
23 System NaN Kevin NaN NaN