I working with a data frame where I several columns, but I'm focusing on just two.
Room | Task |
---|---|
Room A closed. something | nan |
Room B closed. something2 | nan |
Room C | Task 1 |
Some rows in one of the columns have nan values. I want to select those columns, and split the string in the first column (based on the word 'closed') and place it in the place of the second column (instead of the current nan value). Here the example of what I would like:
Room | Task |
---|---|
Room A | closed. something |
Room B | closed. something2 |
Room C | Task 1 |
I have managed to do it in isolation: first choose the rows that have a nan value and then split them, but I lose everything else in the original df (other rows and columns that I have):
df = pd.load_csv("data.csv")
dd = df[df.Room.str.contains('closed',case=False)]
dd = dd.Room.str.split("(?=closed.)", expand = True)
How could I do this but mantain the original df data?
You can use a regex with a lookahead to split
, then combine_first
:
out = (df['Room'].str.split(r'\s*(?=closed)', n=1, expand=True)
.rename(columns={0: 'Room', 1: 'Task'})
.combine_first(df)
)
Output:
Room Task
0 Room A closed. something
1 Room B closed. something2
2 Room C Task 1