Search code examples
pythonpandasstringsplitdelimiter

Split string in a data frame column into another existing column with nan values


I working with a data frame where I several columns, but I'm focusing on just two.

Room Task
Room A closed. something nan
Room B closed. something2 nan
Room C Task 1

Some rows in one of the columns have nan values. I want to select those columns, and split the string in the first column (based on the word 'closed') and place it in the place of the second column (instead of the current nan value). Here the example of what I would like:

Room Task
Room A closed. something
Room B closed. something2
Room C Task 1

I have managed to do it in isolation: first choose the rows that have a nan value and then split them, but I lose everything else in the original df (other rows and columns that I have):

df = pd.load_csv("data.csv")
dd = df[df.Room.str.contains('closed',case=False)]
dd = dd.Room.str.split("(?=closed.)", expand = True)

How could I do this but mantain the original df data?


Solution

  • You can use a regex with a lookahead to split, then combine_first:

    out = (df['Room'].str.split(r'\s*(?=closed)', n=1, expand=True)
           .rename(columns={0: 'Room', 1: 'Task'})
           .combine_first(df)
          )
    

    Output:

         Room                Task
    0  Room A   closed. something
    1  Room B  closed. something2
    2  Room C              Task 1
    

    regex demo