Tags: python, pandas, dataframe, data-science, series

Generating 3 columns from one with .apply on dataframe


I want to extract some data from each row and turn it into new columns of an existing or new dataframe, without repeating the same re.match operation for every column.

Here's how one entry of the dataframe looks:

00:00 Someones_name: some text goes here

I have a regex that successfully captures the 3 groups I need:

re.match(r"^(\d{2}:\d{2}) (.*): (.*)$", x)

The problem is how to take matched_part[1], matched_part[2], and matched_part[3] without running the match again for every new column.

The solution that I don't want is:

import re

def function1(x):
  # function2 and function3 repeat the same match and return [2] and [3]
  return re.match(r"^(\d{2}:\d{2}) (.*): (.*)$", x)[1]

new_df['time'] = old_df['text'].apply(function1)
new_df['name'] = old_df['text'].apply(function2)
new_df['text'] = old_df['text'].apply(function3)

Solution

  • You can use str.extract with your pattern; it applies the regex once per row and returns one column per capture group.

    df[['time','name', 'text']] = df['col1'].str.extract(r"^(\d{2}:\d{2}) (.*): (.*)$")
    print(df)
    #                                        col1   time           name  \
    # 0  00:00 Someones_name: some text goes here  00:00  Someones_name   
    
    #                   text  
    # 0  some text goes here
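
As a variation on the same idea, here is a minimal sketch that uses named groups, so the extracted columns get their names from the pattern itself. The sample data, including the second row that deliberately does not match, is made up for illustration; old_df and new_df follow the names used in the question.

```python
import pandas as pd

# Hypothetical sample data; the second row is there to show a non-match.
old_df = pd.DataFrame({"text": [
    "00:00 Someones_name: some text goes here",
    "not a chat line",
]})

# Same pattern as in the question, with each group given a name.
# str.extract uses the group names as column names.
pattern = r"^(?P<time>\d{2}:\d{2}) (?P<name>.*): (?P<text>.*)$"

# The regex runs once per row; rows that do not match get NaN in every column.
new_df = old_df["text"].str.extract(pattern)
print(new_df)
```

One thing to watch: if a message can itself contain ": ", the greedy (.*) used for the name will run up to the last occurrence. Changing that group to (.*?) is one way to stop at the first one.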