python regex extract

Extract substring after certain text and before hyphen (regex)


I have a list of emails that follow a similar pattern:

[email protected]

[email protected]

The first email has 5 hyphen-separated parts before the @mail.com, while the second one has 4.

I need to extract the group_code that comes after the nonprod/prod portion of the group email.

For example, for [email protected] I need to extract red,

and for [email protected] I need to extract blue.

The portion before the group code will always be prod or nonprod. Furthermore, the substring "prod-" will always appear immediately before the group code.

How can I reliably extract the group code from emails that have different numbers of parts?


Solution

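  • re.findall(r'(?:prod-)(.*)-', s)
    
    # Same pattern on a DataFrame column; expand=False returns a Series for the single capture group
    df['group'] = df['col2'].str.extract(r'(?:prod-)(.*)-', expand=False)
    df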
    
    
        col1    col2                                    group
    0   1       [email protected]    red
    1   2       [email protected]            blue
    2   3                                               NaN
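
One caveat with the pattern above: `.*` is greedy, so it captures everything up to the last hyphen in the string. That gives the right answer when exactly one part follows the group code, but not when there are more. Below is a minimal sketch of a stricter variant that stops at the first hyphen (or the @). The helper name `extract_group_code` and the sample addresses are made up for illustration, since the real addresses are not shown in the question.

```python
import re

# Capture the run of characters right after "prod-" that contains
# neither a hyphen nor "@". This also matches "nonprod-", because
# "nonprod-" ends with "prod-".
GROUP_CODE = re.compile(r"prod-([^-@]+)")


def extract_group_code(email):
    """Return the part following 'prod-' or 'nonprod-', or None if absent."""
    match = GROUP_CODE.search(email)
    return match.group(1) if match else None


# Hypothetical addresses, not the ones from the question.
samples = [
    "team-app-nonprod-red-users@mail.com",  # 5 parts
    "team-prod-blue-users@mail.com",        # 4 parts
    "team-nonprod-red-users-eu@mail.com",   # two parts after the group code
    "",                                     # no match
]

for s in samples:
    print(repr(s), "->", extract_group_code(s))
```

If this behaviour is what you want, the same pattern should also work in pandas as `df['col2'].str.extract(r'prod-([^-@]+)', expand=False)`.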