python regex

Extract with multiple Patterns


I'm having an issue that maybe someone can help me with. I am trying to extract two patterns from a string and place the result in another column. It extracts the first pattern fine, but I am missing something in getting the second one. Here's the code.

jobseries['New Column'] = jobseries['Occupation'].str.extract('(GS-\d+)(|)(WG-\d+)').fillna('')

The first pattern is (GS-\d+) and the second pattern is (WG-\d+)

I've tried a ton of variations, but none have worked.


Solution

  • You can use either

    jobseries['New Column'] = jobseries['Occupation'].str.extract(r'(GS-\d+|WG-\d+)').fillna('')
    

    or a shorter version:

    jobseries['New Column'] = jobseries['Occupation'].str.extract(r'((?:GS|WG)-\d+)').fillna('')
    

    The points are:

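    • There must be only one capturing group in the regex, since you are using Series.str.extract and assigning the result to a single column (New Column). Your original pattern has three capturing groups and also requires GS-\d+ to be immediately followed by WG-\d+, so it does not match a string that contains only one of them.
    • The regex must match either one string or the other, but you can group the alternation at the beginning of the pattern and simply use ((?:GS|WG)-\d+) instead of (GS-\d+|WG-\d+). That is a capturing group that matches either GS or WG, then a hyphen, then one or more digits. The inner (?:...) is a non-capturing group, so it does not add a second column.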
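
To check the two points above without a DataFrame, here is a minimal sketch using only Python's re module (as far as I know, Series.str.extract applies the same regex syntax). The sample strings and the first_match helper are made up for illustration and are not from the question's data.

```python
import re

# Hypothetical sample values standing in for jobseries['Occupation'].
samples = ["GS-0343 Management Analyst", "Mechanic WG-5803", "No series listed"]

original = r'(GS-\d+)(|)(WG-\d+)'    # pattern from the question
fixed_long = r'(GS-\d+|WG-\d+)'      # alternation of the two full patterns
fixed_short = r'((?:GS|WG)-\d+)'     # alternation grouped at the start

# Count the capturing groups in each pattern; str.extract returns
# one column per capturing group.
group_counts = [re.compile(p).groups for p in (original, fixed_long, fixed_short)]

def first_match(pattern, text):
    """Return group 1 of the first match, or '' if there is no match."""
    m = re.search(pattern, text)
    return m.group(1) if m else ''

original_results = [first_match(original, s) for s in samples]
long_results = [first_match(fixed_long, s) for s in samples]
short_results = [first_match(fixed_short, s) for s in samples]

print(group_counts)
print(original_results)
print(long_results)
print(short_results)
```

With these samples the original pattern finds nothing, because it needs GS-\d+ directly followed by WG-\d+, while the two fixed patterns return the same value for every string.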