
Creating a Python Regex to match a string


I am having a hard time creating a regex for this string. I need to:

  1. extract the words after Property, until &
  2. extract the words after Category, until &
  3. create a regex to match everything from "cat" up to the comma before "modifiedBy"

"cat":"Property : TikTok Videos & Category : Insta Videos & User Impact: TBD & User Minutes : 18","modifiedBy"

My current regex is:

"cat":"Property : (?P<property>\w+.*?) & Category : (?P<category>\w+)?

  1. This captures the named group "property" correctly as "TikTok Videos".

  2. But the named group "category" captures only the word "Insta". If I add a + as in (?P<category>\w+, then it ends up consuming everything to the end of the string.

  3. As for capturing the entire string from "cat" up to the last comma before "modifiedBy", I don't know how to do that.

So the end product would be:

  1. property = TikTok Videos
  2. Category = Insta Videos
  3. Entire_string = "cat":"Property : TikTok Videos & Category : Insta Videos & User Impact: TBD & User Minutes : 18"

Solution

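  • You can do it all with a single regex using lookahead assertions. Each lookahead starts from the beginning of the string and scans forward on its own, so the three groups are captured independently of one another.

    r'(?s)^(?=.*?Property\s*:\s*(?P<Property>[^&]*?)\s*&)(?=.*?Category\s*:\s*(?P<Category>[^&]*?)\s*&)(?=.*?(?P<cat>"cat".*?"),\s*"modifiedBy")'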
    

    https://regex101.com/r/gdM2q1/1
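
For reference, here is a minimal sketch of how the lookahead pattern might be used from Python. The sample text is the string from the question joined onto one line, the variable names are only illustrative, and the second group is spelled Category here. Because the pattern consists only of lookaheads, the overall match is empty and the results are read from the named groups.

```python
import re

# Sample input from the question, joined onto one line.
text = ('"cat":"Property : TikTok Videos & Category : Insta Videos & '
        'User Impact: TBD & User Minutes : 18","modifiedBy"')

# The lookahead-only pattern; the second group is named "Category" here.
pattern = re.compile(
    r'(?s)^(?=.*?Property\s*:\s*(?P<Property>[^&]*?)\s*&)'
    r'(?=.*?Category\s*:\s*(?P<Category>[^&]*?)\s*&)'
    r'(?=.*?(?P<cat>"cat".*?"),\s*"modifiedBy")'
)

match = pattern.search(text)
# groupdict() maps each named group to the text it captured.
fields = match.groupdict() if match else {}

for name, value in fields.items():
    print(f'{name} = {value}')
```

If any of the three lookaheads fails (for example, the text has no "modifiedBy" key), search() returns None, which is why the sketch falls back to an empty dict.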

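    Expanded / formatted (the whitespace and comments are only for readability; to run this form as-is, it would also need the x / re.VERBOSE flag)

    (?s)
    ^
    (?=
       .*? Property \s* : \s* 
       (?P<Property> [^&]*? )        # (1)
       \s* &
    )
    (?=
       .*? Category \s* : \s* 
       (?P<Category> [^&]*? )        # (2)
       \s* &
    )
    (?=
       .*? 
       (?P<cat> "cat" .*? " )        # (3)
       , \s* "modifiedBy"
    )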
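
The expanded layout can be kept in Python source by compiling with re.VERBOSE, which ignores unescaped whitespace in the pattern and treats # as the start of a comment. The following is a sketch under that assumption; the constant and variable names are hypothetical, and the second group is spelled Category.

```python
import re

# re.DOTALL replaces the inline (?s); re.VERBOSE allows the spaced-out layout.
VERBOSE_PATTERN = re.compile(r'''
    ^
    (?=
       .*? Property \s* : \s*
       (?P<Property> [^&]*? )        # (1) text after "Property :"
       \s* &
    )
    (?=
       .*? Category \s* : \s*
       (?P<Category> [^&]*? )        # (2) text after "Category :"
       \s* &
    )
    (?=
       .*?
       (?P<cat> "cat" .*? " )        # (3) from "cat" to the closing quote
       , \s* "modifiedBy"
    )
''', re.DOTALL | re.VERBOSE)

sample = ('"cat":"Property : TikTok Videos & Category : Insta Videos & '
          'User Impact: TBD & User Minutes : 18","modifiedBy"')

verbose_match = VERBOSE_PATTERN.search(sample)
print(verbose_match.group('Property'), '|', verbose_match.group('Category'))
```

One caveat with this style: a literal space in the pattern would have to be written as \s, "\ " or [ ], since plain spaces are dropped in verbose mode.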
    

    If you also need to consume the "cat" text, use the version below.
    Consuming it moves the current position forward, ideally past the last set of Category and Property text
    (although that is not guaranteed). You'll also need to add the m (multi-line) modifier, giving (?sm), so that ^ can match at the start of each line and not only at the start of the input.

    r'(?sm)^(?=.*?Property\s*:\s*(?P<Property>[^&]*?)\s*&)(?=.*?Category\s*:\s*(?P<Category>[^&]*?)\s*&).*?(?P<cat>"cat".*?"),\s*"modifiedBy"'
    

    https://regex101.com/r/tZEm5K/1
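
To illustrate why consuming the text matters, here is a rough sketch that runs the consuming pattern over several records with re.finditer. The two-line input is made up for the example (only the first record resembles the question's string), the variable names are hypothetical, and the second group is spelled Category.

```python
import re

# Hypothetical input: one record per line, each starting with "cat".
log = (
    '"cat":"Property : TikTok Videos & Category : Insta Videos & User Minutes : 18","modifiedBy":"a"\n'
    '"cat":"Property : YouTube Shorts & Category : Reels & User Minutes : 7","modifiedBy":"b"\n'
)

# Consuming variant: the last part is matched for real instead of in a lookahead,
# and (?m) lets ^ match at the start of every line.
consuming = re.compile(
    r'(?sm)^(?=.*?Property\s*:\s*(?P<Property>[^&]*?)\s*&)'
    r'(?=.*?Category\s*:\s*(?P<Category>[^&]*?)\s*&)'
    r'.*?(?P<cat>"cat".*?"),\s*"modifiedBy"'
)

# Each match ends after "modifiedBy", so the next search resumes from there.
records = [(m.group('Property'), m.group('Category'))
           for m in consuming.finditer(log)]
print(records)
```

With the lookahead-only pattern, every match is empty and nothing is consumed, so this kind of record-by-record iteration relies on the consuming form.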