Tags: python, python-3.x, regex, string, regex-group

How to make a regular expression pattern prefer a comma over the start of line when using the ^ operator?


import re

# Example 1: a , comes before the text to capture
input_text = "Hello how are you?, dfdfdfd fdfdfdf other text. hghhg"

# Example 2: no , (or . ; \n) before the text to capture
# (this assignment overrides example 1; comment out one of them to test each case)
input_text = "dfdfdfd fdfdfdf other text. hghhg"

# No matter where ^ is placed among the alternatives, it always seems to be tried first and the others are ignored.
fears_and_panics_match = re.search(
    r"(?:\.|,|;|\n|^)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n)",
    # r"(?:\.|,|;|\n)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n|$)",
    input_text,
    flags=re.IGNORECASE,
)

if fears_and_panics_match:
    print(fears_and_panics_match.group(1))

Why does this pattern r"(?:\.|,|;|\n|^)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n)" capture Hello how are you?, dfdfdfd fdfdfdf no matter where I place the ^? I need it to look for a comma , first, and only fall back to the beginning of the line ^ if there is none.
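A minimal sketch of why the position of ^ in the alternation makes no difference, assuming Python's standard re module: re.search tries start positions from left to right and returns the first position where the whole pattern matches. The order of the alternatives only decides between matches that begin at the same position, and ^ already succeeds at index 0, so the match never gets a chance to begin at the comma. The names caret_last and caret_first are illustrative only.

```python
import re

text = "Hello how are you?, dfdfdfd fdfdfdf other text. hghhg"

# The question's pattern with ^ as the last alternative, and the same pattern with ^ first
caret_last = r"(?:\.|,|;|\n|^)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n)"
caret_first = r"(?:^|\.|,|;|\n)\s*(?:(?:for|by)\s*me|)\s*(.+?)\s*(?:other\s*text)\s*(?:\.|,|;|\n)"

for pattern in (caret_last, caret_first):
    m = re.search(pattern, text, flags=re.IGNORECASE)
    # Both matches start at index 0: the ^ alternative succeeds there,
    # and re.search does not try a later start position once one works.
    print(m.start(), repr(m.group(1)))  # prints: 0 'Hello how are you?, dfdfdfd fdfdfdf'

# The same effect on a tiny input: the leftmost match wins,
# whatever the order of the alternatives.
print(re.search(r"b|a", "ab").group())  # prints: a
print(re.search(r"a|b", "ab").group())  # prints: a
```

So reordering the alternatives cannot fix this; the pattern itself has to make the part before the comma optional, as the answer below does.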

Correct output in each case:

#for example 1
"dfdfdfd fdfdfdf"

#for example 2
"dfdfdfd fdfdfdf"

Solution

  • You can change your regex to optionally match some characters up to a period, comma or semicolon (., , or ;), and then capture from there up to other text:

    ^(?:.*?[.,;])?\s*(?:(?:for|by)\s*me\s*)?(\w.*?)(?=\s*other\s*text)
    

    It matches:

    • ^ beginning of the string
    • (?:.*?[.,;])? an optional run of characters ending in a ., , or ;
    • \s* any amount of whitespace (possibly none)
    • (?:(?:for|by)\s*me\s*)? the optional phrase for me or by me
    • (\w.*?) as few characters as possible (lazy), starting with a word character; this is the captured text
    • (?=\s*other\s*text) a lookahead asserting that other text comes next, without consuming it
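The breakdown above can also live inside the pattern itself. This is a sketch using Python's re.VERBOSE flag, under which unescaped whitespace outside character classes is ignored and # starts a comment; the compiled name pattern is illustrative.

```python
import re

# The answer's regex, written with re.VERBOSE so each part carries its explanation.
pattern = re.compile(r"""
    ^                          # beginning of the string
    (?:.*?[.,;])?              # optional run of characters ending in . , or ;
    \s*                        # any amount of whitespace
    (?:(?:for|by)\s*me\s*)?    # optional 'for me' or 'by me'
    (\w.*?)                    # capture: starts with a word character, lazy
    (?=\s*other\s*text)        # lookahead: 'other text' must follow
""", re.VERBOSE)

for s in ("Hello how are you?, dfdfdfd fdfdfdf other text. hghhg",
          "dfdfdfd fdfdfdf other text. hghhg"):
    m = pattern.match(s)
    print(m.group(1) if m else None)  # prints dfdfdfd fdfdfdf for both inputs
```

The behaviour is the same as the one-line version; the verbose form just keeps the explanation next to each piece.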


    In Python (note that because re.match only matches at the start of the string, the ^ is not needed in the regex):

    import re

    strs = [
      'dfdfdfd fdfdfdf other text. hghhg',
      'Hello how are you?, dfdfdfd fdfdfdf other text.hghhg',
      'for me a word other text',
      'A semicolon first; then some words before other text'
    ]
    regex = r'(?:.*?[.,;])?\s*(?:(?:for|by)\s*me\s*)?(\w.*?)(?=\s*other\s*text)'
    for s in strs:
        # re.match anchors at the start of the string, so the regex needs no ^
        m = re.match(regex, s)
        if m:
            print(m.group(1))
    

    Output:

    dfdfdfd fdfdfdf
    dfdfdfd fdfdfdf
    a word
    then some words before
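If the input can span several lines (the question's pattern also treats \n as a separator), one possible extension, not part of the original answer, is to compile the same regex with re.MULTILINE so that ^ matches at the start of every line, and collect one result per line with re.finditer. Because . does not match a newline, the optional (?:.*?[.,;])? part cannot reach across lines. re.IGNORECASE is added to mirror the flag used in the question; the helper name extract_all is hypothetical.

```python
import re

# The answer's regex with ^ restored: under re.MULTILINE, ^ matches at the
# start of each line; re.IGNORECASE lets 'For me' match (?:for|by)\s*me.
PATTERN = re.compile(
    r"^(?:.*?[.,;])?\s*(?:(?:for|by)\s*me\s*)?(\w.*?)(?=\s*other\s*text)",
    re.IGNORECASE | re.MULTILINE,
)


def extract_all(text):
    """Hypothetical helper: return the captured phrase for every matching line."""
    return [m.group(1) for m in PATTERN.finditer(text)]


sample = (
    "Hello how are you?, dfdfdfd fdfdfdf other text. hghhg\n"
    "For me a word other text"
)
print(extract_all(sample))  # prints: ['dfdfdfd fdfdfdf', 'a word']
```

Lines without other text simply produce no entry, so the helper returns an empty list instead of raising AttributeError on a None match.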