python regex python-re findall

Match smallest possible sentence


Text:

One sentence here, much wow. Another one here. This is O.N.E. example n. 1, a nice one to understand. Hope it's clear now!

Regex: (?<=\.\s)[A-Z].+?nice one.+?\.(?=\s[A-Z])

Result: Another one here. This is O.N.E. example n. 1, a nice one to understand.

How can I obtain only This is O.N.E. example n. 1, a nice one to understand. (i.e. the smallest possible sentence that matches the regex)?


Solution

  • You could exclude dots from the match, and allow a dot only when it follows an uppercase char, or when it is followed by a whitespace char and a digit.

    (?:(?<=\.\s)|^)[A-Z][^.A-Z]*(?:(?:[A-Z]\.|\.\s\d)[^.A-Z]*)*\bnice one\b.+?(?=\s[A-Z])
    
    • (?:(?<=\.\s)|^) Assert either a . and a whitespace char directly to the left, or the start of the string
    • [A-Z][^.A-Z]* Match an uppercase char A-Z and 0+ times any char except a dot or uppercase char
    • (?: Non capture group
      • (?:[A-Z]\.|\.\s\d) Match either an uppercase char A-Z followed by a . or a . followed by a whitespace char and a digit
      • [^.A-Z]* Optionally match any char except a . or uppercase char
    • )* Close group and optionally repeat
    • \bnice one\b.+?(?=\s[A-Z]) Match nice one, then match as few chars as possible until a whitespace char followed by an uppercase char is asserted to the right
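As a quick check, here is a minimal sketch that runs both patterns with Python's re module against the sample text from the question. The variable names (text, original, pattern, matches) are only illustrative.

```python
import re

# Sample text from the question.
text = (
    "One sentence here, much wow. Another one here. "
    "This is O.N.E. example n. 1, a nice one to understand. "
    "Hope it's clear now!"
)

# Pattern from the question: the lazy .+? still starts at the first
# candidate sentence, so it spans two sentences.
original = re.compile(r"(?<=\.\s)[A-Z].+?nice one.+?\.(?=\s[A-Z])")

# Pattern from the answer: a dot is only allowed after an uppercase char
# or before a whitespace char and a digit, so the match cannot cross a
# sentence boundary before reaching "nice one".
pattern = re.compile(
    r"(?:(?<=\.\s)|^)[A-Z][^.A-Z]*(?:(?:[A-Z]\.|\.\s\d)[^.A-Z]*)*"
    r"\bnice one\b.+?(?=\s[A-Z])"
)

print(original.search(text).group(0))
# Another one here. This is O.N.E. example n. 1, a nice one to understand.

matches = pattern.findall(text)
print(matches)
# ['This is O.N.E. example n. 1, a nice one to understand.']
```

Since the pattern has no capture groups, findall returns the full matches. Note that [^.A-Z]* stops at any uppercase char, so a sentence with a capitalised word before nice one (a name, for example) that is not followed by a dot will not match with this pattern.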
