Tags: regex, regex-lookarounds, regex-greedy

Ignore "empty" numbered lines (a line number followed only by spaces or tabs)


So I have this regex:

https://regex101.com/r/Puggjm/5

Basically, I am trying to ignore all lines that consist only of a line number followed by whitespace or by nothing at all. My current regex: ^[\d\s].+(?:[A-Z\s]*)*$

However, the lines where the number is followed by nothing are still not ignored.


Solution

  • You could use a negative lookahead to assert that the line does not consist of 1+ digits followed by 0+ whitespace characters:

    ^(?!\d+\s*$)\d+.+$
    
    • ^ Start of the line (with the multiline flag, as in the demo and in re.MULTILINE)
    • (?!\d+\s*$) Negative lookahead asserting that what follows is not 1+ digits, then 0+ whitespace characters, then the end of the line
    • \d+.+ Match 1+ digits followed by 1+ of any character except a newline
    • $ End of the line
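
    To see what the lookahead contributes for the case in the title (a number followed only by spaces or tabs), the check below runs the pattern with and without the lookahead on a small made-up sample. The sample lines are hypothetical. Note that a bare "4" is already rejected by \d+.+ alone, because .+ needs at least one non-newline character; the lookahead is what rejects a number followed by trailing whitespace.

```python
import re

# The answer's pattern, and the same pattern with the negative lookahead removed
with_lookahead = re.compile(r"^(?!\d+\s*$)\d+.+$", re.MULTILINE)
without_lookahead = re.compile(r"^\d+.+$", re.MULTILINE)

# Hypothetical sample: "2" + spaces, "3" + a tab, and a bare "4"
sample = "1 First line\n2   \n3\t\n4\n5 Last line"

print(with_lookahead.findall(sample))     # ['1 First line', '5 Last line']
print(without_lookahead.findall(sample))  # ['1 First line', '2   ', '3\t', '5 Last line']
```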


    Example using findall:

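    import re

    # Skip lines that are only a number (optionally followed by whitespace);
    # keep lines that start with a number followed by some text.
    regex = r"^(?!\d+\s*$)\d+.+$"
    test_str = ("Here goes some text. {tag} A wonderful day. It's soon cristmas.\n"
        "2 Happy 2019, soon. {Some useful tag!} Something else goes here.\n"
        "3 Happy ending. Yeppe! See you.\n"
        "4\n"
        "5 Happy KKK!\n"
        "6 Happy B-Day!\n"
        "7\n"
        "8 Universe is cool!\n"
        "9\n"
        "10 {Tagish}.\n"
        "11\n"
        "12 {Slugish}. Here goes another line. {Slugish} since this is a new sentence.\n"
        "13\n"
        "14 endline.")
    # re.MULTILINE makes ^ and $ match at every line boundary, not only at the
    # start and end of the whole string. The first line has no leading number,
    # so it is not matched either.
    print(re.findall(regex, test_str, re.MULTILINE))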
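
    If the text is processed line by line anyway, the same filtering can be done without a lookaround: keep lines that start with a digit followed by some text, and drop those that fully match a bare number with optional whitespace. This is a swapped-in technique, not the answer's single regex; the helper name keep_numbered_lines and the sample input are hypothetical, and the list[str] annotation assumes Python 3.9+.

```python
import re

BARE_NUMBER = re.compile(r"\d+\s*")  # a line that is only a number plus optional whitespace
NUMBERED = re.compile(r"\d+.+")      # a line starting with a number and at least one more character


def keep_numbered_lines(text: str) -> list[str]:
    """Hypothetical helper: return numbered lines that carry content after the number."""
    return [
        line
        for line in text.splitlines()
        # match() anchors at the start of the line; fullmatch() must cover the whole line
        if NUMBERED.match(line) and not BARE_NUMBER.fullmatch(line)
    ]


print(keep_numbered_lines("1 First line\n2   \n3\t\n4\n5 Last line"))  # ['1 First line', '5 Last line']
```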
    

    If the line numbers are followed by a dot (for example 4.), you could use:

    ^(?!\d+\.\s*$)\d+.+$
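
    A quick comparison on a hypothetical dotted sample shows why the variant is needed: with the first pattern, a line like "3." slips through, because the lookahead expects only whitespace after the digits, and .+ then happily matches the dot.

```python
import re

# The answer's first pattern, and the variant for dotted line numbers
undotted = re.compile(r"^(?!\d+\s*$)\d+.+$", re.MULTILINE)
dotted = re.compile(r"^(?!\d+\.\s*$)\d+.+$", re.MULTILINE)

# Hypothetical sample with dotted line numbers
sample = "1. First line\n2.  \n3.\n4. Last line"

print(undotted.findall(sample))  # ['1. First line', '2.  ', '3.', '4. Last line']
print(dotted.findall(sample))    # ['1. First line', '4. Last line']
```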