regex notepad++

Select numbers smaller than 10000 using regex in notepad++


I have a very long list; the following is part of it:

English

4,000,000

AfricanAmerican

1,771,000

Irish

1,100,000

Dutch

800,000

German

700,000

Scotch-Irish

0-640,000

NativeAmerican

=0=500,000

Scottish

450,000

Russian

15,000

Austrian

5000

Finnish4000

Swiss

+4000

1820

Now I want to select all comma-separated numbers and all numbers smaller than 10000 using a regex in Notepad++.

The following regex works for comma-separated numbers, but it can't select numbers smaller than 10000:

\d{1,3},\d{3}

How can I modify it to select numbers smaller than 10000 too?

Note that the regex must also select numbers that are attached to words, like Finnish4000.


Solution

  • You can use an alternation, and rule out 1-4 digit numbers that start with a zero.

    To prevent partial matches, you can assert that there is no digit directly to the left or to the right of the match.

    (?<!\d)(?:\d{1,3}(?:,\d{3})+|(?!0)\d{1,4})(?!\d)
    

    The regex matches:

    • (?<!\d) Assert not a digit to the left
    • (?: Non capture group for the alternatives
      • \d{1,3} Match 1-3 digits
      • (?:,\d{3})+ Repeat 1+ times: a comma followed by 3 digits
      • | Or
      • (?!0)\d{1,4} Negative lookahead asserting that the next character is not a zero, then match 1-4 digits
    • ) Close the non capture group
    • (?!\d) Assert not a digit to the right
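
The pattern can also be checked outside the editor. The following is a minimal sketch that assumes Python's `re` module handles these lookarounds the same way as the Notepad++ regex engine for this pattern; the helper name `find_numbers` and the sample string are made up for illustration, with `12345` and `0500` added as values that should not match.

```python
import re

# Pattern from the answer: either comma-grouped numbers, or 1-4 digit
# numbers that do not start with a zero, with no digit on either side.
NUMBER_RE = re.compile(r"(?<!\d)(?:\d{1,3}(?:,\d{3})+|(?!0)\d{1,4})(?!\d)")


def find_numbers(text):
    """Return every substring of text that the pattern matches (hypothetical helper)."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return NUMBER_RE.findall(text)


# Sample built from values in the question, plus two values that should be skipped.
sample = (
    "English 4,000,000 Scotch-Irish 0-640,000 Austrian 5000 "
    "Finnish4000 Swiss +4000 1820 12345 0500"
)

print(find_numbers(sample))
# ['4,000,000', '640,000', '5000', '4000', '4000', '1820']
```

The lone `0` in `0-640,000` is skipped because of `(?!0)`, and `12345` is skipped because the digit assertions on both sides leave no position where a 1-4 digit run can start and end cleanly.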
