Tags: java, regex, jtextpane, styleddocument

Which regular expression should I use for numbers and strings?


I am trying to create a simple IDE that colors the text in my JTextPane according to:

  • Strings (" ")
  • Comments (// and /* */)
  • Keywords (public, int ...)
  • Numbers (integers like 69 and floats like 1.5)

I color the source code by overriding the insertString and remove methods of my StyledDocument.

After much testing, I have finished comments and keywords.

Q1: I color strings using this regular expression:

Pattern strings = Pattern.compile("\"[^\"]*\"");
Matcher matcherS = strings.matcher(text);

while (matcherS.find()) {
    setCharacterAttributes(matcherS.start(), matcherS.end() - matcherS.start(), red, false);
}

This works 99% of the time, except when a string literal contains an escaped quote (\"). The escaped quote ends the match early, which throws off all of the coloring that follows it. Can anyone correct my regular expression?
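A minimal sketch that reproduces the problem, using the pattern from the question unchanged (the class name NaiveStringDemo and the sample line are made up for illustration). The escaped quote closes the first match early, and the code between the next pair of quotes is then treated as a string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustrating the bug.
public class NaiveStringDemo {
    // The pattern from the question: a quote, any non-quote characters, a quote.
    static final Pattern STRINGS = Pattern.compile("\"[^\"]*\"");

    // Collects the text of every match, left to right.
    public static List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = STRINGS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The source line being highlighted is: x = "a\"b" + y + "c";
        String text = "x = \"a\\\"b\" + y + \"c\";";
        // Finds "a\" and then " + y + ", so the code "+ y +" gets colored as a string.
        System.out.println(matches(text));
    }
}
```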

Q2: For integers and decimals, I detect numbers with this regular expression:

Pattern numbers = Pattern.compile("\\d+");
Matcher matcherN = numbers.matcher(text);
while (matcherN.find()) {
    setCharacterAttributes(matcherN.start(), matcherN.end() - matcherN.start(), magenta, false);
}

With the regular expression "\d+" I only handle integers, not floats: 1.5 is matched as two separate numbers. Also, digits that are part of another token (for example, the 2 in an identifier like value2) are matched, which is not what I want in an IDE. What is the correct expression to use for number color coding?
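A short sketch showing both problems with "\d+" on one line of code (the class name DigitsDemo and the sample text are hypothetical, chosen only for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustrating the problem.
public class DigitsDemo {
    // The pattern from the question: one or more digits.
    static final Pattern NUMBERS = Pattern.compile("\\d+");

    // Collects the text of every match, left to right.
    public static List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBERS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Finds 2 (inside the identifier value2), then 1 and 5 as separate matches.
        System.out.println(matches("double value2 = 1.5;"));
    }
}
```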


Thank you for any help in advance!


Solution

  • For strings, this regex also handles backslash escapes such as \", and is probably the fastest way to write it:

    "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""

    Formatted:

     " [^"\\]* 
     (?: \\ . [^"\\]* )*
     "
    

    For integers and decimal numbers, the most reliable expression I know of
    (it matches 69, 1.5, 1. and .5) is this:

    "(?:\\d+(?:\\.\\d*)?|\\.\\d+)"

    Formatted:

     (?:
          \d+ 
          (?: \. \d* )?
       |  \. \d+ 
     )
    

    As a side note, if you run each pattern independently over the whole text,
    the highlights can overlap, for example a number inside a string literal or
    a quote inside a comment.
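Following the side note above, one way to avoid overlapping highlights is to merge the patterns into a single alternation and scan the text once. This is a sketch under my own assumptions, not the answer's code: the class TokenScanner, the group names, the comment pattern and the lookaround guards around the number pattern are all hypothetical additions. The guards are one possible way to handle the "digits inside identifiers" part of Q2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-pass scanner; names and the extra guards are assumptions.
public class TokenScanner {
    // A highlighted span: what kind of token it is, where it starts, how long it is.
    public static final class Token {
        public final String kind;
        public final int start;
        public final int length;

        Token(String kind, int start, int length) {
            this.kind = kind;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return kind + "[" + start + ", " + length + "]";
        }
    }

    // One alternation instead of separate passes. Each find() resumes after the
    // previous match, so digits or quotes inside an already-matched comment or
    // string literal are never matched again.
    static final Pattern TOKENS = Pattern.compile(
            // Assumption: line comments and (non-greedy) block comments.
            "(?<comment>//[^\\n]*|/\\*[\\s\\S]*?\\*/)"
            // String literals with backslash escapes (the regex from the answer).
            + "|(?<string>\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")"
            // Numbers (the regex from the answer). Assumption: the lookarounds only
            // allow a match that is not directly preceded or followed by a letter,
            // digit, underscore or dot, so the 2 in value2 is skipped.
            + "|(?<number>(?<![\\w.])(?:\\d+(?:\\.\\d*)?|\\.\\d+)(?![\\w.]))");

    // Returns the tokens in document order.
    public static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKENS.matcher(text);
        while (m.find()) {
            String kind = m.group("comment") != null ? "comment"
                    : m.group("string") != null ? "string"
                    : "number";
            tokens.add(new Token(kind, m.start(), m.end() - m.start()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Source line: x = "a\"b 42" + 1.5 + value2; // "not a string" 7
        String text = "x = \"a\\\"b 42\" + 1.5 + value2; // \"not a string\" 7";
        for (Token t : tokenize(text)) {
            System.out.println(t + " " + text.substring(t.start, t.start + t.length));
        }
    }
}
```

In the editor, each token's start and length could be passed to setCharacterAttributes(t.start, t.length, style, false), with the style picked by kind. Note that with these guards, literals with suffixes or other notations, such as 1.5f, 0x1F or 1e10, are not highlighted at all.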