Can I use Unicode escapes in the dk.brics.automaton regex engine?


I want to use Unicode escapes in my regular expressions.

For example, I would expect RegExp="\u0061" to match "a". But dk.brics.automaton does not seem to support this: the expression ended up matching "u0061" instead. I also tried RegExp="\u0061" and RegExp="\\u0061". Neither worked.

If you have any experience with this tool, could you please suggest a solution?

Thanks!


Solution

  • I eventually found a way to work around this issue.

    First, Unicode escapes can be used in the Java code, but in my experience each one has to be written as its own string literal, e.g. String str = "\u0061"+"b"; whereas String str = "\u0061b"; did not work well for me.

    Second, if we want to read the strings from a text file, such as test.txt containing "\u0061b\u0063", we have to (as far as I know) replace the Unicode escapes with the corresponding characters ourselves, because the escapes are mixed with ordinary characters. Then we get a String str with the value "abc".
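
The second step (replacing escapes read from a file) does not have to be done by hand. Below is a minimal sketch using only the standard library, assuming every escape has exactly the form \u followed by four hex digits. The class name UnicodeEscapeDecoder and the method decode are hypothetical names made up for this illustration; they are not part of dk.brics.automaton.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of dk.brics.automaton): turns textual
// escapes (backslash, 'u', four hex digits) into the characters they name.
final class UnicodeEscapeDecoder {
    // Regex for a literal backslash, then 'u', then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String raw) {
        Matcher m = ESCAPE.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the ordinary text between the previous escape and this one.
            out.append(raw, last, m.start());
            // Convert the four hex digits into a single UTF-16 code unit.
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the last escape.
        out.append(raw, last, raw.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Stands in for a line read from test.txt: the backslashes are real
        // characters here, so nothing has been decoded yet.
        String fromFile = "\\u0061b\\u0063";
        System.out.println(decode(fromFile)); // prints abc
    }
}
```

The decoded string can then be handed to the regex engine like any other pattern text. One thing to watch (an assumption about your input, not something I have tested against the library): if an escape decodes to a character that is a regex operator, it will be treated as an operator unless you escape it again.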