Tags: regex, solr, solrnet

This regex works in C# but fails in Solr


This regex works perfectly in a plain C# console application, so we started using it with SolrNet. However, querying a Solr instance for a field with the same regex throws the exception shown below:

java.lang.IllegalArgumentException: expected ']' at position 70 at org.apache.lucene.util.automaton.RegExp.parseCharClassExp(RegExp.java:1087)

Solution

  • Solr uses the Lucene regex engine, which differs from the .NET regex engine.

    In a Lucene pattern, an unescaped hyphen is treated as a range operator even at the end of a character class. So either escape the hyphen or move it to the start of the character class, i.e. [a-zA-Z'-] => [-a-zA-Z'] and [^a-zA-Z'-] => [^-a-zA-Z'].

    Lucene regex does not appear to support non-capturing groups, so remove every ?: from the pattern.

    With both changes, the pattern looks like this:

    ([-a-zA-Z']+[^-a-zA-Z']+){0,5}the([^-a-zA-Z']+[-a-zA-Z']+){0,5}
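If several .NET patterns need the same treatment, the two rewrites above could be automated. The following is only a minimal sketch in Python, not part of SolrNet or Lucene: the helper name `to_lucene_friendly` is hypothetical, and the input pattern is reconstructed from the fixes described above, since the asker's original regex is not shown. It assumes simple character classes with no nested brackets.

```python
import re

# Finds a character class that ends with an unescaped hyphen,
# e.g. [a-zA-Z'-] or [^a-zA-Z'-]. Group 1 is the optional negation,
# group 2 is the rest of the class body before the trailing hyphen.
# Assumption: classes contain no nested brackets or escaped ']'.
TRAILING_HYPHEN_CLASS = re.compile(r"\[(\^?)([^\]]+)(?<!\\)-\]")


def to_lucene_friendly(pattern: str) -> str:
    """Apply the two rewrites from the answer (hypothetical helper)."""
    # 1. Turn non-capturing groups into plain groups by dropping "?:".
    without_non_capturing = pattern.replace("(?:", "(")
    # 2. Move a trailing hyphen to the start of its character class.
    return TRAILING_HYPHEN_CLASS.sub(r"[\1-\2]", without_non_capturing)


# Assumed .NET-style input, reconstructed from the answer's description.
dotnet_pattern = "(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}the(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}"
print(to_lucene_friendly(dotnet_pattern))
# prints ([-a-zA-Z']+[^-a-zA-Z']+){0,5}the([^-a-zA-Z']+[-a-zA-Z']+){0,5}
```

A hyphen that is already escaped (for example `[a\-]`) is left alone, since the answer lists escaping as an equally valid fix.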