This regex works perfectly in a plain C# console application, so we started using it with SolrNet. However, querying a field in a Solr instance with the same regex throws the following exception:
java.lang.IllegalArgumentException: expected ']' at position 70 at org.apache.lucene.util.automaton.RegExp.parseCharClassExp(RegExp.java:1087)
You are using the Lucene regex engine, which is different from the .NET regex engine.
In a Lucene pattern, an unescaped hyphen is treated as a range operator even at the end of a character class. So either escape the hyphen or move it to the start of the character class, i.e. [a-zA-Z'-] => [-a-zA-Z'] and [^a-zA-Z'-] => [^-a-zA-Z'].
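If there are many such classes, the rewrite can be done mechanically. Below is a minimal Java sketch, assuming simple patterns without nested character classes; the class and method names (HyphenFix, moveTrailingHyphen, isUnescapedAtEnd) are hypothetical and not part of Lucene or SolrNet. It moves a trailing unescaped hyphen to the front of each class, after the ^ for a negated class.

```java
public class HyphenFix {

    // Hypothetical helper: in every character class whose last character is an
    // unescaped '-', move that '-' to the front of the class (after '^' for a
    // negated class) so Lucene reads it as a literal hyphen, not a range.
    // Assumes simple patterns: no nested character classes.
    public static String moveTrailingHyphen(String pattern) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                // Escaped character outside a class: copy both chars verbatim.
                out.append(c).append(pattern.charAt(i + 1));
                i += 2;
            } else if (c == '[') {
                boolean negated = i + 1 < pattern.length() && pattern.charAt(i + 1) == '^';
                int bodyStart = negated ? i + 2 : i + 1;
                // Scan to the closing ']', skipping escaped characters.
                int j = bodyStart;
                while (j < pattern.length() && pattern.charAt(j) != ']') {
                    j += pattern.charAt(j) == '\\' ? 2 : 1;
                }
                if (j >= pattern.length()) {
                    // Unterminated class: copy the rest unchanged.
                    out.append(pattern, i, pattern.length());
                    break;
                }
                String body = pattern.substring(bodyStart, j);
                if (body.endsWith("-") && isUnescapedAtEnd(body)) {
                    body = "-" + body.substring(0, body.length() - 1);
                }
                out.append('[').append(negated ? "^" : "").append(body).append(']');
                i = j + 1;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if the last character of s is preceded by an even number of
    // backslashes, i.e. it is not escaped.
    public static boolean isUnescapedAtEnd(String s) {
        int backslashes = 0;
        for (int k = s.length() - 2; k >= 0 && s.charAt(k) == '\\'; k--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    public static void main(String[] args) {
        System.out.println(moveTrailingHyphen("[a-zA-Z'-]"));  // [-a-zA-Z']
        System.out.println(moveTrailingHyphen("[^a-zA-Z'-]")); // [^-a-zA-Z']
    }
}
```

Moving the hyphen rather than escaping it keeps the class readable; escaping it as \- should work equally well in Lucene.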
It does not look like Lucene regex supports non-capturing groups, so remove every ?: from the pattern.
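This step can also be scripted. The following is a minimal Java sketch with a hypothetical StripNonCapturing.stripNonCapturing helper (not a Lucene or SolrNet API) that turns each unescaped (?: into a plain (, which keeps the same grouping. It assumes no literal (?: appears inside a character class, and the sample input in main is illustrative only.

```java
public class StripNonCapturing {

    // Hypothetical helper: replace every unescaped "(?:" with "(".
    // Escaped characters such as a literal "\(" are copied unchanged.
    // Assumes "(?:" never occurs literally inside a character class.
    public static String stripNonCapturing(String pattern) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            if (c == '\\' && i + 1 < pattern.length()) {
                out.append(c).append(pattern.charAt(i + 1));
                i += 2;
            } else if (pattern.startsWith("(?:", i)) {
                out.append('(');
                i += 3;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative input, not the original pattern from the question.
        System.out.println(stripNonCapturing("(?:[-a-z]+){0,5}the"));
    }
}
```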
So, the pattern will look like this:
([-a-zA-Z']+[^-a-zA-Z']+){0,5}the([^-a-zA-Z']+[-a-zA-Z']+){0,5}([-a-zA-Z']+[^-a-zA-Z']+){0,5}the([^-a-zA-Z']+[-a-zA-Z']+){0,5}
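In a Solr query, the standard query parser expects a regular expression term wrapped in forward slashes, as in field:/pattern/, and a / inside the pattern has to be escaped as \/. Below is a small Java sketch that builds such a query string; the field name text and the helper SolrRegexQuery.regexQuery are hypothetical, and it assumes the pattern contains no already-escaped slashes. From SolrNet, the resulting string would presumably be passed as a raw query (for example new SolrQuery(...)).

```java
public class SolrRegexQuery {

    // Builds a regex query of the form field:/pattern/.
    // A '/' inside the pattern would end the regex term early, so it is
    // escaped as "\/"; every other character is passed through unchanged.
    public static String regexQuery(String field, String pattern) {
        return field + ":/" + pattern.replace("/", "\\/") + "/";
    }

    public static void main(String[] args) {
        String pattern = "([-a-zA-Z']+[^-a-zA-Z']+){0,5}the([^-a-zA-Z']+[-a-zA-Z']+){0,5}";
        // "text" is a hypothetical field name.
        System.out.println(regexQuery("text", pattern));
    }
}
```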