I'm trying to run a regex query on a Solr solr.TextField
field. Is this meant to be supported on that field type?
For example, this search:
curl -g 'http://localhost:8983/solr/shard/select?rows=0&q=body:/hello/'
returns more than 0 results. But when I switch it to:
curl -g 'http://localhost:8983/solr/shard/select?rows=0&q=body:/h[aeiou]llo/'
I get 0 results.
<fieldType name="body_text" class="solr.TextField" positionIncrementGap="100" multiValued="false">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9_@-]+" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory" rule="java" />
    <filter class="solr.LengthFilterFactory" min="2" max="45"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
  </analyzer>
</fieldType>
<field name="body" type="body_text" uninvertible="true" indexed="true" stored="false"/>
When I add debugQuery=true, I see that my charFilter replacement is not allowing regex characters through:
"debug":{
"rawquerystring":"body:/h[aeiou]llo/",
"querystring":"body:/h[aeiou]llo/",
"parsedquery":"RegexpQuery(body:/h aeiou llo/)",
"parsedquery_toString":"body:/h aeiou llo/",
"explain":{},
"QParser":"LuceneQParser",
The PatternReplaceCharFilterFactory is also applied to the regex in your query, and it replaces every run of characters matching your pattern with a space. The "[" and "]" are therefore stripped out of the regex, so the query h[aeiou]llo becomes h aeiou llo and matches no documents.
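The effect is easy to reproduce outside Solr. The sketch below is only a rough model in Python, not Solr code: it assumes Python's re module treats this particular character class the same way Java's regex engine does, and the helper name apply_char_filter is made up for illustration.

```python
import re

# Same character class as the pattern attribute of the charFilter in the schema.
PATTERN = re.compile(r"[^a-zA-Z0-9_@-]+")

def apply_char_filter(text: str) -> str:
    """Rough model of the char filter: replace each run of disallowed characters with one space."""
    return PATTERN.sub(" ", text)

# A plain term contains no disallowed characters, so it passes through unchanged.
print(apply_char_filter("hello"))
# The brackets are not in the allowed set, so each one is replaced by a space.
print(apply_char_filter("h[aeiou]llo"))
```

The second call yields the same h aeiou llo string that shows up in parsedquery in the debug output.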
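One way to keep both your pattern replacement and regex queries is to use PatternReplaceFilterFactory instead. It is a token filter that runs after the tokenizer and, as far as I can tell, it is not applied to the regex when the query is analyzed. Therefore: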
<fieldType name="body_text" class="solr.TextField" positionIncrementGap="100" multiValued="false">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" rule="java" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^a-zA-Z0-9_@-]+" replacement=" "/>
    <filter class="solr.LengthFilterFactory" min="2" max="45"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
  </analyzer>
</fieldType>
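Check whether this works for your use case, since a token filter rewrites each token after tokenization instead of rewriting the raw text before it. You will also need to reindex after changing the field type.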
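To make that check concrete, here is a rough Python model of how the two orderings differ at index time. It is a simplification under stated assumptions, not Solr code: str.split() stands in for the whitespace tokenizer, the other filters in the chain are ignored, and the function names and the sample text are hypothetical.

```python
import re

# Same character class as the pattern attribute in the schema.
PATTERN = re.compile(r"[^a-zA-Z0-9_@-]+")

def char_filter_then_tokenize(text: str) -> list[str]:
    # The char filter rewrites the raw text first, so the inserted
    # spaces become token boundaries for the whitespace tokenizer.
    return PATTERN.sub(" ", text).split()

def tokenize_then_token_filter(text: str) -> list[str]:
    # The whitespace tokenizer runs first; the token filter then rewrites
    # each token in place and never splits it.
    return [PATTERN.sub(" ", token) for token in text.split()]

sample = "foo#bar baz"  # hypothetical document text
print(char_filter_then_tokenize(sample))   # ['foo', 'bar', 'baz']
print(tokenize_then_token_filter(sample))  # ['foo bar', 'baz']
```

In this model the original schema indexes foo and bar as separate terms, while the modified one keeps a single term with a space inside it, so the two are not equivalent for text that contains special characters.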