A Solr query has some special characters that need to be escaped: +-&|!(){}[]^"~*?:/
SolrJ provides a utility method, ClientUtils::escapeQueryChars, which escapes more characters, including ; and whitespace.
This caused a bug in my application when the search term contains a space, like foo bar, which ClientUtils::escapeQueryChars turned into foo\ bar. My solution is to split the search term, escape each term, and join them with AND or OR.
But it's still a pain to write extra code just to handle spaces.
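The split, escape and join workaround described above can be sketched roughly as follows. To keep the example runnable without SolrJ on the classpath, escapeTerm is a hypothetical local stand-in that only approximates ClientUtils::escapeQueryChars; in a real application you would call the SolrJ method instead. The class and method names are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SplitEscapeJoin {

    // Hypothetical stand-in for ClientUtils.escapeQueryChars (assumption:
    // it prefixes each query-syntax character and whitespace with a backslash).
    static String escapeTerm(String term) {
        return term.replaceAll("([\\\\+\\-!():^\\[\\]\"{}~*?|&;/\\s])", "\\\\$1");
    }

    // Split the user input on whitespace, escape every term on its own,
    // then join the terms with the given boolean operator ("AND" or "OR").
    static String buildQuery(String userInput, String operator) {
        return Arrays.stream(userInput.trim().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .map(SplitEscapeJoin::escapeTerm)
                .collect(Collectors.joining(" " + operator + " "));
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("foo bar", "AND"));
        System.out.println(buildQuery("a+b  c:d", "OR"));
    }
}
```

Because the input is split before escaping, the spaces never reach the escaping step, so they keep their role as term separators.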
Is there any special reason that space and ; are also escaped by this utility method?
In Solr (and Lucene), characters can have different meanings in the query syntax depending on which query parser you're using (for example standard, dismax, edismax).
So when and what to escape depends on which query parser you're using and which query you're trying to run. I know this seems too broad for an answer, so I'll add an example to make things clearer.
For example, let's use edismax as the query parser and take a document with a field named tv_display of type string.
If you write:
http://localhost:8983/solr/buybox/select?q=tv_display:Full HD
edismax will convert the query into +tv_display:Full +tv_display:HD.
This way you'll never find the documents where tv_display is Full HD, only the documents where tv_display is Full and/or HD (and/or depends on your mm configuration).
ClientUtils::escapeQueryChars will convert Full HD into Full\ HD:
http://localhost:8983/solr/buybox/select?q=tv_display:Full\ HD
edismax then takes the entire string as a single token, and only this way will all the documents where tv_display is Full HD be returned.
In conclusion, ClientUtils::escapeQueryChars escapes all characters (spaces and semicolons included) that could be misinterpreted by a query parser.
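To make the escaping concrete, here is a minimal sketch of what such a method does. It is a hand-written approximation, not the SolrJ source: the character list is an assumption based on the characters mentioned in the question plus ; and whitespace, and the class name EscapeSketch is made up for illustration.

```java
class EscapeSketch {

    // Approximation of ClientUtils.escapeQueryChars: put a backslash in front
    // of every character that a query parser could treat as syntax.
    static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ("\\+-!():^[]\"{}~*?|&;/".indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escape only the value, not the "field:" prefix, so the colon
        // keeps its meaning as the field separator.
        System.out.println("tv_display:" + escapeQueryChars("Full HD")); // tv_display:Full\ HD
        System.out.println(escapeQueryChars("a;b"));                     // a\;b
    }
}
```

Note that only the field value is passed through the method; escaping the whole string tv_display:Full HD would also escape the colon and change the meaning of the query.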