lucene tridion tridion-2011

Error when searching the Content Manager with wildcards


I have noticed that if I search for certain phrases, Tridion Content Manager gives me the following error:

Unable to get the list of search results.
Unable to process the Search Request. Invalid search query: (*out*) AND RepositoryId:tcm\:0\-4\-1 AND OrganizationalItemAncestorIds:tcm\:*\-135625\-2. maxClauseCount is set to 10240
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 10240
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:136)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:127)
at org.apache.lucene.search.ScoringRewrite$1.addClause
[...and so on]

In the example above I am searching for the phrase *out*. It also fails when I search for *a* and various other short wildcard queries. out* works fine, and *out* works fine if I limit the search to item titles only. It doesn't matter whether I search within "all publications" or a particular folder. It doesn't even matter if I limit the search results to the minimum (50).

Maybe this is something to do with the number of results returned?

The exact same search works fine on Tridion 5.3; I presume it isn't using Lucene?

Any ideas on how to fix this?


Solution

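  • Leading wildcards are not allowed by Lucene's query parser by default (version R5.3 of Tridion used a Verity implementation that allowed them), because of the way terms are indexed and searched. A leading wildcard effectively forces a scan of every term in the index for matches, rather than the more typical and performant approach of using the index to find matches directly (see Lucene FAQ). Each matching term is then added as a clause to a BooleanQuery, which is why a broad pattern such as *out* hits the maxClauseCount limit of 10240 shown in your stack trace.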

    You can enable leading wildcards by calling QueryParser.setAllowLeadingWildcard(true), but I strongly recommend against it in most cases.

    A better approach might be to filter on terms that require a leading wildcard, rather than passing them into the query (though this is not really feasible if the leading-wildcard term is the only term being searched on).

    Also, Lucene provides the ReverseStringFilter, a filter which indexes all terms in reverse as well. This would probably be the best way to create your index to enable leading wildcard searching.

    Offhand, I don't think either of these really handles a query like *out*, though. Representing your data as n-grams might be an option for that (see NGramTokenizer).
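    To make the last two ideas concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It only illustrates the principle: reversed terms turn a *suffix query into a prefix lookup, and an n-gram index narrows an *infix* query to a small candidate set. The class and method names (WildcardIndexSketch, suffixMatches, infixMatches, and so on) are hypothetical, and the sample terms are made up. In a real index you would presumably get the same effect by putting ReverseStringFilter or NGramTokenizer into the analysis chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration class; none of these names come from Lucene or Tridion.
final class WildcardIndexSketch {

    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    // "*suffix" on the original terms becomes a prefix lookup on the reversed terms.
    static SortedSet<String> suffixMatches(NavigableSet<String> reversedTerms, String suffix) {
        String prefix = reverse(suffix);
        SortedSet<String> matches = new TreeSet<>();
        for (String candidate : reversedTerms.tailSet(prefix, true)) {
            if (!candidate.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            matches.add(reverse(candidate));
        }
        return matches;
    }

    // All substrings of length n (n must be positive), e.g. ngrams("about", 3) -> abo, bou, out.
    static List<String> ngrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    // Maps each n-gram to the set of terms that contain it.
    static Map<String, Set<String>> buildNgramIndex(Collection<String> terms, int n) {
        Map<String, Set<String>> index = new HashMap<>();
        for (String term : terms) {
            for (String gram : ngrams(term, n)) {
                index.computeIfAbsent(gram, k -> new TreeSet<>()).add(term);
            }
        }
        return index;
    }

    // "*infix*": intersect the term sets of the infix's n-grams, then verify each candidate.
    static SortedSet<String> infixMatches(Map<String, Set<String>> index, String infix, int n) {
        if (infix.length() < n) {
            throw new IllegalArgumentException("infix must be at least " + n + " characters");
        }
        SortedSet<String> candidates = null;
        for (String gram : ngrams(infix, n)) {
            Set<String> posting = index.getOrDefault(gram, Collections.<String>emptySet());
            if (candidates == null) {
                candidates = new TreeSet<>(posting);
            } else {
                candidates.retainAll(posting);
            }
        }
        // Sharing every n-gram is necessary but not sufficient, so check the real substring.
        candidates.removeIf(term -> !term.contains(infix));
        return candidates;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("output", "about", "route", "layout", "tridion");
        NavigableSet<String> reversed = new TreeSet<>();
        for (String term : terms) {
            reversed.add(reverse(term));
        }
        System.out.println(suffixMatches(reversed, "out"));                    // terms ending in "out"
        System.out.println(infixMatches(buildNgramIndex(terms, 3), "out", 3)); // terms containing "out"
    }
}
```

    Note that with trigrams a query like *a* is shorter than the gram size, so this sketch rejects it; very short infix queries would need a smaller n (and a correspondingly larger index), which is the usual trade-off with n-gram indexing.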