
Trouble with negative facet query using solr for missing value


I'm working on a product filter for our website and have run into some difficulties with using "facet.missing = true".

I know that I'm supposed to use a filter query like "fq=-facetField:[* TO *]" to restrict the results to products where that field is missing.

I have built a global filter helper for my application that builds the fq parameter dynamically for ALL queries, so that none of them can skip the user-permission-based filters. It essentially looks like this (PHP):

$params['fq'] = sprintf('((%s) AND (%s))', $custom, $system);

Where $system is the global permission based filter, which might look like (not actual but similar):

(isdiscontinued:0 AND ishidden:0 AND contract:3)

$custom contains the actual filter query the user is building via the UI. Let's say the notebook bluetooth filter has the name fq_bluetooth with values: No, Yes, or the value is missing. This would make the final fq look like:

((-fq_bluetooth:[* TO *]) AND ((isdiscontinued:0 AND ishidden:0 AND contract:3)))

However this returns 0 products for the query I'm sending for this category.

If I modify the filter query to:

((fq_bluetooth:[* TO *]) AND ((isdiscontinued:0 AND ishidden:0 AND contract:3)))

Then I get the expected result: the combined counts of Yes + No, disregarding the unspecified.

How should I be formatting the filter query to get this to work properly?

[edit]

I might also want to combine the facets and maybe filter on only products with No bluetooth or ones where bluetooth isn't specified. So maybe like this (which of course doesn't work either):

((-fq_bluetooth:[* TO *] OR fq_bluetooth:"No") AND ((isdiscontinued:0 AND ishidden:0 AND contract:3)))

With debugQuery on, I'm seeing a filter query like:

fq_bluetooth:("No" OR -[* TO *])

being parsed as:

fq_bluetooth:No -fq_bluetooth:[* TO *]

I'm not seeing the OR in the parsed query - and from my research the fq parameter queries don't honor the OR operator(??).

Maybe the OR is working, but as the negative query seems to be failing by itself, perhaps that's why I can't see the OR working when combined like this.


Solution

  • Drop the unnecessary parentheses and split your filter query into two filter queries, one for system restrictions and one for user-generated filtering.

    1) Since you want to satisfy two logical requirements in your request (security restrictions and user generated filtering) why not rewrite your query with two Filter Queries?

    One for system permission, the other one for the queries generated from your app UI (escaping omitted):

    ...fq=isdiscontinued:0 AND ishidden:0 AND contract:3&fq=-fq_bluetooth:[* TO *]

    This will even help with filter query caching, since each fq is cached separately.
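As a minimal sketch of the two-filter-query request (in Python rather than the asker's PHP; the helper name build_solr_params and its arguments are hypothetical, not part of any Solr client), the system and user filters can be sent as repeated fq parameters, with URL escaping handled by the standard library:

```python
from urllib.parse import urlencode, parse_qs


def build_solr_params(custom_fq, system_fq, q="*:*"):
    """Build a query string with one fq per concern instead of one combined fq.

    Hypothetical helper: system_fq is the permission filter that must always
    be present, custom_fq is the filter built from the UI (may be empty).
    """
    params = [("q", q), ("fq", system_fq)]
    if custom_fq:
        # A second fq parameter; Solr intersects all fq results.
        params.append(("fq", custom_fq))
    # A list of pairs keeps the repeated "fq" keys and their order.
    return urlencode(params)


query_string = build_solr_params(
    custom_fq="-fq_bluetooth:[* TO *]",
    system_fq="isdiscontinued:0 AND ishidden:0 AND contract:3",
)
print(query_string)
# Round-trip check: both filters survive as separate fq values.
print(parse_qs(query_string)["fq"])
```

Because the negative range query is now the whole of its own fq, it sits at the top level of that filter rather than inside a parenthesised sub-clause.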

    2) As for your specific query issue, experimenting suggests that the parentheses are what change the results. Testing against a local instance with your syntax, ((-ProcedeImageElectronique:[* TO *]) AND (Pays:France AND GrandeCategorie:FILM)) returns 0 results:

    {
      "responseHeader": {
        "status": 0,
        "QTime": 1,
        "params": {
          "facet": "off",
          "indent": "true",
          "q": "*:*",
          "wt": "json",
          "fq": "((-ProcedeImageElectronique:[* TO *]) AND (Pays:France AND GrandeCategorie:FILM))"
        }
      },
      "response": {
        "numFound": 0,
        "start": 0,
        "maxScore": 0,
        "docs": []
      }
    }
    

    Versus -ProcedeImageElectronique:[* TO *] AND Pays:France AND GrandeCategorie:FILM which results in the expected behaviour:

    {
      "responseHeader": {
        "status": 0,
        "QTime": 1,
        "params": {
          "facet": "off",
          "indent": "true",
          "q": "*:*",
          "wt": "json",
          "fq": "-ProcedeImageElectronique:[* TO *] AND Pays:France AND GrandeCategorie:FILM"
        }
      },
      "response": {
        "numFound": 1733,
        "start": 0,
        "maxScore": 1,
        "docs": [...]
      }
    }
    

    Likewise, using two filter queries returns the expected result:

    {
      "responseHeader": {
        "status": 0,
        "QTime": 0,
        "params": {
          "facet": "off",
          "indent": "true",
          "q": "*:*",
          "wt": "json",
          "fq": [
            "-ProcedeImageElectronique:[* TO *]",
            "GrandeCategorie:FILM AND Pays:France"
          ]
        }
      },
      "response": {
        "numFound": 1733,
        "start": 0,
        "maxScore": 1,
        "docs": [...]
      }
    }
    

    EDIT: this post clearly explains why using parentheses may lead to unexpected results. To quote:

    If the top level BooleanQuery contains somewhere inside of it a nested BooleanQuery which contains only negated clauses, that nested query will not be modified, and it (by definition) can't match any documents -- if it is required, that means the outer query will not match.
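
Following the quoted explanation, a commonly suggested workaround when the parentheses have to stay (for example for the "No or unspecified" case from the question) is to give the nested negative clause something to subtract from, by prefixing it with the match-all query *:*. The sketch below only builds the fq strings; the function names are hypothetical and the value escaping is deliberately minimal, so treat it as an illustration under those assumptions rather than a complete query builder:

```python
def missing_value_clause(field):
    """Clause for documents with no value in `field`, safe to nest.

    The leading *:* (match all documents) means the nested clause is no
    longer purely negative, which is the case the quote above describes.
    """
    return f"(*:* -{field}:[* TO *])"


def value_or_missing_clause(field, value):
    """Clause for documents where `field` equals `value` or is missing."""
    # Minimal escaping for a quoted phrase (assumption: only \ and " matter here).
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'({field}:"{escaped}" OR {missing_value_clause(field)})'


# Same shape as the asker's sprintf('((%s) AND (%s))', $custom, $system).
system = "(isdiscontinued:0 AND ishidden:0 AND contract:3)"
custom = value_or_missing_clause("fq_bluetooth", "No")
fq = f"(({custom}) AND ({system}))"
print(fq)
```

The same missing_value_clause output can be dropped into the original combined filter in place of (-fq_bluetooth:[* TO *]) if splitting into two fq parameters is not an option.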