azure-cognitive-search n-gram

Understanding scoring results - exact scores lower than partial


I understand the Lucene Explain feature is not implemented for Azure Search and you can vote for it here if you want: https://feedback.azure.com/forums/263029-azure-search/suggestions/7379515-support-explain-api

Here is the index I created:

{
  "name": "fieldvalue38gram",
  "fields": [
    {
      "name": "FieldValueID",
      "type": "Edm.String",
      "facetable": false,
      "filterable": false,
      "key": true,
      "retrievable": true,
      "searchable": false,
      "sortable": false,
      "analyzer": null,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "synonymMaps": [],
      "fields": []
    },
    {
      "name": "FieldID",
      "type": "Edm.Int32",
      "facetable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": false,
      "analyzer": null,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "synonymMaps": [],
      "fields": []
    },
    {
      "name": "Text",
      "type": "Edm.String",
      "facetable": false,
      "filterable": true,
      "retrievable": true,
      "searchable": true,
      "sortable": true,
      "analyzer": "whitespace",
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "synonymMaps": [],
      "fields": []
    },
    {
      "name": "partialName",
      "type": "Edm.String",
      "facetable": false,
      "filterable": true,
      "retrievable": false,
      "searchable": true,
      "sortable": true,
      "analyzer": null,
      "indexAnalyzer": "ingram",
      "searchAnalyzer": "whitespace",
      "synonymMaps": [],
      "fields": []
    }
  ],
  "suggesters": [],
  "scoringProfiles": [
    {
      "name": "exactFirst",
      "text": {
        "weights": {
          "Text": 2,
          "partialName": 1
        }

      }
    }
  ],
  "defaultScoringProfile": "",
  "corsOptions": null,
  "analyzers": [
    {
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "name": "ingram",
      "tokenizer": "whitespace",
      "tokenFilters": [ "lowercase", "NGramTokenFilter" ],
      "charFilters": []
    }
  ],
  "charFilters": [],
  "tokenFilters": [
    {
      "@odata.type": "#Microsoft.Azure.Search.NGramTokenFilterV2",
      "name": "NGramTokenFilter",
      "minGram": 3,
      "maxGram": 8
    }
  ],
  "tokenizers": []
}
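To make the index definition above easier to reason about, here is a minimal Python sketch that approximates what the custom `ingram` analyzer emits for the `partialName` field: whitespace tokenization, lowercasing, then n-grams of length 3 to 8. The function name `ingram_tokens` is hypothetical, and the sketch does not model token order or positions as the service would, so treat it as an illustration rather than the actual analyzer.

```python
def ingram_tokens(text, min_gram=3, max_gram=8):
    """Approximate the 'ingram' analyzer: whitespace split, lowercase, n-grams.

    Assumption: this only mimics the configuration shown in the index
    definition (minGram=3, maxGram=8); real token order and positions
    may differ.
    """
    grams = []
    for token in text.split():          # whitespace tokenizer
        token = token.lower()           # lowercase token filter
        for size in range(min_gram, max_gram + 1):
            # Slide a window of the current size across the token.
            for start in range(len(token) - size + 1):
                grams.append(token[start:start + size])
    return grams


black_grams = ingram_tokens("BLACK")
blacksmith_grams = ingram_tokens("BLACKSMITH")

print(len(black_grams), black_grams)
print(len(blacksmith_grams), "black" in blacksmith_grams)
```

Under these assumptions, both values produce the gram `black`, so a query for `black` can match `partialName` in both documents, while the number of indexed terms per field differs considerably.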

When I query using search=black

indexes/fieldvalue38gram/docs?api-version={{version}}&scoringProfile=exactFirst&$top=21&search=black

I end up getting

{
    "@search.score": 4.051315,
    "FieldValueID": "167402",
    "FieldID": 8,
    "Text": "BLACKSMITH",
    "partialName": "BLACKSMITH"
},
{
    "@search.score": 3.9905946,
    "FieldValueID": "18594",
    "FieldID": 8,
    "Text": "BLACK",
    "partialName": "BLACK"
},

which is not what I would expect.

I would expect the exact match to get a boost. In addition, from reading the documentation, I see that field length plays a part in scoring: shorter text gets a higher score, with the length factor determined during indexing.

With this in mind, I don't understand why the second result would be scored lower than the first.

  • Can anyone explain the scoring in this scenario?
  • Is there anything I can do to help understand the scoring?

Thanks

UPDATE

2019-10-24
Here is an example of the scoring behavior I've been battling. The 1st and 3rd entries are identical apart from the doc id (FieldValueID). I can find no rhyme or reason for the difference in their scores.

{
    "value": [
        {
            "@search.score": 0.10707458,
            "FieldValueID": "2",
            "FieldID": 2,
            "Text": "Another Brown2Black Cow"
        },
        {
            "@search.score": 0.021882897,
            "FieldValueID": "4",
            "FieldID": 2,
            "Text": "Brown"
        },
        {
            "@search.score": 0.017285194,
            "FieldValueID": "7",
            "FieldID": 2,
            "Text": "Another Brown2Black Cow"
        }
    ]
}

2019-10-25
Just found this: https://learn.microsoft.com/en-us/azure/search/search-lucene-query-architecture#scoring-in-a-distributed-index

and this Note https://learn.microsoft.com/en-us/azure/search/search-capacity-planning#partition-and-replica-combinations
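
As I read the first link, scoring statistics are computed per shard, so two identical documents that live on different shards can receive different scores. Here is a minimal sketch of that effect. The idf formula is the classic Lucene TF-IDF style one, which I am assuming purely for illustration, and the shard sizes and document frequencies are made-up numbers, not values from my service.

```python
import math


def classic_idf(doc_count, doc_freq):
    """Classic Lucene-style inverse document frequency.

    Assumption: used only to illustrate per-shard statistics; the
    service's actual similarity function may differ.
    """
    return 1.0 + math.log(doc_count / (doc_freq + 1))


# Hypothetical shards: same size, but the query term is rare on one
# shard and common on the other.
shard_a = {"doc_count": 1000, "doc_freq": 5}
shard_b = {"doc_count": 1000, "doc_freq": 400}

idf_a = classic_idf(**shard_a)
idf_b = classic_idf(**shard_b)

# An identical document would get a larger idf contribution on shard A.
print(f"idf on shard A: {idf_a:.3f}")
print(f"idf on shard B: {idf_b:.3f}")
```

If this is what is happening, the same text would score higher on the shard where the term is rarer, which would be consistent with entries 1 and 3 above getting different scores.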


Solution

  • My guess would be that it's because you are forcing the Text field to use the whitespace analyzer rather than the default analyzer. I don't believe the whitespace analyzer lowercases your terms. Since your search query and your Text field use different casing, I'm not sure they'd match. You can test this by running the search with a different casing and seeing what is returned. The same applies to the search analyzer: I'd recommend against simply using the whitespace analyzer there too.
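
To illustrate the casing point, here is a small Python sketch that mimics a whitespace-only analyzer next to one that also lowercases. Both functions are hypothetical stand-ins, not the service's analyzers, and are only meant to show why `black` and `BLACK` might not match as terms.

```python
def whitespace_analyze(text):
    """Stand-in for a whitespace analyzer: split on whitespace, keep casing."""
    return text.split()


def lowercasing_analyze(text):
    """Stand-in for an analyzer that splits on whitespace and lowercases."""
    return [token.lower() for token in text.split()]


def terms_match(indexed_terms, query_terms):
    """True if at least one query term equals an indexed term exactly."""
    return bool(set(indexed_terms) & set(query_terms))


# Text field indexed and queried with the whitespace stand-in: casing differs.
exact_hit = terms_match(whitespace_analyze("BLACK"), whitespace_analyze("black"))

# Same input when both sides are lowercased.
lowercased_hit = terms_match(lowercasing_analyze("BLACK"), lowercasing_analyze("black"))

print("whitespace only:", exact_hit)
print("with lowercasing:", lowercased_hit)
```

If the Text field behaves like the first case, the score for the `black` query would come only from the `partialName` n-grams, and the weight of 2 on Text in the scoring profile would have no effect on either document.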