I am following this tutorial - https://learn.microsoft.com/en-us/azure/search/search-get-started-text?tabs=dotnet
This is the schema of the index:
name = index_name
fields = [
    SimpleField(name="HotelId", type=SearchFieldDataType.String, key=True),
    SearchableField(name="HotelName", type=SearchFieldDataType.String, sortable=True),
    SearchableField(name="Description", type=SearchFieldDataType.String, analyzer_name="en.lucene"),
    SearchableField(name="Description_fr", type=SearchFieldDataType.String, analyzer_name="fr.lucene"),
    SearchableField(name="Category", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
    SearchableField(name="Tags", collection=True, type=SearchFieldDataType.String, facetable=True, filterable=True),
    SimpleField(name="ParkingIncluded", type=SearchFieldDataType.Boolean, facetable=True, filterable=True, sortable=True),
    SimpleField(name="LastRenovationDate", type=SearchFieldDataType.DateTimeOffset, facetable=True, filterable=True, sortable=True),
    SimpleField(name="Rating", type=SearchFieldDataType.Double, facetable=True, filterable=True, sortable=True),
    ComplexField(name="Address", fields=[
        SearchableField(name="StreetAddress", type=SearchFieldDataType.String),
        SearchableField(name="City", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
        SearchableField(name="StateProvince", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
        SearchableField(name="PostalCode", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
        SearchableField(name="Country", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
    ])
]
cors_options = CorsOptions(allowed_origins=["*"], max_age_in_seconds=60)
scoring_profiles = []
suggester = [{'name': 'sg', 'source_fields': ['Tags', 'Address/City', 'Address/Country']}]
The documents uploaded to the index are:
documents = [
{
"@search.action": "upload",
"HotelId": "1",
"HotelName": "Secret Point Motel",
"Description": "The hotel is ideally located on the main commercial artery of the city in the heart of New York. A few minutes away is Time's Square and the historic centre of the city, as well as other places of interest that make New York one of America's most attractive and cosmopolitan cities.",
"Description_fr": "L'hôtel est idéalement situé sur la principale artère commerciale de la ville en plein cœur de New York. A quelques minutes se trouve la place du temps et le centre historique de la ville, ainsi que d'autres lieux d'intérêt qui font de New York l'une des villes les plus attractives et cosmopolites de l'Amérique.",
"Category": "Boutique",
"Tags": [ "pool", "air conditioning", "concierge" ],
"ParkingIncluded": "false",
"LastRenovationDate": "1970-01-18T00:00:00Z",
"Rating": 3.60,
"Address": {
"StreetAddress": "677 5th Ave",
"City": "New York",
"StateProvince": "NY",
"PostalCode": "10022",
"Country": "USA"
}
},
{
"@search.action": "upload",
"HotelId": "2",
"HotelName": "Twin Dome Motel",
"Description": "The hotel is situated in a nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts.",
"Description_fr": "L'hôtel est situé dans une place du XIXe siècle, qui a été agrandie et rénovée aux plus hautes normes architecturales pour créer un hôtel moderne, fonctionnel et de première classe dans lequel l'art et les éléments historiques uniques coexistent avec le confort le plus moderne.",
"Category": "Boutique",
"Tags": [ "pool", "free wifi", "concierge" ],
"ParkingIncluded": "false",
"LastRenovationDate": "1979-02-18T00:00:00Z",
"Rating": 3.60,
"Address": {
"StreetAddress": "140 University Town Center Dr",
"City": "Sarasota",
"StateProvince": "FL",
"PostalCode": "34243",
"Country": "USA"
}
},
{
"@search.action": "upload",
"HotelId": "3",
"HotelName": "Triple Landscape Hotel",
"Description": "The Hotel stands out for its gastronomic excellence under the management of William Dough, who advises on and oversees all of the Hotel's restaurant services.",
"Description_fr": "L'hôtel est situé dans une place du XIXe siècle, qui a été agrandie et rénovée aux plus hautes normes architecturales pour créer un hôtel moderne, fonctionnel et de première classe dans lequel l'art et les éléments historiques uniques coexistent avec le confort le plus moderne.",
"Category": "Resort and Spa",
"Tags": [ "air conditioning", "bar", "continental breakfast" ],
"ParkingIncluded": "true",
"LastRenovationDate": "2015-09-20T00:00:00Z",
"Rating": 4.80,
"Address": {
"StreetAddress": "3393 Peachtree Rd",
"City": "Atlanta",
"StateProvince": "GA",
"PostalCode": "30326",
"Country": "USA"
}
},
{
"@search.action": "upload",
"HotelId": "4",
"HotelName": "Sublime Cliff Hotel",
"Description": "Sublime Cliff Hotel is located in the heart of the historic center of Sublime in an extremely vibrant and lively area within short walking distance to the sites and landmarks of the city and is surrounded by the extraordinary beauty of churches, buildings, shops and monuments. Sublime Cliff is part of a lovingly restored 1800 palace.",
"Description_fr": "Le sublime Cliff Hotel est situé au coeur du centre historique de sublime dans un quartier extrêmement animé et vivant, à courte distance de marche des sites et monuments de la ville et est entouré par l'extraordinaire beauté des églises, des bâtiments, des commerces et Monuments. Sublime Cliff fait partie d'un Palace 1800 restauré avec amour.",
"Category": "Boutique",
"Tags": [ "concierge", "view", "24-hour front desk service" ],
"ParkingIncluded": "true",
"LastRenovationDate": "1960-02-06T00:00:00Z",
"Rating": 4.60,
"Address": {
"StreetAddress": "7400 San Pedro Ave",
"City": "San Antonio",
"StateProvince": "TX",
"PostalCode": "78216",
"Country": "USA"
}
}
]
When I execute this search:
results = search_client.search(search_text="motel", select='HotelId,HotelName,Rating', order_by='Rating desc', include_total_count=True)
print('Total Documents Matching Query:', results.get_count())
for result in results:
    print(result["@search.score"])
    print("{}: {} - {} rating".format(result["HotelId"], result["HotelName"], result["Rating"]))
I get this reply:
Total Documents Matching Query: 2
0.6099695
2: Twin Dome Motel - 3.6 rating
0.25316024
1: Secret Point Motel - 3.6 rating
The scores are quite different, but it isn't clear why. Both hotels contain the word "motel" the same number of times (once).
In another example, on executing
results = search_client.search(query_type='simple',
                               search_text="what hotel has a good restaurant on site",
                               select='HotelName,HotelId,Description')
for result in results:
    print(result["@search.score"])
    print(result["HotelName"])
I get
2.1393623
Sublime Cliff Hotel
1.9309065
Twin Dome Motel
1.5589908
Triple Landscape Hotel
0.7704947
Secret Point Motel
The word "good" doesn't appear anywhere. The word "hotel" appears in all of the documents. The word "restaurant" appears only in "Triple Landscape Hotel", yet that hotel has only the third-highest score.
Why are the scores different?
Azure AI Search gives different scores, but I don't follow the logic behind them.
According to the documentation, Azure Cognitive Search generates scores for search results by considering multiple factors, such as term frequency, inverse document frequency, and field length normalization. The BM25 algorithm combines these factors into a single score that reflects how relevant each document is to the user's search terms.
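To make those three factors concrete, here is a minimal sketch of a per-term BM25 score in plain Python. It is not the service's actual implementation: the function name `bm25_term_score`, the sample numbers, and the classic BM25 form with `k1=1.2` and `b=0.75` are all assumptions chosen for illustration.

```python
import math


def bm25_term_score(tf, field_len, avg_field_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one field (classic BM25 form, illustrative only)."""
    # Inverse document frequency: terms found in fewer documents weigh more.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Field length normalization: longer-than-average fields are penalized.
    length_norm = 1 - b + b * field_len / avg_field_len
    # Term frequency, with saturation controlled by k1.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# The term occurs exactly once in every case; only the other inputs change.
short_field = bm25_term_score(tf=1, field_len=3, avg_field_len=10, n_docs=4, doc_freq=2)
long_field = bm25_term_score(tf=1, field_len=40, avg_field_len=10, n_docs=4, doc_freq=2)
rare_term = bm25_term_score(tf=1, field_len=10, avg_field_len=10, n_docs=4, doc_freq=1)
common_term = bm25_term_score(tf=1, field_len=10, avg_field_len=10, n_docs=4, doc_freq=4)

print(short_field > long_field)  # True
print(rare_term > common_term)   # True
```

The point of the sketch is that a single occurrence of a term does not pin down the score: the length of the matched field and the number of documents containing the term both change the result.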
First example:
results = search_client.search(search_text="motel", select='HotelId,HotelName,Rating', order_by='Rating desc', include_total_count=True)
print ('Total Documents Matching Query:', results.get_count())
for result in results:
    print(result["@search.score"])
    print("{}: {} - {} rating".format(result["HotelId"], result["HotelName"], result["Rating"]))
Total Documents Matching Query: 2
0.6099695
2: Twin Dome Motel - 3.6 rating
0.25316024
1: Secret Point Motel - 3.6 rating
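In this example, both hotels match "motel" once, so the raw term count alone does not decide the score. The BM25 ranking algorithm used by Azure Cognitive Search also weighs how rare the term is in the index and how long the matched field is. Field weights and boosting would only come into play through a scoring profile, and this index defines none (scoring_profiles = []).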
Second example:
results = search_client.search(query_type='simple',
                               search_text="what hotel has a good restaurant on site",
                               select='HotelName,HotelId,Description')
for result in results:
    print(result["@search.score"])
    print(result["HotelName"])
2.1393623
Sublime Cliff Hotel
1.9309065
Twin Dome Motel
1.5589908
Triple Landscape Hotel
0.7704947
Secret Point Motel
The word "good" doesn't appear anywhere. The word "hotel" appears in all of the documents. The word "restaurant" appears only in "Triple Landscape Hotel", yet that hotel has only the third-highest score. Why are the scores different?
In the second example, for the query "what hotel has a good restaurant on site", "Sublime Cliff Hotel" is ranked higher because its description contains the word "sites", a form of "site", which is a less frequently occurring term. In Azure Cognitive Search, terms that occur in fewer documents across the index are given a higher weight in the score calculation (inverse document frequency). The presence of "site" in the description of "Sublime Cliff Hotel" therefore contributes to a higher search score for that hotel, and it ranks higher in the results.
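The effect of term rarity can be sketched with a toy corpus. The token sets below are hypothetical, heavily shortened stand-ins for analyzed descriptions, not the real index contents, and the `idf` helper uses a Lucene-style BM25 formula as an assumption about the general shape of the calculation.

```python
import math

# Hypothetical token sets standing in for four analyzed descriptions.
descriptions = {
    "1": {"hotel", "city", "new", "york"},
    "2": {"hotel", "plaza", "modern"},
    "3": {"hotel", "restaurant", "services"},
    "4": {"hotel", "site", "landmark", "city"},
}


def idf(term, docs):
    """Inverse document frequency: higher when fewer documents contain the term."""
    n_docs = len(docs)
    doc_freq = sum(1 for tokens in docs.values() if term in tokens)
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


# "hotel" is in every document, "city" in two, "site" in only one.
for term in ("hotel", "city", "site"):
    print(term, round(idf(term, descriptions), 3))
```

In this toy setup a match on "site" adds far more to a document's score than a match on "hotel", because "hotel" appears in every document and so barely distinguishes one from another.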
Reference:
Practical BM25 - Part 2: The BM25 Algorithm and its Variables | Elastic Blog