Search code examples
luceneodataazure-cognitive-searchazure-search-.net-sdk

azure search - filter key value pair concatenated by double pipes


I have a EDM.string field that I use to store Key Value pairs separated by '||' and commas. For example:

{
    "CustomField": "1234|||student, 5678||blue, 999||metallica, etc..."
}

I need to perform a query to extract a key-value combination. For example:

search=5678 blue&searchFields=CustomField&searchMode=all&queryType=full

Using Regular Expression, I was expecting that the following should work:

search=/5678.*blue/&queryType=full&searchMode=all

I am using the default analyzer, so it seems that it's ignoring the '||'. I've tried using Regular Expressions but with no success. Is it possible to query by the key-value pair, without storing it on a EDM.Collection(string) ? I would like to avoid a new reindex process. Thanks in advance.

Update

Using Collections and a new dataset:

{
    "@odata.context": "https://[service].search.windows.net/indexes('[index]')/$metadata#docs",
    "@odata.count": 3,
    "value": [
        {
            "@search.score": 0.45867884,
            "uniqueid": "5",
            "Name": null,
            "Kvp": [
                "1234||sepultura",
                "999||programmer",
                "876||no education"
            ],
            "Kvp2": "1234||sepultura, 999 programmer, 876||no education"
        },
        {
            "@search.score": 0.38223237,
            "uniqueid": "1",
            "Name": null,
            "Kvp": [
                "1234||metallica",
                "999||horse education",
                "876||high school"
            ],
            "Kvp2": "1234||metallica, 999 horse education, 876||high school"
        },
        {
            "@search.score": 0.38223237,
            "uniqueid": "3",
            "Name": null,
            "Kvp": [
                "1234||john mayer",
                "999||kid education",
                "876||university"
            ],
            "Kvp2": "1234||john mayer, 999 kid education, 876||university"
        }
    ]
}

My search query looks like:

Kvp: education&$count=true&queryType=full&searchMode=all

The problem is that I would like to avoid uniqueid 5 to be retrieved. Although it has "education" as a value for one of the tags, it's not the 999 key.

Also tried:

Kvp: 999||education&$count=true&queryType=full&searchMode=all

Kvp: /.*999.*/ AND /.*education.*/&$count=true&queryType=full&searchMode=all

Kvp: /999.*education/&$count=true&queryType=full&searchMode=all

Solution

  • Use a phrase search by surrounding your query with quotes: Kvp:"999||education"

    The analyzer does remove the | character, so this is effectively equivalent to Kvp:"999 education". The thing to understand is how analysis works. What you are indexing here, when you index: "1234|||student, 5678||blue, 999||metallica", is six terms:

    • 1234
    • student
    • 5678
    • blue
    • 999
    • metallica

    The and query doesn't work because it looks for the matches anywhere in that list, in the field, thus matching id 5. Order or adjacency are not considered as they would be for a phrase query.

    The regex query doesn't work because it must match everything within a single term. Kvp:999.*education won't work because "999" and "education" are analyzed into separate terms, so there are no single terms that match that regex.


    Another option, by the way, would be to change the analyzer. If you used a whitespace analyzer, for instance, it would change the indexed terms to:

    • 1234||student,
    • 5678||blue,
    • 999||metallica,

    Which could be a solution for you, but would make it impossible to search efficiently for just "metallica".