Search code examples
azureazure-cosmosdb

Querying and ordering data against Azure Cosmos DB results in "One of the specified inputs is invalid" if a name contains a forward slash


We are storing data in an Azure Cosmos container similar to below. The data is grouped by "ProcessDataId" and there could be >100k documents for each "ProcessDataId". The container uses "id" as a partitionKey.

The index is specified as

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ]
}

Document example

{
    "id": "49f36245-f594-4836-96a1-25e011a32aaa",
    "ProcessDataId": "37f33e4c-7ab6-4a4e-9102-e085a9663612",
        "chartId": "a5593e69-32fb-4ca0-928d-741c99b5c9f4",
        "chartPresentation": "Diagnose",
        "info": {
            "title": "My chart 1",
            "mainTrend": "FO"
        },
        "data": {
            "segment": "Combined - Combined - Combined",
            "value": "Stuff",
            "segmentationId": "",
            "confidenceValue": 99,
            "foConfidenceValue": 99,
            "scConfidenceValue": 40.6077922077922,
            "soConfidenceValue": 77,
            "gross_outstanding": 24974.50650867,
            "premium": 1186.00109866,
            "class/": "Combined",
            "line/qa": "Combined",
            "riskcodes/qa": "Combined"
        }
    }

Querying the database using the following. Please note the query is dynamically created.

SELECT c.data 
FROM c
WHERE c.ProcessDataId = "37f33e4c-7ab6-4a4e-9102-e085a9663612"  
AND (
    c["data"]["class/"] = "Combined" 
    AND 
    c["data"]["line/qa"] = "Combined" 
    AND 
    c["data"]["riskcodes/qa"] = "Combined"
    )
ORDER BY c["data"]["confidenceValue"] desc
OFFSET 0 LIMIT 6

This results in an error "{"Errors":["One of the specified inputs is invalid"]}".

If I remove the order by statement the query will run, but this will result in data being brought back in the incorrect order because we are limiting to returning 6 documents at a time. The query is modified such that the OFFSET value is changed as users move through the data, e.g. on page 5 it will be OFFSET 24, so ordering is important to the data.

The document above is regarded as valid JSON so would expect this query to work.

Any suggestions that don't involve modifying the data would be greatly appreciated.

The forward slash appears to be the problem with the order by clause. With a modified dataset replacing forward slash with an underscore the query will run happily (obviously replacing / with _).


Solution

  • Microsoft have resolved this, a fix has been applied to our Cosmos account and I can confirm this is working. The fix will be eventually be rolled out to all Cosmos accounts.