Search code examples
azure-cosmosdbazure-cosmosdb-sqlapi

Does azure cosmosdb have like any operator to find records matching a pattern


I am just wondering how do I do the below using the SQL API on Azure CosmosDB -

SELECT user_id FROM users WHERE user_id LIKE ANY(contacts);

The above statement works on postgres, wondering if there is anything similar in Azure CosmosDB.

The above statement receives a set of contacts in an array format like this ["4160000000","7780000000"] and finds the corresponding records in Postgres db.

UPDATE @Sajeetharan

Below are the documents I have in Cosmos DB-

{
    "users": [
        {
            "partitionKey": "user",
            "userPhoneNumber": "14161231234",
            "userDisplayName": "Test User 1"
        },
        {
            "partitionKey": "user",
            "userPhoneNumber": "18055678978",
            "userDisplayName": "Test User 2"
        },
        {
            "partitionKey": "user",
            "userPhoneNumber": "17202228799",
            "userDisplayName": "Test User 3"
        },
        {
            "partitionKey": "user",
            "userPhoneNumber": "17780265987",
            "userDisplayName": "Test User 4"
        }
    ]
}

I will be sending in a set of userPhoneNumbers from javascript in an array format like below and then I need the SQL query to return the corresponding records in cosmos db.

var userPhoneNumbers = ["4161231234","7202228799"];

The above array has two values, which when sent to the cosmosdb should return the first and third record.

The userPhoneNumbers sent in will be sometimes missing the country code, so the search should be performed using CONTAINS or ENDSWITH.

UPDATE APRIL 4th 2023 - After trying out RitikaNalwayas answer

I tried out your user defined function with the Select statement that you provided but the result I get is

enter image description here -

USER DEFINED FUNCTION

function findUserNameByPhone(users, userPhoneNumbers) {
    var s, i, j;
    let result = [];
    for(j = 0; j < userPhoneNumbers.length; j++)
    {
        for (i = 0; i < users.length; i++) 
        {
            s = users[i];
            if(s.userPhoneNumber.match(userPhoneNumbers[j]))
                result.push(s)
        }
    }    
    return result;
}

SELECT STATEMENT

SELECT DISTINCT udf.findUserByPhoneNumber(c,'14161231234','17780000000']) FROM c

Now if I console log the users array that the function is receiving it is in the below format - [ { "$1": { "partitionKey": "user", "userPhoneNumber": "14161231234", "userDisplayName": "Test User 1",
} }, { "$1": { "partitionKey": "user", "userPhoneNumber": "18055678978", "userDisplayName": "Test User 2",
} } ]

I am not understanding what the $1 is and if that has anything to do with the result. I have tried the above code on code editor outside of azure portal where I am able to get the right results.

I have also added $1 to the match statement inside the If condition like below, still I don't get any results. I am not sure what I am doing wrong here.

if(s.$1.userPhoneNumber.match(userPhoneNumbers[j]))

UPDATE - April 24th 2023 after trying out the query from the RitikaNalwayas latest comment

This is what SELECT * FROM c returns

[
    {
        "partitionKey": "user",
        "userPhoneNumber": "14161231234",
        "userDisplayName": "User 1",
    },
    {
        "partitionKey": "user",
        "userPhoneNumber": "18055678978",
        "userDisplayName": "User 2",
    },
    {
        "partitionKey": "user",
        "userPhoneNumber": "17202228799",
        "userDisplayName": "User 3",
    }
]

Here is my latest user defined function -

function findUserNameByPhone(users, userPhoneNumbers) {
    var s, i, j;
    let result = [];
    for(j = 0; j < userPhoneNumbers.length; j++)
    {
        for (i = 0; i < users.length; i++) 
        {
            s = users[i];
            if(s.userPhoneNumber.match(userPhoneNumbers[j]))
                result.push(s)
        }
    }    
    return result
}

SELECT Query -

SELECT DISTINCT udf.findUserByPhoneNumber(c.users,'14161231234','17780000000']) FROM c

This is what I get returned -

[
     {}
]

UPDATE - 26th APRIL 2023

Upon doing some analysis I found that just printing out the length of the retrieved users array returns nothing, in that case how would looping through the users array work? There is something weird that is happening which I am not aware of.

function findUserNameByPhone(users, userPhoneNumbers) {
    var s, i, j;
    let result = [];
    var x = users.length
    
    return x;
}

Solution

  • As per cosmos technical support, using user defined function is not recommended as udf's dont utilize indexes so performance will be really poor. The tech support recommends using the ENDSWITH function with the OR operator to pull the required data.

    So in the scenario I could do a query construction with the all phone numbers received and send in that query to cosmos db.

    The below code is in python -

    def resolve_users(_, info):
        phones = ["14161231234", "7202228799"]
        query = "SELECT c.id, c.userPhoneNumber, c.userDisplayName FROM c WHERE c.partitionKey = 'user' AND ("
    
        for p in range(0, len(phones)):
            if p + 1 != len(phones):
                query += "ENDSWITH(c.userPhoneNumber,'"+phones[p]+"') OR "
            else:
                query += "ENDSWITH(c.userPhoneNumber,'"+phones[p]+"'))"
    
        items = list(container.query_items(
            query=query,
            enable_cross_partition_query=True
        ))
        return items
    

    The above code would produce a query like below -

    SELECT * FROM c WHERE c.partitionKey = 'user' AND (ENDSWITH(c.userPhoneNumber,'4161231234') OR ENDSWITH(c.userPhoneNumber,'7202228799'))
    

    I am super happy as I was able to finally get an answer to this.