Search code examples
gogoogle-cloud-platformnosqlgoogle-cloud-datastoresql-like

How to do an SQL style search query in google cloud datastore?


Google Cloud datastore does not allow to make SQL style search queries like

SELECT * FROM Person WHERE Name LIKE "Rob*"

that would return Rob, Robert, Roberto, Roberta, Roby and so on.

GCP datastore only allows to filter on fields using operators: >, >=, <, <= with rules like:

  • any upper case letter is smaller than any lower case letter
  • a < b < c < ... < z

With these rules, the query

query := datastore.NewQuery("Person").Filter("Name >= ", "Rob").Order("Name")

would not only return all Person whose name starts with Rob, but also all Person whose name is greater than Rob (Ruben, Sarah, Zoe) and all Person whose names starts with a lower case letter.

The present post is a hack that I found in Go to emulate a SQL style search query.


Solution

  • The following solution addresses the issue programmatically. It is coded in go but i believe it could be easily adapted to any language.

    I'll first produce the whole snippet and then I shall break it down.

    func (store *PersonStore) Search(name string) ([]Person, error) {
        context := context.Background()
        persons := make([]*Person, 0)
        query := datastore.NewQuery("Person").Filter("Name >= ", strings.ToLower(name)).Order("Name")
    
        keys, err := store.client.GetAll(context, query, &persons)
        if err != nil {
            return nil, fmt.Errorf("unable to search for persons w/ names containing %s - %v", name, err)
        }
    
        filteredPersons := make([]*Perons, 0)
        for i, p := range persons {
            p.ID = keys[i].ID
            if !strings.Contains(p.Name, strings.ToLower(name)) {
                break
            } else {
                filteredPersons = append(filteredPersons, p)
            }
        }
    }
    

    1 - Lower case assumption

    In order for this code to work we first need to make a very strong assumption that all names are in lower case. If, for some reason, you cannot make this assumption, you can still partially use this code, but it will be less efficient.

    2 - Query the datastore

    The first part of the code is dedicated to fetching the datastore for Persons whose name is matching the desired pattern.

        context := context.Background()
        persons := make([]*Person, 0)
        query := datastore.NewQuery("Person").Filter("Name >= ", strings.ToLower(name)).Order("Name")
    
        keys, err := store.client.GetAll(context, query, &persons)
        if err != nil {
            return nil, fmt.Errorf("unable to search for persons w/ names containing %s - %v", name, err)
        }
    

    We make sure we use strings.ToLower(name) to make sure we won't fetch ALL Persons. Ideally name should be also trimmed, but this is usually done in the front-end, so we ommitted this here.

    Once again, this is base on the assumption that all Names are in lower case. If you cannot assume this, you can always use

    query := datastore.NewQuery("Person").Filter("Name >= ", name).Order("Name")
    

    You'll just simply get an initial list with (possibly a lot) more Persons.

    Finally, we order our fetched list with .Order("Name") as this is going to be the base point of the second part of the code.

    3 - Filter on Names

    Up to here it was a simple GetAll piece of code. We still need to insert the keys into the Person structures. We need to find a way to optimize that. For this we can rest on the fact that persons and keys list are in exact length and order. So the upcoming for loop starts exactly like a regular insert Key into structure bit.

        for i, p := range persons {
            p.ID = keys[i].ID
    
    

    The next bit is where the optimization is: since we know that the Persons are ordered by Name we are certain that, as soon as strings.Contains(p.Name, strings.ToLower(name)) is not true we have selected all Persons whose Name matches our criteria, that is, as soon as p.Name doesn't start with Rob anymore but with Roc or Rod or anything lexicographically greater than this.

            if !strings.Contains(p.Name, strings.ToLower(name)) {
                break
    

    We can then escape our loop using a break instruction, hopefuly having only parsed the first few elements of persons list that matche our criteria. These fall into the else statement:

            } else {
                filteredPersons = append(filteredPersons, p)
            }
    

    4 - Filter on Names without the lower case assumption

    As I said earlier, if you cannot assume all names are in lower case, you can still use this code, but it will not be optimized, because you will mandatorily have to scan the full persons list returned by the query.

    The code should look liske this

        filteredPersons := make([]*Perons, 0)
        for i, p := range persons {
            p.ID = keys[i].ID
            if strings.Contains(p.Name, strings.ToLower(name)) {
                filteredPersons = append(filteredPersons, p)
            }
        }