Tags: azure-cosmosdb, gremlin, tinkerpop, tinkerpop3, azure-cosmosdb-gremlinapi

Gremlin query: combine vertices with unrelated vertices in CosmosDB


I would like to get several vertices (e.g. with the label "user") combined with the vertices they are not yet related to (e.g. with the label "movie").

I know that Gremlin's strength is traversing the graph, and that combining objects that are not related is not the best use case for a graph. I am using Azure CosmosDB for my application, so if there is a more performant way to do this, feel free to let me know. If this can be done with Gremlin, I need some help with the query. Here is an example:

There are 4 users (bob, jose, frank, peter) and 4 movies (movie1, movie2, movie3, movie4).

A user can be connected to a movie by a "watched" edge.

My example data looks as follows:

watched:
[bob, [movie1,movie2]]
[jose, [movie3]]
[frank, []]
[peter, [movie4]]

The result and format I would like to get is the following:

not watched:
[bob, movie3]
[bob, movie4]
[jose, movie1]
[jose, movie2]
[jose, movie4]
[frank, movie1]
[frank, movie2]
[frank, movie3]
[frank, movie4]
[peter, movie1]
[peter, movie2]
[peter, movie3]

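Put differently, the desired result is the cross product of all users and all movies minus the pairs connected by a "watched" edge, i.e. an anti-join. A minimal Python sketch of that set logic, with the example data hard-coded as plain collections (users, movies, watched and not_watched are illustrative names only, not part of any CosmosDB or Gremlin API):

```python
from itertools import product

# The question's example data as plain Python collections (illustrative names).
users = ["bob", "jose", "frank", "peter"]
movies = ["movie1", "movie2", "movie3", "movie4"]
watched = {
    "bob": {"movie1", "movie2"},
    "jose": {"movie3"},
    "frank": set(),
    "peter": {"movie4"},
}


def not_watched(users, movies, watched):
    """Return every (user, movie) pair that has no "watched" edge."""
    return [
        (user, movie)
        for user, movie in product(users, movies)  # cross product: each user with each movie
        if movie not in watched.get(user, set())   # anti-join: drop pairs that have an edge
    ]


for user, movie in not_watched(users, movies, watched):
    print(f"[{user}, {movie}]")
```

This prints the same twelve pairs as the expected result above, in the same order.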
The script to set up the graph (using /partition_key as the partition key):

g.addV("user").property("partition_key", 1).property("id", "bob")
g.addV("user").property("partition_key", 1).property("id", "jose")
g.addV("user").property("partition_key", 1).property("id", "frank")
g.addV("user").property("partition_key", 1).property("id", "peter")

g.addV("movie").property("partition_key", 1).property("id", "movie1")
g.addV("movie").property("partition_key", 1).property("id", "movie2")
g.addV("movie").property("partition_key", 1).property("id", "movie3")
g.addV("movie").property("partition_key", 1).property("id", "movie4")

g.V("bob").addE("watched").to(g.V("movie1"))
g.V("bob").addE("watched").to(g.V("movie2"))
g.V("jose").addE("watched").to(g.V("movie3"))
g.V("peter").addE("watched").to(g.V("movie4"))

Please note that I cannot use lambdas, because Azure CosmosDB doesn't support them.


Solution

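  • A join in Gremlin can be expressed by calling the V() step again in the middle of a traversal. Seen that way, the Gremlin query reads almost like an ordinary SQL query. The first four lines below create the "watched" edges using anonymous __.V() traversals inside to(); the query follows them.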

    g.V().has("id", "bob").addE("watched").to(__.V().has("id", "movie1"))
    g.V().has("id", "bob").addE("watched").to(__.V().has("id", "movie2"))
    g.V().has("id", "jose").addE("watched").to(__.V().has("id", "movie3"))
    g.V().has("id", "peter").addE("watched").to(__.V().has("id", "movie4"))
    
    g.V().hasLabel("user").as("u").
      V().hasLabel("movie").as("m").
      not(__.in("watched").where(eq("u"))).
      select("u", "m").by("id").
      order().by("u").by("m")
    
    ==>[u:bob,m:movie3]
    ==>[u:bob,m:movie4]
    ==>[u:frank,m:movie1]
    ==>[u:frank,m:movie2]
    ==>[u:frank,m:movie3]
    ==>[u:frank,m:movie4]
    ==>[u:jose,m:movie1]
    ==>[u:jose,m:movie2]
    ==>[u:jose,m:movie4]
    ==>[u:peter,m:movie1]
    ==>[u:peter,m:movie2]
    ==>[u:peter,m:movie3]
    

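    You are right that this query does not perform well in Gremlin: the mid-traversal V() pairs every user with every movie before filtering, so the work grows with the number of users times the number of movies. For this kind of query I would advise using the SQL API of CosmosDB instead.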
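Two ways of filtering a (user, movie) pair are easy to confuse: a filter of the form in("watched").where(neq("u")) keeps the pair once for every *other* user who watched the movie, whereas an anti-join keeps it exactly once when the user is not among the movie's watchers. The Python sketch below models both with plain lists and sets (the function names, movie5 and the extra jose/movie1 edge are hypothetical, not part of the question's data). It suggests that the two agree on the question's data only because every movie has exactly one watcher:

```python
from itertools import product


def some_other_watcher(edges, users, movies):
    """Keep (u, m) once for every watcher of m who is not u."""
    result = []
    for u, m in product(users, movies):
        for watcher, movie in edges:
            if movie == m and watcher != u:  # each other watcher emits one copy of the pair
                result.append((u, m))
    return result


def not_among_watchers(edges, users, movies):
    """Keep (u, m) exactly once when there is no (u, m) "watched" edge (anti-join)."""
    edge_set = set(edges)
    return [(u, m) for u, m in product(users, movies) if (u, m) not in edge_set]


users = ["bob", "jose", "frank", "peter"]
movies = ["movie1", "movie2", "movie3", "movie4"]
edges = [("bob", "movie1"), ("bob", "movie2"), ("jose", "movie3"), ("peter", "movie4")]

# Question's data: every movie has exactly one watcher, so both filters agree.
print(sorted(some_other_watcher(edges, users, movies)) == sorted(not_among_watchers(edges, users, movies)))  # True

# Hypothetical extension: jose also watched movie1, and nobody watched movie5.
movies_ext = movies + ["movie5"]
edges_ext = edges + [("jose", "movie1")]
wrong = some_other_watcher(edges_ext, users, movies_ext)
right = not_among_watchers(edges_ext, users, movies_ext)
print(("bob", "movie1") in wrong, ("bob", "movie1") in right)      # True False: watched, yet kept
print(("frank", "movie5") in wrong, ("frank", "movie5") in right)  # False True: unwatched movie dropped
print(wrong.count(("frank", "movie1")))                            # 2: one copy per other watcher
```

Only the second filter matches the meaning of "not watched" once a movie can have zero or several watchers.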