Gremlin Neptune query similar users based on common ratings count


Given the following graph from the TinkerPop recipes:

  g.addV("user").property("name", "alice").as("u1").
  addV("user").property("name", "jen").as("u2").
  addV("user").property("name", "dave").as("u3").
  addV("movie").property("name", "the wild bunch").as("m1").
  addV("movie").property("name", "young guns").as("m2").
  addV("movie").property("name", "unforgiven").as("m3").
  addE("friend").from("u1").to("u2").
  addE("friend").from("u1").to("u3").
  addE("like").from("u2").to("m1").
  addE("like").from("u2").to("m2").
  addE("like").from("u3").to("m2").
  addE("like").from("u3").to("m3")

How can I query the friends of a specific user, in descending order of how many movies they have liked in common? Thanks


Solution

  • I added some more edges to the graph (two friend edges from jen and two like edges from alice) to make the results more interesting.

      g.addV("user").property("name", "alice").as("u1").
      addV("user").property("name", "jen").as("u2").
      addV("user").property("name", "dave").as("u3").
      addV("movie").property("name", "the wild bunch").as("m1").
      addV("movie").property("name", "young guns").as("m2").
      addV("movie").property("name", "unforgiven").as("m3").
      addE("friend").from("u1").to("u2").
      addE("friend").from("u1").to("u3").
      addE("friend").from("u2").to("u3").
      addE("friend").from("u2").to("u1").
      addE("like").from("u2").to("m1").
      addE("like").from("u2").to("m2").
      addE("like").from("u3").to("m2").
      addE("like").from("u3").to("m3").
      addE("like").from("u1").to("m1").
      addE("like").from("u1").to("m2")  
    

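    We can then, for example, find Jen, collect the movies Jen likes, and count how many of those movies each of Jen's friends also likes. The store('movies') side effect collects Jen's movies, and where(within('movies')) keeps only those of a friend's likes that appear in that collection.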

    gremlin> g.V().
    ......1>   has('name','jen').as('jen').
    ......2>   sideEffect(out('like').store('movies')).
    ......3>   out('friend').
    ......4>   group().
    ......5>     by("name").
    ......6>     by(out('like').where(within('movies')).count())    
    
    
    ==>[dave:1,alice:2]    
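
As a cross-check on that output, the same computation can be sketched in plain Python. This is only an in-memory model of the sample graph above, not Gremlin and not a Neptune client; the names likes, friends and common_like_counts are made up for illustration.

```python
# In-memory model of the sample graph (illustrative only, not a Gremlin client).
likes = {
    "alice": {"the wild bunch", "young guns"},
    "jen": {"the wild bunch", "young guns"},
    "dave": {"young guns", "unforgiven"},
}
# Outgoing "friend" edges only, mirroring out('friend') in the traversal.
friends = {
    "alice": ["jen", "dave"],
    "jen": ["dave", "alice"],
    "dave": [],
}


def common_like_counts(user):
    """Map each outgoing friend of `user` to the number of movies both like."""
    movies = likes[user]  # plays the role of store('movies')
    # Set intersection plays the role of where(within('movies')).count()
    return {friend: len(likes[friend] & movies) for friend in friends[user]}


print(common_like_counts("jen"))  # {'dave': 1, 'alice': 2}
```

A friend with no movies in common still appears with a count of 0, which is also how count() behaves inside the group() step.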
    

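    You can go one step further and order the results. Because group() returns a single map, order(local) is used to sort the entries inside that map, here by value in descending order (use asc for ascending).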

    gremlin> g.V().
    ......1>   has('name','jen').as('jen').
    ......2>   sideEffect(out('like').store('movies')).
    ......3>   out('friend').
    ......4>   group().
    ......5>     by("name").
    ......6>     by(out('like').where(within('movies')).count()).
    ......7>     order(local).
    ......8>       by(values,desc)  
    
    ==>[alice:2,dave:1]     
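
The effect of ordering the map entries by value can be illustrated with a small Python sketch. It assumes the counts from the previous query are already available as a dictionary; order_by_count is a hypothetical helper name, not part of any Gremlin API.

```python
# Counts as returned by the grouped traversal (friend name -> common likes).
counts = {"dave": 1, "alice": 2}


def order_by_count(counts, descending=True):
    """Return (name, count) pairs sorted by count, like order(local).by(values, desc)."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=descending)


print(order_by_count(counts))                    # [('alice', 2), ('dave', 1)]
print(order_by_count(counts, descending=False))  # [('dave', 1), ('alice', 2)]
```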
    

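    UPDATED to add an example where other vertex information is part of the result. Here the group key is valueMap(true) instead of the name, and the entries are ordered in ascending order.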

    gremlin> g.V().
    ......1>   has('name','jen').as('jen').
    ......2>   sideEffect(out('like').store('movies')).
    ......3>   out('friend').
    ......4>   group().
    ......5>     by(valueMap(true)).
    ......6>     by(out('like').where(within('movies')).count()).
    ......7>     order(local).
    ......8>       by(values,asc)
    
    ==>[[id:61353,label:user,name:[dave]]:1,[id:61349,label:user,name:[alice]]:2] 
    

    To omit the counts from the result, select only the keys of the ordered map:

    gremlin> g.V().
    ......1>   has('name','jen').as('jen').
    ......2>   sideEffect(out('like').store('movies')).
    ......3>   out('friend').
    ......4>   group().
    ......5>     by(valueMap(true)).
    ......6>     by(out('like').where(within('movies')).count()).
    ......7>     order(local).
    ......8>       by(values,asc).
    ......9>   select(keys)  
    
    ==>[[id:61353,label:user,name:[dave]],[id:61349,label:user,name:[alice]]]