Tags: python, gremlin, amazon-neptune, gremlin-server, gremlinpython

Gremlin query to find duplicates and update a vertex property


I want to find duplicate records and set the property isDuplicate to yes on each of them.

I am able to find the duplicate records, but I couldn't find a way to update the property.

    g.V() \
        .has("customerId") \
        .group().by("customerId") \
        .unfold() \
        .toList()

The query above also returns customerId values that occur only once. I want to exclude those as well.


Solution

  • Here is one way to do it:

    gremlin> g = TinkerGraph.open().traversal()
    ==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
    gremlin> g.addV('person').property('customerId','alice').
    ......1>   addV('person').property('customerId','alice').
    ......2>   addV('person').property('customerId','bob').
    ......3>   addV('person').property('customerId','alice').iterate()
    gremlin> g.V().hasLabel('person').has('customerId').
    ......1>   group().by('customerId').
    ......2>   unfold().
    ......3>   select(values).filter(count(local).is(gt(1))).unfold().
    ......4>   property('isDuplicate','yes')
    ==>v[0]
    ==>v[2]
    ==>v[6]
    gremlin> g.V().elementMap()
    ==>[id:0,label:person,customerId:alice,isDuplicate:yes]
    ==>[id:2,label:person,customerId:alice,isDuplicate:yes]
    ==>[id:4,label:person,customerId:bob]
    ==>[id:6,label:person,customerId:alice,isDuplicate:yes]
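
To make the data flow of the traversal above easier to follow, here is a plain-Python model of the same steps. It does not talk to a Gremlin server: the vertices are ordinary dicts, and the function name `mark_duplicates` and its `key` parameter are made up for illustration. The `hasLabel('person')` step is left out because every sample vertex has that label.

```python
from collections import defaultdict


def mark_duplicates(vertices, key="customerId"):
    """Plain-Python model of the traversal; updates the dicts in place."""
    # has('customerId') + group().by('customerId'):
    # map each customerId to the list of vertices that carry it.
    groups = defaultdict(list)
    for v in vertices:
        if key in v:
            groups[v[key]].append(v)

    # unfold() + select(values) + filter(count(local).is(gt(1))):
    # look at each group's list and keep only lists with more than one vertex.
    duplicate_groups = [members for members in groups.values() if len(members) > 1]

    # unfold() + property('isDuplicate', 'yes'):
    # flatten the surviving lists and set the property on every vertex.
    marked = []
    for members in duplicate_groups:
        for v in members:
            v["isDuplicate"] = "yes"
            marked.append(v)
    return marked


# Same sample data as the console session (ids copied from its output).
vertices = [
    {"id": 0, "label": "person", "customerId": "alice"},
    {"id": 2, "label": "person", "customerId": "alice"},
    {"id": 4, "label": "person", "customerId": "bob"},
    {"id": 6, "label": "person", "customerId": "alice"},
]
marked = mark_duplicates(vertices)
print([v["id"] for v in marked])  # [0, 2, 6]
```

When porting the console traversal itself to gremlinpython, the bare tokens are spelled differently: `values` becomes `Column.values`, `count(local)` becomes `__.count(Scope.local)`, and `is(gt(1))` becomes `is_(P.gt(1))`. The traversal also needs a terminal step such as `iterate()` before the property update is actually sent to the server.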