Search code examples
java-8gremlingraph-databasestitanchild-nodes

Counting Total Child Nodes in Titan GraphDB through Gremlin Query


I had created Titan Graph of hierarchical tree in Java.
How do you find total child nodes hierarchy from a specified node with Gremlin?


Solution

  • The basic pattern for traversing a tree lies in repeat() step. As an example, I use the graph depicted in the Tree Recipes section of the TinkerPop documentation:

    gremlin> g = TinkerGraph.open().traversal()
    ==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
    gremlin> g.addV(id, 'A').as('a').
    ......1>            addV(id, 'B').as('b').
    ......2>            addV(id, 'C').as('c').
    ......3>            addV(id, 'D').as('d').
    ......4>            addV(id, 'E').as('e').
    ......5>            addV(id, 'F').as('f').
    ......6>            addV(id, 'G').as('g').
    ......7>            addE('hasParent').from('a').to('b').
    ......8>            addE('hasParent').from('b').to('c').
    ......9>            addE('hasParent').from('d').to('c').
    .....10>            addE('hasParent').from('c').to('e').
    .....11>            addE('hasParent').from('e').to('f').
    .....12>            addE('hasParent').from('g').to('f').iterate()
    gremlin> g.V('F').repeat(__.in('hasParent')).emit().count()
    ==>6
    gremlin> g.V('C').repeat(__.in('hasParent')).emit().count()
    ==>3
    gremlin> g.V('A').repeat(__.in('hasParent')).emit().count()
    ==>0
    

    The key to getting the count is in the use of emit() which allows all the traversers encountered in the repeat() to be counted.

    Just for comparison to what kind of speed you can get with TinkerGraph (in-memory) I generated a 400,000 vertex deep tree:

    gremlin> graph = TinkerGraph.open()
    ==>tinkergraph[vertices:0 edges:0]
    gremlin> g = graph.traversal()
    ==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
    gremlin> lastV = g.addV().next()
    ==>v[0]
    gremlin> (0..<400000).each{lastV=g.V(lastV).as('f').addV().as('t').addE('next').from('f').to('t').select('t').next()}
    ==>0
    ==>1
    ==>2
    ==>3
    ...
    gremlin> graph
    ==>tinkergraph[vertices:400001 edges:400000]
    gremlin> clockWithResult{ g.V(0L).repeat(__.out('next')).emit().count().next() }
    ==>171.44102253
    ==>400000
    

    Done in 171ms. TinkerGraph is obviously faster as it holds its data purely in-memory. Titan/JanusGraph and other graphs have to read from disk.