scala apache-spark cassandra janusgraph

JanusGraph indexing in Scala


I am using Spark to build a JanusGraph graph from a data stream, but I am having trouble creating properties and indexes. I want to create an index on a vertex property called "register_id", and I am not sure I'm doing it the right way.

So, here's my code:

var gr1 = JanusGraphFactory.open("/Downloads/janusgraph-cassandra.properties")
gr1.close()
// This is done to clear the graph made in every run.
JanusGraphFactory.drop(gr1)
gr1 = JanusGraphFactory.open("/Downloads/janusgraph-cassandra.properties")
var reg_id_prop = gr1.makePropertyKey("register_id").dataType(classOf[String]).make()
var mgmt = gr1.openManagement()
gr1.tx().rollback()
mgmt.buildIndex("byRegId", classOf[Vertex]).addKey(reg_id_prop).buildCompositeIndex()

When I run the above, I get an error saying:

"Vertex with id 5164 was removed".

Also, how do I check in Scala whether the graph contains vertices with a certain property value? I know that in Gremlin g.V().has('name', 'property_value') works, but I can't figure out how to do this in Scala. I tried Gremlin-Scala but couldn't find an equivalent.

Any help will be appreciated.


Solution

  • You should be using the mgmt object to build the schema, not the graph object. You also need to call mgmt.commit() so that the schema updates are persisted.

    gr1 = JanusGraphFactory.open("/Downloads/janusgraph-cassandra.properties")
    // Create the property key and the index through the management API.
    val mgmt = gr1.openManagement()
    val reg_id_prop = mgmt.makePropertyKey("register_id").dataType(classOf[String]).make()
    mgmt.buildIndex("byRegId", classOf[Vertex]).addKey(reg_id_prop).buildCompositeIndex()
    // Persist the schema changes.
    mgmt.commit()
    

    Refer to the indexing docs from JanusGraph.

    For your second question, on checking whether a vertex exists using the composite index, you need to finish your traversal with a terminal step. For example, in Java, this would return a boolean value. Scala can call the TinkerPop Java API directly, so the same line also works from Scala:

    g.V().has("name", "property_value").hasNext()
    

    Refer to the "Iterating the Traversal" docs from JanusGraph.

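    Reading over the gremlin-scala README, it looks like it has a few options for terminal steps that you could use, like head, headOption, toList, or toSet. Note that Scala strings need double quotes, and as far as I can tell from the README, gremlin-scala's has step takes a typed Key rather than a plain string, so the call would look roughly like this:

    g.V.has(Key[String]("name"), "property_value").headOption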
    

    You should also check out the gremlin-scala-examples and the gremlin-scala traversal specification.
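
Putting both parts together, here is a minimal sketch in plain Scala that uses only the JanusGraph and TinkerPop Java APIs (no gremlin-scala). It is not the asker's setup: it assumes the in-memory storage backend so it can run without Cassandra, and the object name RegisterIdIndexSketch, the run helper, and the sample value "r-42" are made up for illustration. The containsPropertyKey guard is an addition so the schema step can be repeated safely.

```scala
import org.apache.tinkerpop.gremlin.structure.Vertex
import org.janusgraph.core.JanusGraphFactory

object RegisterIdIndexSketch {

  // Returns (lookup of an existing id, lookup of a missing id).
  def run(): (Boolean, Boolean) = {
    // Assumption: in-memory backend instead of the Cassandra properties file.
    val graph = JanusGraphFactory.build().set("storage.backend", "inmemory").open()
    try {
      // Schema changes go through the management object and must be committed.
      val mgmt = graph.openManagement()
      if (!mgmt.containsPropertyKey("register_id")) {
        val regId = mgmt.makePropertyKey("register_id").dataType(classOf[String]).make()
        mgmt.buildIndex("byRegId", classOf[Vertex]).addKey(regId).buildCompositeIndex()
      }
      mgmt.commit()

      // Add one sample vertex (hypothetical value) and commit the transaction.
      graph.addVertex("register_id", "r-42")
      graph.tx().commit()

      // hasNext() is the terminal step that actually executes the traversal.
      val g = graph.traversal()
      val found = g.V().has("register_id", "r-42").hasNext()
      val missing = g.V().has("register_id", "no-such-id").hasNext()
      (found, missing)
    } finally {
      graph.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val (found, missing) = run()
    println(s"found=$found missing=$missing")
  }
}
```

To run this against Cassandra instead, you would swap the build().set(...).open() line for JanusGraphFactory.open with your properties file, as in the snippets above.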