
How to set the schema in JanusGraph for batch loading using Python


I am trying to bulk-load data into JanusGraph 0.2 backed by HBase, using the gremlinpython library. For bulk loading I set storage.batch-loading to true, and now I have to define the schema for the graph.

I found the documentation on defining a graph schema (https://docs.janusgraph.org/0.2.0/schema.html & https://docs.janusgraph.org/0.2.0/advanced-schema.html).

It gives the following basic schema code:

mgmt = graph.openManagement()
follow = mgmt.makeEdgeLabel('follow').multiplicity(MULTI).make()
mother = mgmt.makeEdgeLabel('mother').multiplicity(MANY2ONE).make()
mgmt.commit()
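This is Groovy that calls JanusGraph's management API, so it has to be evaluated where the JanusGraph `graph` object actually lives (the console or Gremlin Server), not in Python. One way to keep the schema in Python anyway is to generate the script text there. The sketch below is a hypothetical helper (`build_edge_label_schema` is not part of any library) that emits the same edge-label definitions; it uses the fully qualified `org.janusgraph.core.Multiplicity` names so it does not depend on server-side imports, and guards each label with `mgmt.containsEdgeLabel(...)` so running it twice is harmless.

```python
# Valid values of JanusGraph's org.janusgraph.core.Multiplicity enum.
MULTIPLICITIES = {"MULTI", "SIMPLE", "MANY2ONE", "ONE2MANY", "ONE2ONE"}


def build_edge_label_schema(edge_labels):
    """Return a Groovy script that defines the given edge labels.

    edge_labels maps a label name to a multiplicity name, e.g.
    {"follow": "MULTI", "mother": "MANY2ONE"}. Labels that already exist
    are skipped, so the script can be re-run safely.
    """
    lines = ["mgmt = graph.openManagement()"]
    for label, multiplicity in edge_labels.items():
        if multiplicity not in MULTIPLICITIES:
            raise ValueError(f"unknown multiplicity {multiplicity!r} for {label!r}")
        # Reject characters that would break out of the single-quoted Groovy string.
        if "'" in label or "\\" in label:
            raise ValueError(f"unsupported character in edge label {label!r}")
        lines.append(
            f"if (!mgmt.containsEdgeLabel('{label}')) "
            f"mgmt.makeEdgeLabel('{label}')"
            f".multiplicity(org.janusgraph.core.Multiplicity.{multiplicity}).make()"
        )
    lines.append("mgmt.commit()")
    return "\n".join(lines)
```

Calling `build_edge_label_schema({"follow": "MULTI", "mother": "MANY2ONE"})` returns a four-line script (open, two guarded label definitions, commit) that still has to be sent to the server as a whole.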

I connect to the graph with the gremlinpython library like this:

from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.traversal import (T, Order, Cardinality, Column, Direction,
                                              Operator, P, Pop, Scope, Barrier)

from config import graph_url, graph_name

# A local, driver-side TinkerPop Graph object; it knows nothing about
# JanusGraph-specific APIs such as openManagement().
graph = Graph()
# The second argument is the name of the traversal source bound on the server (usually 'g').
drc = DriverRemoteConnection(graph_url, graph_name)

g = graph.traversal().withRemote(drc)

# I successfully get g here; I verify it with:
# g.V().count().next()

My question is: where should I define the schema? I tried calling mgmt = graph.openManagement() after the commented-out lines, but it doesn't work.


Update

It works in the Gremlin Console:

gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182
gremlin> 
gremlin> :> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@625dfab4 

But any further commands fail:

:> follow = mgmt.makeEdgeLabel('follow').multiplicity(MULTI).make()
No such property: mgmt for class: Script10
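The error arises because a console connected with plain `:remote connect` is sessionless: each `:>` line is sent to Gremlin Server as its own script, so `mgmt` from the previous request no longer exists. The fix is to send all the statements in one request. The hypothetical helper below (`console_lines_to_script` is an illustrative name, not an existing API) strips the `:>` prefixes from console commands and joins them into a single script body.

```python
def console_lines_to_script(console_lines):
    """Turn Gremlin Console ':>' commands into one Gremlin Server script.

    In a sessionless connection every ':>' line is evaluated as a separate
    request, so variables such as mgmt do not survive to the next line.
    Joining the statements lets them run in a single evaluation.
    """
    statements = []
    for line in console_lines:
        stripped = line.strip()
        if stripped.startswith(":>"):
            stripped = stripped[2:].strip()
        if stripped:  # skip blank lines
            statements.append(stripped)
    return "\n".join(statements)
```

Within the console itself, the same effect can be had by putting the statements on one `:>` line separated by `;`, or by connecting with `:remote connect tinkerpop.server conf/remote.yaml session` so that variables persist between requests.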

Solution

  • The gremlinpython driver is a Gremlin Language Variant (GLV): it lets you write Gremlin traversals natively in Python. JanusGraph schema definitions, however, go through the JanusGraph-specific management API (graph.openManagement()), and gremlinpython is a generic TinkerPop driver, so it has no constructs for calling database-specific APIs.

    As you've noted, you can declare your schema through the Gremlin Console. Another option is to use a string-based Gremlin driver, such as gremlinclient or gremlinpy, and send the schema definition to the server as a script.
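gremlinpython also ships a script-submitting client, `gremlin_python.driver.client.Client`, which can send a Gremlin string to the server in the same way. The sketch below assumes a gremlinpython release that includes that class, a JanusGraph Gremlin Server on which `graph` is bound (as the console session above shows), and uses hypothetical names (`SCHEMA_SCRIPT`, `submit_script`). The whole schema goes out as one script, so `mgmt` stays in scope until `mgmt.commit()`.

```python
# Schema from the JanusGraph docs example, sent as a single script so that
# mgmt is defined and committed within one server-side evaluation.
SCHEMA_SCRIPT = """
mgmt = graph.openManagement()
if (!mgmt.containsEdgeLabel('follow')) mgmt.makeEdgeLabel('follow').multiplicity(org.janusgraph.core.Multiplicity.MULTI).make()
if (!mgmt.containsEdgeLabel('mother')) mgmt.makeEdgeLabel('mother').multiplicity(org.janusgraph.core.Multiplicity.MANY2ONE).make()
mgmt.commit()
"""


def submit_script(url, script, traversal_source="g"):
    """Send a Gremlin script string to Gremlin Server and return its results."""
    # Imported lazily so this module loads even where gremlinpython is absent.
    from gremlin_python.driver.client import Client

    client = Client(url, traversal_source)
    try:
        # submit() returns a ResultSet; all() gives a future of every result.
        return client.submit(script).all().result()
    finally:
        client.close()
```

Usage would look like `submit_script("ws://localhost:8182/gremlin", SCHEMA_SCRIPT)` (the URL is an assumption matching the console's localhost:8182), run once before starting the bulk load; after that, the regular `g` traversal source can be used for the data itself.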