
gremlin io step from url


https://www.compose.com/articles/importing-graphs-into-janusgraph/ shows how to import data into JanusGraph.

Since I couldn't get the JanusGraph Docker image working on my Mac via localhost, I connected to a remote Ubuntu machine where I run JanusGraph with:

docker run -it -p 8182:8182 janusgraph/janusgraph

Then I wanted to use gremlin-python to load data, and it failed. I reduced the problem to a simple, repeatable example:

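from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.structure.graph import Graph

server= ...
port=8182
graph = Graph()
janusgraphurl='ws://%s:%s/gremlin' % (server,port)
connection = DriverRemoteConnection(janusgraphurl, 'g')
g = graph.traversal().withRemote(connection)
dataurl="https://github.com/krlawrence/graph/raw/master/sample-data/air-routes.graphml"
# this is the call that fails
g.io(dataurl).read().iterate()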

I get the following error:

 File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gremlin_python/driver/protocol.py", line 110, in data_received
    raise GremlinServerError(message["status"])
gremlin_python.driver.protocol.GremlinServerError: 500: https://github.com/krlawrence/graph/raw/master/sample-data/air-routes.graphml does not exist

Yet the link https://github.com/krlawrence/graph/raw/master/sample-data/air-routes.graphml itself seems to work just fine.

What would be the proper way to load graph data from a URL using the Python Gremlin language variant?


Solution

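  • Kelvin Lawrence is right: the io() step is evaluated on the server, so the file name has to be a path that exists on the server's filesystem (here, inside the Docker container), not a URL.

    With a bash shell in the container: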

    docker run -it janusgraph/janusgraph /bin/bash
    

    I could check which files are available:

    root@8542ed1b8232:/opt/janusgraph# ls data
    grateful-dead-janusgraph-schema.groovy  tinkerpop-crew-typed.json
    grateful-dead-typed.json        tinkerpop-crew-v2d0-typed.json
    grateful-dead-v2d0-typed.json       tinkerpop-crew-v2d0.json
    grateful-dead-v2d0.json         tinkerpop-crew.json
    grateful-dead.json          tinkerpop-crew.kryo
    grateful-dead.kryo          tinkerpop-modern-typed.json
    grateful-dead.txt           tinkerpop-modern-v2d0-typed.json
    grateful-dead.xml           tinkerpop-modern-v2d0.json
    script-input-grateful-dead.groovy   tinkerpop-modern.json
    script-input-tinkerpop.groovy       tinkerpop-modern.kryo
    tinkerpop-classic-typed.json        tinkerpop-modern.xml
    tinkerpop-classic-v2d0-typed.json   tinkerpop-sink-typed.json
    tinkerpop-classic-v2d0.json     tinkerpop-sink-v2d0-typed.json
    tinkerpop-classic.json          tinkerpop-sink-v2d0.json
    tinkerpop-classic.kryo          tinkerpop-sink.json
    tinkerpop-classic.txt           tinkerpop-sink.kryo
    tinkerpop-classic.xml
    

    For a test I chose tinkerpop-modern.xml:

        # path relative to /opt/janusgraph inside the container
        file="data/tinkerpop-modern.xml"
        g.io(file).read().iterate()
        vCount=g.V().count().next()
        print("%s has %d vertices" % (file,vCount))
        assert vCount==6
    

    which works. Thanks!

    To make "external" data available to the Docker container, the --mount option can be used:

    docker run -it -p 8182:8182 --mount src=<path to graphdata>,target=/graphdata,type=bind janusgraph/janusgraph
    

    The following helper class helps with sharing files:

    RemoteGremlin

    '''
    Created on 2020-03-30
    
    @author: wf
    '''
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.structure.graph import Graph
    from shutil import copyfile
    import os
    
    class RemoteGremlin(object):
        '''
        helper for remote gremlin connections
        '''
    
        def __init__(self, server, port=8182):
            '''
            construct me with the given server and port
            '''
            self.server=server
            self.port=port    
    
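        def sharepoint(self,sharepoint,sharepath):
            '''
            set up the sharepoint: the local directory that is bind-mounted
            into the container and the path under which the server sees it
            '''
            # stored under a different name so the attribute does not shadow this method
            self.sharepointDir=sharepoint
            self.sharepath=sharepath
    
        def share(self,file):
            '''
            share the given file and return the path as seen by the server
            '''
            fbase=os.path.basename(file)
            copyfile(file,os.path.join(self.sharepointDir,fbase))
            # the server path always uses forward slashes
            return self.sharepath.rstrip('/')+'/'+fbase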
    
        def open(self):
            '''
            open the remote connection
            '''
            self.graph = Graph()
            self.url='ws://%s:%s/gremlin' % (self.server,self.port)
            self.connection = DriverRemoteConnection(self.url, 'g')    
            # The connection should be closed on shut down to close open connections with connection.close()
            self.g = self.graph.traversal().withRemote(self.connection)
    
        def close(self):
            '''
            close the remote connection
            '''
            self.connection.close()
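
The comment in open() notes that the connection should be closed on shutdown. One way to make that hard to forget is a small context manager around the helper. This is only a sketch: the name `opened` is my own and not part of gremlin-python, and it assumes nothing more than an object with open(), close() and a g attribute, which is what RemoteGremlin provides.

```python
from contextlib import contextmanager


@contextmanager
def opened(remote):
    """Open the given helper, yield its traversal source, and always close it."""
    remote.open()
    try:
        yield remote.g
    finally:
        # runs on normal exit and when the body raises
        remote.close()


# Hypothetical usage with the RemoteGremlin class above:
#
# with opened(RemoteGremlin("your.server.example")) as g:
#     print(g.V().count().next())
```

The try/finally is the point of the design: an exception in a traversal would otherwise leave the websocket connection open.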
    

    Python unit test:

    '''
    Created on 2020-03-28
    
    @author: wf
    '''
    import unittest
    from tp.gremlin import RemoteGremlin
    
    class JanusGraphTest(unittest.TestCase):
        '''
        test access to a janus graph docker instance via the RemoteGremlin helper class
        '''
    
        def setUp(self):
            pass
    
    
        def tearDown(self):
            pass
    
        def test_loadGraph(self):
            # change to your server
            rg=RemoteGremlin("capri.bitplan.com")
            rg.open()
            # close the connection even if an assertion below fails
            self.addCleanup(rg.close)
            # change to your shared path
            rg.sharepoint("/Volumes/bitplan/user/wf/graphdata/","/graphdata/")
            g=rg.g
            graphmlFile="air-routes-small.xml"
            shared=rg.share(graphmlFile)
            # drop the existing content of the graph
            g.V().drop().iterate()
            # read the content from the air routes example
            g.io(shared).read().iterate()
            vCount=g.V().count().next()
            print("%s has %d vertices" % (shared,vCount))
            self.assertEqual(47,vCount)
    
    
    if __name__ == "__main__":
        #import sys;sys.argv = ['', 'Test.testName']
        unittest.main()
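
Coming back to the original question about URLs: since io() reads from the server's filesystem, one workaround is to download the file into the bind-mounted directory first and hand io() the path as the server sees it. The sketch below does that with the standard library only. The function name `fetch_to_share` and its parameters are my own. It assumes the directory is mounted as in the docker run --mount example and that the download runs on a machine that can write to that directory. As far as I know, io() picks the reader from the file extension (.xml, .json, .kryo), so the optional `name` parameter lets you store a .graphml download under an .xml name.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlparse


def fetch_to_share(url, share_dir, server_dir, name=None):
    """Download url into share_dir and return the server-side path.

    share_dir  -- local directory that is bind-mounted into the container
    server_dir -- mount target inside the container, e.g. "/graphdata"
    name       -- optional file name to store the download under
    """
    if name is None:
        # take the last path segment of the URL as the file name
        name = posixpath.basename(urlparse(url).path)
    if not name:
        raise ValueError("cannot derive a file name from %r" % url)
    local_path = os.path.join(share_dir, name)
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    # container paths use forward slashes regardless of the client OS
    return posixpath.join(server_dir, name)


# Hypothetical usage with the traversal source g from the examples above:
#
# dataurl = "https://github.com/krlawrence/graph/raw/master/sample-data/air-routes.graphml"
# server_path = fetch_to_share(dataurl, "/path/to/graphdata", "/graphdata",
#                              name="air-routes.xml")
# g.io(server_path).read().iterate()
```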