Tags: python, graph, gremlin

Load csv on Gremlin error "Mapping for code not found"


I am trying to load a CSV file in the Gremlin Console. The file has just two columns, [code, desc], and almost 6K records. The code used to load the CSV (slide 9) was provided as a solution to a similar question.

:install org.apache.commons commons-csv 1.5
import org.apache.commons.csv.CSVFormat
g = TinkerGraph.open().traversal()
fileReader = new FileReader('C:/airports.csv')
records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse(fileReader);[]
records.each{
        code=it.get('code');
        desc=it.get('desc');
        g.V().has('code', code).fold().coalesce(
            unfold(),
            addV('airport').property('code', code).property('desc',desc)
        ).iterate()
}
g.V().count()

The error I get is:

Mapping for code not found, expected one of [´╗┐code, desc]

Finally, if I load the CSV file in the Gremlin Console, will I be able to read the resulting graph from Python, provided I set up the remote connection?

Thank you


Solution

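  • The error you are seeing implies that the parser did not find a header named code in the first row of the file: the expected list shows ´╗┐code rather than code, so the first header name carries extra leading characters. Using this test file: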

    $ cat airports.csv 
    code,desc
    LHR,"London Heathrow"
    LGW,"London Gatwick"
    DFW,"Dallas Fort Worth"
    

    I was able to read it using

    gremlin> fileReader = new FileReader('airports.csv')
    ==>java.io.FileReader@19c1820d
    gremlin> records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse(fileReader);[]
    gremlin> records.each{
    ......1>         c = it.get('code')
    ......2>         d = it.get('desc')
    ......3>         println(c + " : " + d)
    ......4> }  
    
    LHR : London Heathrow
    LGW : London Gatwick
    DFW : Dallas Fort Worth   
    

    Note that I did not use desc as a variable name. Inside the Gremlin console that will collide with the Order.desc enum.

    Given these building blocks, the full load looks like this:

    gremlin> fileReader = new FileReader('airports.csv')
    ==>java.io.FileReader@10850d17
    gremlin> records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse(fileReader);[]
    gremlin> records.each{
    ......1>         c=it.get('code');
    ......2>         d=it.get('desc');
    ......3>         println(it)
    ......4>         g.V().has('code', c).fold().coalesce(
    ......5>             unfold(),
    ......6>             addV('airport').property('code',c).property('desc',d)
    ......7>         ).iterate()
    ......8> }
    
    CSVRecord [comment=null, mapping={code=0, desc=1}, recordNumber=1, values=[LHR, London Heathrow]]
    CSVRecord [comment=null, mapping={code=0, desc=1}, recordNumber=2, values=[LGW, London Gatwick]]
    CSVRecord [comment=null, mapping={code=0, desc=1}, recordNumber=3, values=[DFW, Dallas Fort Worth]]
    
    gremlin> g.V().count()
    ==>3
    
    gremlin> g.V().valueMap()
    ==>[code:[LHR],desc:[London Heathrow]]
    ==>[code:[LGW],desc:[London Gatwick]]
    ==>[code:[DFW],desc:[Dallas Fort Worth]]      
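
If the same steps fail on the original file, the ´╗┐ shown before code in the error message is worth a look: it is how the three bytes of a UTF-8 byte order mark (BOM) render in some Windows console code pages. Below is a minimal Python sketch, using only the standard library, of how one might check for a BOM and see what it does to the header row. The sample bytes and the helper names has_utf8_bom and read_header are made up for illustration.

```python
import codecs
import csv
import io


def has_utf8_bom(raw: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 byte order mark."""
    return raw.startswith(codecs.BOM_UTF8)


def read_header(raw: bytes, encoding: str) -> list:
    """Decode raw CSV bytes and return the header row as a list of names."""
    text = raw.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


# Hypothetical sample: the start of the test file with a BOM prepended.
sample = codecs.BOM_UTF8 + b'code,desc\r\nLHR,"London Heathrow"\r\n'

print(has_utf8_bom(sample))              # True
print(read_header(sample, "utf-8"))      # ['\ufeffcode', 'desc']
print(read_header(sample, "utf-8-sig"))  # ['code', 'desc']
```

With plain utf-8 the BOM stays attached to the first header name, so a lookup for code fails. If the check reports a BOM, re-saving the file as UTF-8 without a BOM is probably the simplest fix before loading it in the console.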
    
    

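    As to your last question, a graph created with TinkerGraph.open() inside the Gremlin Console lives only in the console's own JVM, so Python cannot see it. To read the graph using Python you will need to run a Gremlin Server and connect to it from the Gremlin Console. After loading the file, you can then write a Python script that uses the Gremlin Python client to connect to the same Gremlin Server.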

    You could also just do it all in Python. Either way you will need a Gremlin Server if you want to use Python to access the graph.
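
As a rough sketch of the all-Python route, the following reads the CSV and runs the same fold/coalesce upsert through the Gremlin Python client. It assumes the gremlinpython package is installed and that a Gremlin Server is listening at ws://localhost:8182/gremlin with a traversal source named g; both are assumptions, not something the question confirms. The function names parse_rows and load_airports are hypothetical, and the camelCase step names shown may be spelled differently in newer client versions.

```python
import csv


def parse_rows(lines):
    """Turn CSV lines with a code,desc header into (code, desc) tuples."""
    return [(row["code"], row["desc"]) for row in csv.DictReader(lines)]


def load_airports(rows, url="ws://localhost:8182/gremlin"):
    """Upsert one 'airport' vertex per row and return the vertex count.

    The URL and the traversal source name "g" are assumed defaults.
    """
    # Imported here so parse_rows works without gremlinpython installed.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.process.graph_traversal import __

    conn = DriverRemoteConnection(url, "g")
    try:
        g = traversal().withRemote(conn)
        for code, description in rows:
            # Same pattern as the console version: reuse the vertex if it
            # exists, otherwise create it.
            (g.V().has("code", code).fold()
              .coalesce(
                  __.unfold(),
                  __.addV("airport")
                    .property("code", code)
                    .property("desc", description))
              .iterate())
        return g.V().count().next()
    finally:
        conn.close()
```

A call might look like opening the file with open('airports.csv', newline='', encoding='utf-8-sig'), passing the file object to parse_rows, and handing the result to load_airports. The utf-8-sig encoding drops a leading byte order mark if the file has one.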