Could you share a sample code to convert Wikidata dumps to Gremlin format, please?
I would like to load the converted Gremlin CSV data into AWS Neptune.
As discussed in your other question, Amazon Neptune will happily load that RDF format data directly, but you would need to query it using SPARQL. Unless you absolutely need to get the data into property graph format, loading the data as-is and using SPARQL would get you up and running very quickly.
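To give a feel for that route, a query over data like the toy example later in this answer might look roughly like the following. This is only a sketch; the `ex:` prefix and predicate IRIs are illustrative, and a real Wikidata dump would use the `wd:`/`wdt:` vocabularies instead.

```
PREFIX ex: <http://example.org/>

SELECT ?dog ?age
WHERE {
  ?dog a ex:Dog ;
       ex:age ?age .
}
```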
To use Gremlin or openCypher, that data will need to be converted to an equivalent property graph form. You really have a couple of options:

1. Produce CSV files in the Neptune bulk loader format and bulk load them.
2. Write the data using Gremlin `addV` and `addE` steps, or openCypher `CREATE` and `MERGE` clauses.

If you have a lot of data to load, the CSV files and bulk loader will be the easier route.
Converting from RDF format to property graph format is very easy. I'm aware of tools that go the other way (CSV to RDF) but not of one that will take a TTL file (let's say) and turn that into CSV.
If you are comfortable writing a little code, converting this data is quite straightforward; all you really need is a Python or Ruby script. You just have to convert the triple patterns into nodes and edges (with properties).
So, imagine the RDF contains triples that are essentially of this form:

```
max  a     dog
fido a     dog
max  age   3
fido age   6
max  likes fido
```
You would end up creating two nodes, two properties and an edge.
In CSV form the nodes would look like:

```
~id,~label,age
max,dog,3
fido,dog,6
```
and the edge would be:

```
~id,~label,~from,~to
e1,likes,max,fido
```
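A minimal Python sketch of such a script is below. It assumes the triples have already been parsed into `(subject, predicate, object)` tuples (a real script would first parse the TTL/N-Triples dump, e.g. with a library like rdflib), and the `LITERAL_PREDICATES` set is an illustrative stand-in for however you decide which predicates become node properties versus edges.

```python
import csv
import io

# Toy triples in (subject, predicate, object) form; a real script would
# parse these out of the TTL/N-Triples dump first.
triples = [
    ("max", "a", "dog"),
    ("fido", "a", "dog"),
    ("max", "age", "3"),
    ("fido", "age", "6"),
    ("max", "likes", "fido"),
]

# Illustrative: predicates whose objects become node properties.
LITERAL_PREDICATES = {"age"}

nodes = {}   # id -> {"~id": ..., "~label": ..., plus properties}
edges = []   # list of {"~id", "~label", "~from", "~to"}

for s, p, o in triples:
    node = nodes.setdefault(s, {"~id": s, "~label": ""})
    if p == "a":                    # rdf:type triple -> node label
        node["~label"] = o
    elif p in LITERAL_PREDICATES:   # literal object -> node property
        node[p] = o
    else:                           # resource-to-resource triple -> edge
        edges.append({"~id": f"e{len(edges) + 1}", "~label": p,
                      "~from": s, "~to": o})

def to_csv(rows, fieldnames):
    """Render rows as a Neptune bulk loader CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

node_csv = to_csv(nodes.values(), ["~id", "~label", "age"])
edge_csv = to_csv(edges, ["~id", "~label", "~from", "~to"])
print(node_csv)
print(edge_csv)
```

Running this prints the same node and edge CSVs shown above; for a real dump you would write the output to files and hand them to the Neptune bulk loader.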
If you plan on converting all of the data and it is just too much for a script-based approach, a big data ETL framework such as Apache Spark is likely the way to go. There are many ways to approach this, and it is not a super hard task; I'm just not aware of an existing tool that will do the conversion for you (there may be one, but I haven't come across it).