Tags: graph, etl, orientdb

OrientDB ETL edge lookup from query - how to access $input?


I'm trying my darnedest to do an ETL import from a large dataset that I've been keeping in MongoDB. I've successfully imported the vertices, and I feel like I'm one little syntax misunderstanding away from importing the edges too.

I am pretty sure that the error is in this transformer:

{"edge":{"class":"Friend", "joinFieldName":"id", 
  "lookup": "select from Character WHERE $input.id IN character_friends",
  "unresolvedLinkAction":"CREATE"}},

So what I'm trying to do is to make an edge from a document with id = FOO to all other documents that contain FOO in their character_friends array. If I execute

select from Character WHERE FOO IN character_friends

in the browser, I get a ton of documents, so my guess is that my problem is with $input.id either not returning the id I'm expecting, or maybe not being recognized as a variable at all.

Documents look like this:

{
  id: FOO,
  character_friends: [BAR, BAZ, QUX]
  (and a bunch of other junk)
}
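
To pin down the result I'm after independently of the ETL syntax, here is a minimal Python sketch of the lookup I want the edge transformer to perform. The sample documents and the `friend_edges` helper are made up for illustration; only the `id` and `character_friends` field names come from my data.

```python
# Hypothetical sample documents shaped like the ones described above.
characters = [
    {"id": "FOO", "character_friends": ["BAR"]},
    {"id": "BAR", "character_friends": ["FOO", "BAZ"]},
    {"id": "BAZ", "character_friends": ["FOO"]},
]


def friend_edges(docs):
    """Return (source, target) pairs: one edge from each document to every
    other document that lists the source's id in its character_friends."""
    edges = []
    for current in docs:
        for other in docs:
            # Same condition as: <current id> IN character_friends
            if current["id"] in other["character_friends"]:
                edges.append((current["id"], other["id"]))
    return edges


if __name__ == "__main__":
    for source, target in friend_edges(characters):
        print(source, "->", target)
```

With this sample, FOO should end up linked to BAR and BAZ, because both list FOO in their `character_friends` arrays.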

Solution

  • It seems you're inserting a property named "id", which is reserved in the Blueprints standard. You can either rename it (with "field" transformers) or set this in the OrientDB loader:

      standardElementConstraints: false,
    

    To test this, I created the file /temp/datasets/charles.json with this content:

    [
     {
      name: "Joe",
      id: 1,
      friends: [2,4,5],
      enemies: [6]
     },
     {
      name: "Suzie",
      id: 2,
      friends: [1,4,6],
      enemies: [5,2]
     }
    ]
    

    And this pipeline:

    {
      config: {
        log: "debug",
        parallel: false
      },
      source : {
        file: { path: "/temp/datasets/charles.json", lock : true }
      },
      extractor : {
        json: {}
      },
      transformers : [
        { merge: { joinFieldName:"id", lookup:"Account.id" } },
        { vertex: { class: "Account"} },
        { edge: {
          "class": "Friend",
          "joinFieldName": "friends",
          "lookup": "Account.id",
          "unresolvedLinkAction": "CREATE"
        } },
        { edge: {
          "class": "Enemy",
          "joinFieldName": "enemies",
          "lookup": "Account.id",
          "unresolvedLinkAction": "CREATE"
        } }
      ],
      loader : {
        orientdb: {
          dbURL: "plocal:/temp/databases/charles",
          dbUser: "admin",
          dbPassword: "admin",
          dbAutoDropIfExists: true,
          dbAutoCreate: true,
          standardElementConstraints: false,
          tx: false,
          wal: false,
          batchCommit: 1000,
          dbType: "graph",
          classes: [{name: 'Account', extends:"V"}, {name: 'Friend', extends:"E"}, {name: 'Enemy', extends:"E"}],
          indexes: [{class:"Account", fields:["id:integer"], type:"UNIQUE_HASH_INDEX" }]
        }
      }
    }
    

    Make sure to use the latest version of the ETL jar (replace the default one in $ORIENTDB/lib with it). The latest version can be downloaded from:

    https://oss.sonatype.org/content/repositories/snapshots/com/orientechnologies/orientdb-etl/2.0.2-SNAPSHOT/orientdb-etl-2.0.2-20150208.225903-1.jar

    Or use OrientDB ETL 2.0.2 or later.
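
    As a rough mental model of what the pipeline above is expected to do, the following Python sketch mimics the merge, vertex, and edge steps on the two sample records. It is not OrientDB code: the `run_pipeline` function and its dictionary-based "index" are illustrative assumptions, and it assumes that each value in the join field produces one outgoing edge and that "CREATE" adds a placeholder vertex for ids that are not in the index yet.

```python
# Simplified model of the pipeline; the names here are hypothetical.
records = [
    {"name": "Joe", "id": 1, "friends": [2, 4, 5], "enemies": [6]},
    {"name": "Suzie", "id": 2, "friends": [1, 4, 6], "enemies": [5, 2]},
]


def run_pipeline(records):
    accounts = {}  # stands in for the unique index Account.id
    edges = set()  # (edge class, source id, target id)

    for record in records:
        # merge + vertex: reuse a placeholder vertex if this id already exists
        vertex = accounts.setdefault(record["id"], {"id": record["id"]})
        vertex.update(record)

        for edge_class, join_field in (("Friend", "friends"), ("Enemy", "enemies")):
            for target_id in record.get(join_field, []):
                # Assumed CREATE behaviour: add a placeholder for unknown ids
                accounts.setdefault(target_id, {"id": target_id})
                edges.add((edge_class, record["id"], target_id))

    return accounts, edges


if __name__ == "__main__":
    accounts, edges = run_pipeline(records)
    print(sorted(accounts))
    for edge in sorted(edges):
        print(edge)
```

    In this model, accounts 4, 5 and 6 exist only as placeholders, while account 2 is first created as a placeholder by Joe's "friends" array and then filled in when Suzie's record is merged.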