java orientdb etl

OrientDB: ETL import vs. custom Java import with embedded db


I am prototyping a data mining tool to harvest data from multiple sources:

1) MySQL db: 2,000,000 vertices, 20,000,000 edges
2) custom data files: 2,000,000 vertices, 700,000,000 edges
3) different custom data files: 300,000 vertices, 500,000,000 edges

From a performance standpoint, is it better to use ETL or custom Java loaders with embedded db?

It is easy to transform the data from the custom data files to CSV or JSON.


Solution

  • I'm the ETL maintainer. Beyond the input data format, I would look at which kinds of transformation your data sets need and how many times you need to move the data.

    ETL can be configured to do some transformations, and you can use it with a plocal db to achieve maximum performance. If you need to reimport frequently, need very complex transformations, or your data format can vary from time to time, you can write a custom Java program instead.
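
As a rough illustration of the ETL route with a plocal database, a configuration for loading one vertex class from CSV might look like the sketch below. The file path, database path, class name `Person`, and `id` field are hypothetical placeholders, and the `wal` and `batchCommit` values are assumptions to tune for your data rather than recommended settings.

```json
{
  "source": { "file": { "path": "/tmp/import/vertices.csv" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "vertex": { "class": "Person" } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/tmp/databases/prototype",
      "dbType": "graph",
      "wal": false,
      "batchCommit": 1000,
      "classes": [
        { "name": "Person", "extends": "V" }
      ],
      "indexes": [
        { "class": "Person", "fields": ["id:integer"], "type": "UNIQUE" }
      ]
    }
  }
}
```

A file like this is normally passed to the `oetl.sh` script that ships with OrientDB. Edges would need further transformer steps and a second input file, which is the point where a custom Java loader may become easier to maintain.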