
Apache Gora vs MySQL Applier and Sqoop


I want to integrate MySQL with a Hadoop project. After a lot of searching I found two approaches: "MySQL Applier for Hadoop" for real-time use and "Apache Sqoop" for non-real-time use.
I found that Gora can do this too, but I could not find any information on how to do it.
Is Gora real time or not? What is the difference between Gora and MySQL Applier or Sqoop?
To integrate Hadoop and MySQL, is a NoSQL database needed as an intermediary?


Solution

  • At the moment the SQL module of Gora is disabled because of some issues, so it does not meet your needs :( It is expected to be enabled again in a future version.

    Anyway, some explanation about Gora:

    Gora is an object-mapping framework (not specifically relational). You could say it is focused on NoSQL until the SQL module is back.

    I find Gora a good tool for keeping a NoSQL store in the backend while fetching the data in a structured form, as objects.

    Is it real time or not? What is the difference between Gora and MySQL Applier or Sqoop?

    It is, but I guess not in the sense you have in mind. It is not an automatic real-time ingest tool, not an automatic insert tool, not a parse-and-insert tool, not a filter, not a...

    It is a layer between Hadoop and a configurable datastore (think of something like Hibernate as an ORM).

    For integration of Hadoop and MySQL, does it need any NoSQL DB as an interface?

    Integrating it with Hadoop is as easy as configuring Hadoop to use GoraMapper. Your map function is then fed with objects (mapped from your configured NoSQL store).
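
    As a rough illustration of that idea (this is not Gora's actual API, which is Java), here is a small Python sketch of a store-agnostic layer: a hypothetical `DataStore` interface, an in-memory backend standing in for a NoSQL store, and a map function that receives mapped objects instead of raw rows. Every name in it (`DataStore`, `InMemoryStore`, `WebPage`, `run_map`, `word_count_map`) is invented for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple


@dataclass
class WebPage:
    """Hypothetical persistent object; a mapping layer would derive it from a schema."""
    url: str
    content: str


class DataStore(ABC):
    """Hypothetical store interface: the job only sees this, never the backend."""

    @abstractmethod
    def put(self, key: str, obj: WebPage) -> None: ...

    @abstractmethod
    def scan(self) -> Iterator[Tuple[str, WebPage]]: ...


class InMemoryStore(DataStore):
    """Stand-in backend; another DataStore could replace it without touching the job."""

    def __init__(self) -> None:
        self._rows: Dict[str, WebPage] = {}

    def put(self, key: str, obj: WebPage) -> None:
        self._rows[key] = obj

    def scan(self) -> Iterator[Tuple[str, WebPage]]:
        # Yield (key, object) pairs in key order.
        return iter(sorted(self._rows.items(), key=lambda kv: kv[0]))


def word_count_map(key: str, page: WebPage) -> List[Tuple[str, int]]:
    """Map function: receives a typed object and emits (word, 1) pairs."""
    return [(word, 1) for word in page.content.split()]


def run_map(
    store: DataStore,
    map_fn: Callable[[str, WebPage], List[Tuple[str, int]]],
) -> List[Tuple[str, int]]:
    """Toy driver: feed every stored object to the map function."""
    out: List[Tuple[str, int]] = []
    for key, obj in store.scan():
        out.extend(map_fn(key, obj))
    return out


store = InMemoryStore()
store.put("a", WebPage("http://example.org/a", "hello gora"))
store.put("b", WebPage("http://example.org/b", "hello hadoop"))
pairs = run_map(store, word_count_map)
```

    The point of the design is that `word_count_map` never knows which backend produced the object, which is roughly the role the mapping layer plays between Hadoop and the datastore.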

    I think it will soon be integrated with Pig and Cascading :)

    And my suggestion: if you want to read from or write to MySQL, take a look at Pig and Hive, although they are not "real time" (do you mean writing to HDFS instantly after inserting a row in MySQL?).
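
    To make the "real time" distinction concrete, here is a toy Python sketch contrasting a batch snapshot copy with a per-insert apply. It does not use or imitate any real tool's API; `batch_import`, `IncrementalApplier`, and the sample rows are all made up for illustration, and the output lines simply stand in for what would be written to HDFS.

```python
from typing import List, Tuple

Row = Tuple[int, str]


def batch_import(table: List[Row]) -> List[str]:
    """Batch style: copy a full snapshot of the table in one scheduled run."""
    return [f"{row_id},{value}" for row_id, value in table]


class IncrementalApplier:
    """Real-time style: append one output line per insert event as it arrives."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def on_insert(self, row: Row) -> None:
        row_id, value = row
        self.lines.append(f"{row_id},{value}")


mysql_table: List[Row] = [(1, "alice"), (2, "bob")]

# Batch: the target sees nothing until the job runs, then sees everything at once.
snapshot = batch_import(mysql_table)

# Incremental: the target is updated after every single insert.
applier = IncrementalApplier()
applier.on_insert((1, "alice"))
after_first = list(applier.lines)
applier.on_insert((2, "bob"))
```

    Both paths end with the same data; the difference is only when the target becomes up to date.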

    I hope this helps.