
Hadoop with Relational Database


I am new to Hadoop and would like to know how Hadoop works in a particular scenario.

When building dynamic web projects, I used to store and retrieve data in a MySQL database by sending queries from Java/C#.

I now use Hadoop services in my project. Does Hadoop provide any built-in database system where we can store data and retrieve it when required, instead of using an external database?

Thanks in advance.


Solution

  • Hadoop doesn't provide any built-in DB. At its core it is just two things:

    • A distributed FS (HDFS)
    • A distributed processing framework (MapReduce, which I'll call MR for short)

    I'm assuming that you need very quick responses since you are dealing with a web service. IMHO, Hadoop (HDFS, to be precise), or any other FS for that matter, won't be a suitable choice in such a scenario. The reason is that HDFS lacks low-latency random read/write capability, which is essential for any web project: HDFS files are meant to be written once and read in large sequential chunks, not updated in place one record at a time.

    The same holds true for Hive. Although it manages data in a fashion similar to an RDBMS, it is not actually an RDBMS. The underlying storage mechanism is still HDFS files. Moreover, when you issue a Hive query, the query is first converted into an MR job and only then are the results produced, which makes the response slow.
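    To make the "query becomes an MR job" point concrete, here is a rough single-process sketch of the map, shuffle, and reduce phases that a query such as SELECT city, COUNT(*) FROM users GROUP BY city would roughly correspond to. The table, the column, and every class and method name below are made up for illustration. This is plain Java, not Hadoop or Hive code; on a real cluster each phase runs as distributed tasks with job startup overhead, which is where the latency comes from.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy, in-memory illustration of map -> shuffle -> reduce.
// NOT Hadoop or Hive code; all names here are hypothetical.
class GroupByAsMapReduce {

    // Map phase: turn every input row into a (city, 1) pair.
    static List<Map.Entry<String, Integer>> map(List<String> cities) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String city : cities) {
            pairs.add(new AbstractMap.SimpleEntry<>(city, 1));
        }
        return pairs;
    }

    // Shuffle phase: collect all values that share the same key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse each group of values into one count.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    // The whole "job": the three phases chained together.
    static Map<String, Integer> countByCity(List<String> cities) {
        return reduce(shuffle(map(cities)));
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune");
        System.out.println(countByCity(cities)); // prints {Delhi=2, Mumbai=1, Pune=3}
    }
}
```

    Even for a handful of rows the whole pipeline has to run from start to finish before any result appears, which is fine for batch reports but not for answering a web request.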

    Your safest bet would be to go with HBase. It is definitely a better choice when you need random, real-time read/write access to your data, as in your case. Although it's not a part of the core Hadoop platform, it was built from the ground up to be used with Hadoop. It works on top of your existing HDFS cluster and can be operated on directly through the different HBase APIs (which fits your case) or through MR (not for real-time work, but a good fit when you need to batch process huge amounts of data). It is easy to set up and use, with no requirement for additional infrastructure.

    One important thing to note here is that HBase is a NoSQL DB and doesn't follow RDBMS conventions and terminology. So you might have to work a bit on your design initially.
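    As a feel for what that design work looks like, below is a toy in-memory model of an HBase-style table: rows are kept sorted by row key, and each row holds values addressed by column family and qualifier. It is only a sketch in plain Java, not the HBase client API. The class name, the "user#date" row key layout, and the sample data are all assumptions made for illustration, and real HBase stores byte arrays and keeps multiple timestamped versions per cell, which this model leaves out.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of an HBase-style table. NOT the HBase client API;
// the class and its methods are hypothetical and exist only for illustration.
class ToyWideColumnTable {

    // row key -> ("family:qualifier" -> value), with rows sorted by row key.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Random write: address a single cell by row key, family, and qualifier.
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(family + ":" + qualifier, value);
    }

    // Random read: look up a single cell, or null if it does not exist.
    String get(String rowKey, String family, String qualifier) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(family + ":" + qualifier);
    }

    // Range scan: because rows are sorted, all keys sharing a prefix are adjacent.
    NavigableMap<String, NavigableMap<String, String>> scanPrefix(String prefix) {
        return rows.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        ToyWideColumnTable visits = new ToyWideColumnTable();
        // The row key encodes what you will query by: user first, then date.
        visits.put("user42#2013-05-01", "info", "page", "/home");
        visits.put("user42#2013-05-02", "info", "page", "/cart");
        visits.put("user7#2013-05-01", "info", "page", "/home");

        System.out.println(visits.get("user42#2013-05-02", "info", "page"));
        // prints /cart
        System.out.println(visits.scanPrefix("user42#").keySet());
        // prints [user42#2013-05-01, user42#2013-05-02]
    }
}
```

    The point of the sketch is that there are no joins or ad hoc WHERE clauses to lean on: lookups are by row key or by a range of row keys, so the key layout has to be chosen around the queries the web application will actually make.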

    Apart from HBase you have some other options as well, like Cassandra, which is also a NoSQL DB.

    HTH