Search code examples
javaphphadoopmahout

Just how much Java does one need to use Hadoop and Mahout effectively?


I'm a PHP developer. Let's just get that out of the way now. But Hadoop – and Mahout in particular – have piqued my interest. I'm ready to take the dive into Java in order to use them.

So from people experience enough to know, just how much Java will I need to be able to use these effectively? From what I've seen, programming mappers/reducers doesn't take all that much. But with Mahout I'm not at all sure what I'm looking at when I look at the documentation.

Also, just how hard will it be to take data from my PHP application for processing in Java via Hadoop and Mahout? I can't imagine it'd be that difficult, but I'm not experienced enough to say.


Solution

  • It shouldn't be all that difficult to get data from PHP to Java for analysis using Mahout and Hadoop.

    Even easier is to process using Mahout and Hadoop off-line in a batch mode and to store the data products in a file system or database. PHP can then read these data products as easy as falling off a log.

    For real-time use, the recommendations part of Mahout supports a variety of web-service interfaces that make it pretty easy to access from PHP. Hitting the model evaluation part of Mahout would require a bit more programming.