
Scheduled MapReduce job on Google Cloud Platform


I'm developing a Node.js application that stores user event logs in a database and shows insights about user actions. To achieve this, the event logs must be analyzed by a MapReduce job that runs automatically once a day (every night).

I've found lots of tutorials about MapReduce on the Google Cloud website, but I'm totally lost: there are several technologies, and I can't find a way to do it without using the command line. There is also no information about scheduling (I want the whole analysis process to be entirely automated).

Could you please advise me on which Google technologies I should use, or where I can find a good tutorial?

Thank you


Solution

  • You want to be looking at:

    1. Dataproc (run Hadoop/Spark jobs out of the box)
    2. Dataflow (develop 'pipelines' using the Dataflow/Beam programming model)
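To give a rough idea of what running a Dataproc job without the command line could look like, here is a minimal sketch that only builds the request for the Dataproc REST endpoint `jobs:submit`. It does not send anything: an authenticated HTTP POST, triggered nightly by whatever scheduler you choose, would still be needed. The project, cluster, bucket, and jar names, as well as the `events/<date>/` and `insights/<date>/` path layout, are hypothetical placeholders and not something from the question.

```python
import json
from datetime import date, timedelta


def build_hadoop_job_request(project_id, region, cluster_name, jar_uri, args):
    """Return (url, body) for a Dataproc jobs:submit REST call. Nothing is sent."""
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/jobs:submit"
    )
    body = {
        "job": {
            # Which existing cluster should run the job.
            "placement": {"clusterName": cluster_name},
            # A Hadoop MapReduce job packaged as a jar in Cloud Storage.
            "hadoopJob": {"mainJarFileUri": jar_uri, "args": list(args)},
        }
    }
    return url, body


def nightly_args(bucket, run_day):
    """Input and output paths for the day before run_day (hypothetical layout)."""
    day = (run_day - timedelta(days=1)).isoformat()
    return [f"gs://{bucket}/events/{day}/", f"gs://{bucket}/insights/{day}/"]


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    url, body = build_hadoop_job_request(
        "my-project",
        "us-central1",
        "analytics-cluster",
        "gs://my-bucket/jobs/event-analysis.jar",
        nightly_args("my-bucket", date.today()),
    )
    print(url)
    print(json.dumps(body, indent=2))
```

Keeping the request construction separate from the HTTP call makes it easy to check the generated paths before wiring the job into a schedule.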