Search code examples
apache-sparkhadoopmapreduce

Why spark is 100 times faster than Hadoop Map Reduce


Why spark is faster than Hadoop MapReduce?. As per my understanding if spark is faster due to in-memory processing then Hadoop is also load data into RAM then it process. Every program first load into RAM then it execute. So how we can say spark is doing in-memory processing and why not other big data technology not doing the same. Could you please explain me?


Solution

  • Spark was created out of all the lessons learned from MapReduce. It's not a generation 2, it's redesigned using similar concepts but really learning what was missing/done poorly in map reduce.

    MapReduce partitions data, it reads data, does a map, writes to disk, sends to reducer, which writes it to disk, then reads it, then reduces it, then writes to disk. Lots of writing and reading. If you want to do another operation you start the whole cycle again.

    Spark, tries to keep it in memory, while it does multiple maps/operations, it still does transfer data but only when it has to and uses smart logic to figure out how it can optimize what you are asking it to do. In memory is helpful, but not the only thing it does.