Tags: unit-testing, hadoop, junit, mrunit

MRUnit vs. JUnit for unit testing


Can anyone please explain what you gain by using MRUnit to unit test MapReduce jobs, compared with using plain JUnit and Mockito?

Concretely, what can I do with MRUnit that I cannot do with JUnit, or that is much more difficult to do with JUnit?

My idea is to move all the logic out of the mappers and reducers into helper classes, and then just verify that the appropriate methods are called on the mocks.
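A minimal sketch of that approach, assuming a word-count style mapper: the tokenizing logic lives in a Hadoop-free helper, and a hand-rolled recording fake stands in for the Mockito mock so the example needs no external libraries. The names `WordCountLogic`, `Emitter`, and `RecordingEmitter` are hypothetical. In a real job, the mapper's `map` method would delegate to the helper and pass an emitter that forwards to `context.write`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sink standing in for Hadoop's context.write(key, value). */
interface Emitter<K, V> {
    void emit(K key, V value);
}

/** Hypothetical helper holding the map logic, with no Hadoop types at all. */
class WordCountLogic {
    /** Emits (lower-cased word, 1) for every whitespace-separated token. */
    void map(String line, Emitter<String, Integer> out) {
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) { // "".split(...) yields one empty token
                out.emit(token.toLowerCase(Locale.ROOT), 1);
            }
        }
    }
}

/** Hand-rolled recording fake: plays the role a Mockito mock would. */
class RecordingEmitter<K, V> implements Emitter<K, V> {
    final List<String> calls = new ArrayList<>();

    @Override
    public void emit(K key, V value) {
        calls.add(key + "=" + value);
    }
}

class HelperTestSketch {
    public static void main(String[] args) {
        RecordingEmitter<String, Integer> out = new RecordingEmitter<>();
        new WordCountLogic().map("Hadoop  hadoop MRUnit", out);
        // "Verify" that the helper called the emitter with the expected arguments, in order.
        if (!out.calls.equals(List.of("hadoop=1", "hadoop=1", "mrunit=1"))) {
            throw new AssertionError("unexpected emits: " + out.calls);
        }
        System.out.println("passed: " + out.calls);
    }
}
```

With Mockito on the classpath, `RecordingEmitter` could be replaced by a `mock(Emitter.class)` plus `verify(...)` calls. Either way, this style only checks the helper's calls; it says nothing about what the wired-up mapper and reducer produce together.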

Why use MRUnit?


Solution

  • I think the most important thing MRUnit gives you is a DSL for testing MapReduce jobs. Unit tests should be readable and tell a story, so an API that fits the domain makes tests easier to write and easier to understand later.

    The other, maybe equally important, thing is that it produces much better assertion errors and diffs than JUnit's default assertions.

    Of course, you could just stick to JUnit, but you might end up reimplementing most of MRUnit's functionality in a half-baked way.

    But it's not either/or, because I see MRUnit's domain a little differently. It forces you to think about your jobs in a very simple way: if you put certain things in, you want to get certain things out (and maybe some counters incremented), whereas JUnit tests often test a specific piece of logic. So you can certainly put your logic in separate classes and test it on its own (and you probably should, if the logic is complex), and use MRUnit for a kind of "black-box" testing where you don't care where or how the logic is implemented, as long as you get the right outputs for your inputs.
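To make the "black-box" idea concrete, here is a toy, in-memory harness in plain Java. It is not MRUnit and does not use MRUnit's API (in practice, MRUnit drivers such as `MapReduceDriver`, with `withInput`, `withOutput`, and `runTest`, fill this role); every name below is hypothetical. The test states only inputs, expected outputs, and an expected counter value, and on failure it lists which records are missing and which are unexpected, which is easier to read than two long list dumps side by side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/** Hypothetical toy harness (NOT MRUnit): map, shuffle/sort by key, reduce, all in memory. */
class MiniMapReduce {
    interface Mapper {
        void map(String line, BiConsumer<String, Integer> emit, Consumer<String> incrementCounter);
    }

    interface Reducer {
        void reduce(String key, List<Integer> values, BiConsumer<String, Integer> emit);
    }

    /** Counter name -> value, incremented by the mapper through incrementCounter. */
    final Map<String, Long> counters = new TreeMap<>();

    /** Runs the job and returns "key\tvalue" records, grouped and sorted by key. */
    List<String> run(List<String> inputs, Mapper mapper, Reducer reducer) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>(); // stands in for shuffle/sort
        for (String line : inputs) {
            mapper.map(line,
                    (k, v) -> grouped.computeIfAbsent(k, unused -> new ArrayList<>()).add(v),
                    name -> counters.merge(name, 1L, Long::sum));
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            reducer.reduce(entry.getKey(), entry.getValue(), (k, v) -> output.add(k + "\t" + v));
        }
        return output;
    }

    /** Order-insensitive comparison; returns "" on a match, else what is missing/unexpected. */
    static String diff(List<String> expected, List<String> actual) {
        List<String> unexpected = new ArrayList<>(actual);
        List<String> missing = new ArrayList<>();
        for (String record : expected) {
            if (!unexpected.remove(record)) { // consume one matching record, if any
                missing.add(record);
            }
        }
        if (missing.isEmpty() && unexpected.isEmpty()) {
            return "";
        }
        return "missing " + missing + ", unexpected " + unexpected;
    }
}

class BlackBoxSketch {
    public static void main(String[] args) {
        MiniMapReduce job = new MiniMapReduce();
        List<String> actual = job.run(
                List.of("a b", "", "b"),
                // Mapper: count blank lines, otherwise emit (word, 1).
                (line, emit, incrementCounter) -> {
                    if (line.isBlank()) {
                        incrementCounter.accept("EMPTY_LINES");
                        return;
                    }
                    for (String word : line.trim().split("\\s+")) {
                        emit.accept(word, 1);
                    }
                },
                // Reducer: sum the ones for each word.
                (word, ones, emit) -> emit.accept(word, ones.stream().mapToInt(Integer::intValue).sum()));

        // Black-box checks: only inputs, expected outputs, and counters are stated.
        String problems = MiniMapReduce.diff(List.of("a\t1", "b\t2"), actual);
        if (!problems.isEmpty()) {
            throw new AssertionError(problems);
        }
        if (job.counters.getOrDefault("EMPTY_LINES", 0L) != 1L) {
            throw new AssertionError("EMPTY_LINES counter: " + job.counters);
        }
        System.out.println("passed: " + actual + " " + job.counters);
    }
}
```

The diff deliberately ignores record order; whether order should matter is a per-job decision a real harness would have to expose.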