java math mapreduce mathematical-optimization

For given operations on a large set of data, is there a way to determine if the data can be decomposed into mapreduce operations?


We run statistics and similar computations on large sets of data. Right now it is all done on one machine. We're studying the feasibility of moving to a map-reduce paradigm, where we split the data into subsets, run some operations on each subset, then combine the results.

Is there any sort of mathematical test that can be applied to a set of operations to determine if the data they operate on can be decomposed?

Or maybe a list somewhere saying what can and cannot be decomposed?

For instance, I didn't think there was a way to decompose standard deviation, but there is...

edit: added tags


Solution

  • Take a look at this paper, which gives algorithms for many common statistical problems and has open source code available: http://www.janinebennett.org/index_files/ParallelStatisticsAlgorithms.pdf
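
To make the standard deviation case from the question concrete, here is a minimal sketch of how it can be decomposed. It is not taken from the linked paper or its code. The class name MergeableStats and its methods are hypothetical, chosen for illustration. The idea is that each subset is reduced to a small summary (count, mean, sum of squared deviations), and two summaries can be merged with the well-known pairwise update formula (Chan et al.), so the order and grouping of the merges should not matter beyond floating-point rounding.

```java
// Hypothetical helper for illustration, not from the linked paper:
// a partial summary computed per subset (map) and merged pairwise (reduce).
final class MergeableStats {
    final long count;   // number of values summarised
    final double mean;  // mean of those values
    final double m2;    // sum of squared deviations from the mean

    MergeableStats(long count, double mean, double m2) {
        this.count = count;
        this.mean = mean;
        this.m2 = m2;
    }

    // "Map" step: summarise one subset in a single pass (Welford's update).
    static MergeableStats of(double... values) {
        long n = 0;
        double mean = 0.0;
        double m2 = 0.0;
        for (double x : values) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }
        return new MergeableStats(n, mean, m2);
    }

    // "Reduce" step: merge two partial summaries into one.
    MergeableStats merge(MergeableStats other) {
        if (count == 0) return other;
        if (other.count == 0) return this;
        long n = count + other.count;
        double delta = other.mean - mean;
        double newMean = mean + delta * other.count / n;
        double newM2 = m2 + other.m2 + delta * delta * count * other.count / n;
        return new MergeableStats(n, newMean, newM2);
    }

    // Population standard deviation; NaN if no values were summarised.
    double populationStdDev() {
        return count == 0 ? Double.NaN : Math.sqrt(m2 / count);
    }

    public static void main(String[] args) {
        // Two subsets, as if processed on two different machines.
        MergeableStats left = MergeableStats.of(2, 4, 4, 4);
        MergeableStats right = MergeableStats.of(5, 5, 7, 9);
        System.out.println(left.merge(right).populationStdDev());
    }
}
```

The eight values in main have a population standard deviation of 2, so the merged result should agree with that up to rounding. More generally, an operation is a reasonable candidate for this style of decomposition when its result can be rebuilt from fixed-size per-subset summaries with a merge step like the one above. Something like an exact median does not reduce to a small summary this way.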