We run statistics and similar computations on large sets of data. Right now it is all done on one machine. We're studying the feasibility of moving to a map-reduce paradigm, where we decompose the data into subsets, run some operations on each subset, then combine the results.
Is there any sort of mathematical test that can be applied to a set of operations to determine whether the data they operate on can be decomposed?
Or is there a list somewhere of which operations can and cannot be decomposed?
For instance, I didn't think there was a way to decompose standard deviation, but there is (see the sketch below).
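Here is a minimal sketch of how that works for standard deviation, assuming each subset is summarized by a (count, mean, sum of squared deviations) triple and summaries are merged with the standard pairwise update (Chan et al. style); the function names are just for illustration:

```python
import math

def summarize(xs):
    """Per-subset summary: (count, mean, sum of squared deviations from the mean)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    return n, mean, m2

def merge(a, b):
    """Combine two summaries without revisiting the raw data (pairwise update)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n, mean, m2 = merge(summarize(data[:4]), summarize(data[4:]))
print(math.sqrt(m2 / n))  # population std dev: 2.0, same as over the whole set at once
```

Because `merge` is associative, it works as a map-reduce combiner: summarize each subset in the map phase, then fold the summaries together in the reduce phase.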
Take a look at this paper: http://www.janinebennett.org/index_files/ParallelStatisticsAlgorithms.pdf. It gives algorithms for many common statistical problems, and there is open source code available.
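The same pairwise-update idea extends to other statistics covered by that line of work, for example covariance. A sketch under the same assumptions as above (my own function names and summary layout, not the paper's API):

```python
def summarize_cov(pairs):
    """Per-subset summary for covariance: (n, mean_x, mean_y, co-moment)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    c = sum((x - mx) * (y - my) for x, y in pairs)
    return n, mx, my, c

def merge_cov(a, b):
    """Combine two covariance summaries with the pairwise update."""
    n_a, mx_a, my_a, c_a = a
    n_b, mx_b, my_b, c_b = b
    n = n_a + n_b
    dx = mx_b - mx_a
    dy = my_b - my_a
    c = c_a + c_b + dx * dy * n_a * n_b / n
    return n, mx_a + dx * n_b / n, my_a + dy * n_b / n, c

pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 5.0), (4.0, 9.0)]
n, mx, my, c = merge_cov(summarize_cov(pairs[:2]), summarize_cov(pairs[2:]))
print(c / n)  # population covariance: 2.75, same as computing over all pairs at once
```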