mapreduce riak

How to do a Riak mapred query with multiple map functions


I want to execute a mapreduce query, in Erlang, that contains two map phases, such that the Map2 function takes the result of the Map1 function as input. Is this possible, and if so, what must each map phase return?

  • For more details:

I ran a test mapred query using two simple map functions, each of which returns the input object (in a list), but running the query gives me a badmatch error:

Map1 = fun(O,_,_) -> [O] end.
Map2 = fun(O, _,_) -> [O] end.


C:mapred_bucket(<<"b7bc1418-198d-44a3-8835-8aa9cb416d5b">>, [{map, {qfun, Map1}, none, false}, {map, {qfun, Map2}, none, true}]).

{{badmatch,{r_object,<<"b7bc1418-198d-44a3-8835-8aa9cb416d5b">>,
                     <<255,230,193,167,254,7,246,64,154,190,36,236,32,232,189,
                       169,161,124,23,86>>,
                     [{r_content,{dict,2,16,16,8,80,48,
                                       {[],[],[],[],[],[],[],[],[],[],[],...},
                                       {{[],[],[],[],[],[],[],[],[],...}}},
                                 <<"12d33872-4c92-4da5-9d16-5036a8059253">>}],
                     [{<<5,215,86,61>>,{1,63487018636}}],
                     {dict,1,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],...},
                           {{[],[],[],[],[],[],[],[],[],[],...}}},
                     undefined}},
 [{riak_kv_map_phase,build_input,2},
  {riak_kv_map_phase,'-handle_input/3-lc$^0/1-0-',2},
  {riak_kv_map_phase,handle_input,3},
  {luke_phase,executing,2},
  {gen_fsm,handle_msg,7},
  {proc_lib,init_p_do_apply,3}]}

I'm using riak_search-0.14.2

Erlang R14B03 (erts-5.8.4)

thank you!


Solution

  • I'm not sure what the signature of the map function is in Erlang, as I've only done map/reduce in JavaScript, but I'll try to help.

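    To chain map phases in Riak, only the last map function should return the actual data (as a list). Every map function before it must return a list of bucket/key pairs, one per object to hand to the next phase: a two-element [bucket, key] array in JavaScript, or a {Bucket, Key} tuple in Erlang. Returning the object itself from Map1, as your query does, is most likely what triggers the badmatch in riak_kv_map_phase:build_input.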

    In JavaScript, I've accomplished this like so:

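    function map_function(value, keydata, arg) {
        // Filtering goes here. If the filter rejects the object, return [] instead.
        var data = {};  // placeholder: whatever the filtering step extracts from value
        if (arg.last) {
          // Last map phase: return the actual data.
          data["key"] = value.key;
          return [data];
        }
        else {
          // Earlier map phase: return a [bucket, key] pair for the next phase.
          return [[value.bucket, value.key]];
        }
    }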
    

    Hope this helps.
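
For the Erlang query in the question, the same rule might look like the sketch below. It is untested against riak_search-0.14.2 and assumes that `riak_object:bucket/1` and `riak_object:key/1` are available on the node, and that `C` is the same local client used in the question. Only Map1 changes: it returns a `{Bucket, Key}` pair instead of the object, so the second map phase can look the object up itself.

```erlang
%% Sketch for the Erlang shell on a Riak node (assumes C is the local
%% client from the question; not verified against riak_search-0.14.2).

%% Intermediate map phase: emit a {Bucket, Key} pair rather than the
%% object, so the next map phase can fetch its own input.
Map1 = fun(O, _KeyData, _Arg) ->
           [{riak_object:bucket(O), riak_object:key(O)}]
       end.

%% Final map phase: may return any list. Here it returns the object,
%% exactly as in the question.
Map2 = fun(O, _KeyData, _Arg) -> [O] end.

C:mapred_bucket(<<"b7bc1418-198d-44a3-8835-8aa9cb416d5b">>,
                [{map, {qfun, Map1}, none, false},
                 {map, {qfun, Map2}, none, true}]).
```

The `false`/`true` flags are kept from the original query, so only the output of the last phase is returned to the caller.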