
Riak: query a bucket for all items matching a secondary index


I'm trying to do this by following the documentation on the Riak site, but it appears to be outdated.

I have a bucket with a secondary index, and I want to curl it to get the JSON for all objects that have a certain value for that index.

Looks like the old way to do this was

curl http://localhost:8098/buckets/{bucket}/index/{index}/{value}

but this now appears to be deprecated.

I tried

curl http://localhost:8098/riak/{bucket}?keys=true&{index}={value}

but that's not working. Any ideas what the correct syntax is here?


Solution

  • You have the deprecation backward: the /riak/{bucket} method is deprecated, and /buckets/{bucket} is the current URL scheme.

    Note that your query curl http://localhost:8098/buckets/{bucket}/index/{index}/{value} would return a list of keys but not their values.

    A couple of examples.

    First, generate some data:

    # for i in {1..1000}; do
        num=$RANDOM
        curl 172.31.0.3:8098/buckets/index_test/keys/key$i -XPUT \
             -H 'content-type: text/plain' \
             -H 'x-riak-index-random_int:'$num \
             -d "{\"value\":$num}"
      done
    

    Find keys whose random number is between 10000 and 10500:

    # curl 172.31.0.3:8098/buckets/index_test/index/random_int/10000/10500
    
    {"keys":["key334","key93","key51","key232","key427","key177","key504","key813","key472","key618","key405","key558"]}
    

    Get the value that was indexed for each item:

    # curl 172.31.0.3:8098/buckets/index_test/index/random_int/10000/10500\?return_terms=true
    
    {"results":[{"10189":"key334"},{"10089":"key93"},{"10013":"key558"},{"10088":"key405"},{"10057":"key51"},{"10353":"key618"},{"10282":"key472"},{"10194":"key504"},{"10301":"key232"},{"10219":"key813"},{"10311":"key427"},{"10278":"key177"}]}
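
    The queries above share one URL pattern: /buckets/{bucket}/index/{index}/{value} for an exact match, with one more path segment for the end of a range. As a rough sketch, a small shell helper can build either form. index_url is a hypothetical name, not part of Riak, and the host and bucket in the usage note are the ones from this example.

    ```shell
    # Hypothetical helper: print the 2i query URL for an exact match or a range.
    # $1 host:port, $2 bucket, $3 index name (including its _int or _bin suffix),
    # $4 value or range start, $5 optional range end
    index_url() {
      url="http://$1/buckets/$2/index/$3/$4"
      if [ -n "${5:-}" ]; then
        # A fifth argument turns the exact-match query into a range query.
        url="$url/$5"
      fi
      printf '%s\n' "$url"
    }
    ```

    For instance, curl "$(index_url 172.31.0.3:8098 index_test random_int 10189)" would run an exact-match query for the term 10189, and adding a final argument of 10500 would turn it into a range query.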
    

    But to get the actual JSON back, we need to get the value from the KV store:

    # curl 172.31.0.3:8098/buckets/index_test/keys/key334
    
    {"value":10189}
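
    To do that for every match without MapReduce, one option is to feed the key list from the index query into a loop of GETs. This is a minimal sketch, assuming keys contain no commas, quotes, or brackets, since the parsing is plain sed/tr rather than a real JSON parser. extract_keys and fetch_by_index are hypothetical helper names, and the defaults are the host and bucket from this example.

    ```shell
    RIAK_HOST="${RIAK_HOST:-172.31.0.3:8098}"
    BUCKET="${BUCKET:-index_test}"

    # Read a {"keys":[...]} response on stdin and print one key per line.
    extract_keys() {
      sed -e 's/^.*"keys":\[//' -e 's/].*$//' | tr ',' '\n' | tr -d '"' | sed '/^$/d'
    }

    # $1 index name, $2 range start, $3 range end.
    # Prints "key<TAB>value" for each match; issues one GET per key.
    fetch_by_index() {
      curl -s "http://$RIAK_HOST/buckets/$BUCKET/index/$1/$2/$3" |
        extract_keys |
        while read -r key; do
          printf '%s\t' "$key"
          curl -s "http://$RIAK_HOST/buckets/$BUCKET/keys/$key"
          printf '\n'
        done
    }
    ```

    Calling fetch_by_index random_int 10000 10500 would then print each matching key next to its stored value. It costs one request per key, so it suits small result sets.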
    

    Fetching the values for all matching keys in a single request can be done via MapReduce, using the index query as the input, if the query is not run too often and the nodes can handle the load:

    # curl 172.31.0.3:8098/mapred -XPOST -H 'content-type: application/json' \
      -d '{"inputs":{"bucket":"index_test","index":"random_int","start":10000,"end":10500},'\
    '"query":[{"map":{"language":"javascript",'\
    '"source":"function(Obj){return [Obj.values[0].data]}"}}]}'
    
    ["{value:10189}","{value:10013}","{value:10311}","{value:10278}","{value:10057}","{value:10219}","{value:10353}","{value:10089}","{value:10088}","{value:10301}","{value:10282}","{value:10194}"]

    To return each key alongside its data:
    
    # curl 172.31.0.3:8098/mapred -XPOST -H 'content-type: application/json'\
           -d '{"inputs":{"bucket":"index_test","index":"random_int","start":10000,"end":10500},'\
    '"query":[{"map":{"language":"javascript",'\
    '"source":"function(Obj){return [{\"key\":Obj.key,\"data\":Obj.values[0].data}]}"}}]}'
    
    [{"key":"key558","data":"{value:10013}"},{"key":"key334","data":"{value:10189}"},{"key":"key51","data":"{value:10057}"},{"key":"key427","data":"{value:10311}"},{"key":"key177","data":"{value:10278}"},{"key":"key813","data":"{value:10219}"},{"key":"key93","data":"{value:10089}"},{"key":"key405","data":"{value:10088}"},{"key":"key232","data":"{value:10301}"},{"key":"key618","data":"{value:10353}"},{"key":"key504","data":"{value:10194}"},{"key":"key472","data":"{value:10282}"}]
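
    Both map functions above return the stored data as a string. If parsed JSON is preferred, Riak's built-in JavaScript map function Riak.mapValuesJson can, as far as I know, be referenced by name instead of inline source. The sketch below only builds the request body. mapred_payload is a hypothetical helper, and it assumes an integer index, so start and end are emitted as bare numbers.

    ```shell
    # Hypothetical helper: print a MapReduce request body that uses a 2i range
    # as input and the built-in Riak.mapValuesJson as the map phase.
    # $1 bucket, $2 index name, $3 range start, $4 range end
    mapred_payload() {
      printf '{"inputs":{"bucket":"%s","index":"%s","start":%s,"end":%s},"query":[{"map":{"language":"javascript","name":"Riak.mapValuesJson"}}]}\n' \
        "$1" "$2" "$3" "$4"
    }
    ```

    The body can then be posted the same way as before, for example curl 172.31.0.3:8098/mapred -XPOST -H 'content-type: application/json' -d "$(mapred_payload index_test random_int 10000 10500)".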