Search code examples
erlangriak

How to use bitcask as a stand alone


(edit: i miss some reputation to create the bitcask tag so ...)

(tl;dr => bitcask:get/2 doesn't work and raises badarg in bitcask_nifs:keydir_get_int)

I would like to know how to use bitcask without riak the right way.

First, i was trying this:

bitcask:put(Handle, 3, {this, is, data}).
bitcask:get(Handle, 3).

This two calls raise the same error : badarg with erlang:size/1

The problem is erlang:size/1 accepts only binaries or tuples. So i was trying this :

bitcask:put(Handle, {thing, 3}, {this, is, data}).
bitcask:get(Handle, {thing, 3}).

A new badarg error then, with erlang:crc32 and the Value i want to store.

So now i use this code, bucket is the atom name of a registered gen_server which keeps the handle in its state. cask_wrapper is the code for theese gen_servers. The code below is the acces to theese gen servers.

-module(sr_db).
...
get(Type, Key) when not is_binary(Key) ->
    ?MODULE:get(Type, term_to_binary(Key));
get(Type, Key) ->
    Bucket = type2bucket(Type),
    cask_wrapper:get(Bucket, {get, Key}).

put(Type, Key, Data) when not is_binary(Key) ->
    ?MODULE:put(Type, term_to_binary(Key), Data);

put(Type, Key, Data) when not is_binary(Data) ->
    ?MODULE:put(Type, Key, term_to_binary(Data));

put(Type, Key, Data) ->
    Bucket = type2bucket(Type),
    cask_wrapper:put(Bucket, Key, Data),
    ok.
%% syncput(Type, Key, Data) -> call au lieu de cast

type2bucket(user) -> users_cask.

I use this code like this:

sr_db:get(user, 3).
%% then a call is made to cask_wrapper:get(users_cask, {get, 3}).

there are the cask_wrapper functions

get(Bucket, Key) ->
    gen_server:call(Bucket, {get, Key}).

handle_call({get, Key}, _From, State) ->
    Fetch = bitcask:get(State#state.handle, Key),
    {reply, Fetch, State}.

I use the same mechanism with the put function. (but with gen_server:cast)

My first question is : is doing term_to_binary conversion in every call a good practice, or is it slow ? I will have to convert back to erlang terms the values that i fetch.

At the moment, the put operation returns 'ok'. It works. But the get operation doesn't work yet. This is the error:

=ERROR REPORT==== 29-Jan-2012::20:21:24 ===
** Generic server users_cask terminating
** Last message in was {get,{get,<<131,97,3>>}}
** When Server state == {state,#Ref<0.0.0.353>}
** Reason for termination ==
** {badarg,[{bitcask_nifs,keydir_get_int,[<<>>,{get,<<131,97,3>>}]},
            {bitcask_nifs,keydir_get,2},
            {bitcask,get,3},
            {cask_wrapper,handle_call,3},
            {gen_server,handle_msg,5},
            {proc_lib,init_p_do_apply,3}]}
Bitcask dir : "/home/niahoo/src/skyraiders/priv/bitcasks/users"
options : [read_write]** exception exit: {{badarg,
                        [{bitcask_nifs,keydir_get_int,
                             [<<>>,{get,<<131,97,3>>}]},
                         {bitcask_nifs,keydir_get,2},
                         {bitcask,get,3},
                         {cask_wrapper,handle_call,3},
                         {gen_server,handle_msg,5},
                         {proc_lib,init_p_do_apply,3}]},
                    {gen_server,call,[users_cask,{get,{get,<<131,97,3>>}}]}}
     in function  gen_server:call/2

I can't figure out why it does not work and would appreciate some help.

Thank you


Solution

  • Bitcask expects the key and the value both to be binaries (as you already noticed). I don't really know how fast term_to_binary/binary_to_term is, but there is really no way around it if you want to store terms on disk. You could of course roll you own code to convert you keys and values to/from binaries, but I doubt that it will be significantly fast than the builtin functions and certainly less flexible. But at the end of the day you have to measure the profile your application, and decide if term_to_binary/binary_to_term is a hotspot in your total system. I would be very surprised if that is the case in any real application where data has to be written to disk.

    Now to the error when calling sr_db:get/2. You are wrapping the key twice inside a {get, Key} tuple, once inside sr_db:get/2 and another time in cask_wrapper:get/2, but you unwrap it only once, by matching in cask_wrapper:handle_call/3. You can immediately spot this in the error report in those two lines:

    ** Last message in was {get,{get,<<131,97,3>>}}

    and

    {gen_server,call,[users_cask,{get,{get,<<131,97,3>>}}]}}