Tags: binary, erlang, ets

Sharing big binary data between processes


I have a large binary of IP data, about X MB. Processes use the binary to run a search algorithm that looks up IP addresses. I have three options:

1. Put it in ETS. But I suppose every read access will copy the big binary into the process. :(
2. Put it in gen_server state and have processes use gen_server:call to look up an address. The shortcoming is concurrency.
3. Compile the binary into a BEAM module. But when I compile, I get: eheap_alloc: Cannot allocate 1318267840 bytes of memory (of type "heap")

What is the best practice for sharing big data in Erlang?


Solution

  • Binaries larger than 64 bytes are stored as reference-counted binaries, and their data lives outside the heap of any process. When such a binary is sent to another process, the underlying data is not duplicated. So if you store such a binary in an ETS table and then access it from various processes, the underlying data will not be copied; only its reference count will be incremented and decremented. I'd suggest going with the ETS table solution.

    Here's a demonstration of the memory usage at boot, after inserting a 100 MB binary into an ETS table, and after fetching a copy of the binary into the shell process. The binary memory stays at about 105 MB once the shell process holds its copy of the binary. The same would not be true for a string (a list of integers) of the same length copied in from ETS or from another process.

    1> erlang:memory().
    [{total,21912472},
     {processes,5515456},
     {processes_used,5510816},
     {system,16397016},
     {atom,223561},
     {atom_used,219143},
     {binary,844872},
     {code,4808780},
     {ets,301232}]
    2> ets:new(foo, [named_table, set]).
    foo
    3> ets:insert(foo, {foo, binary:copy(<<".">>, 104857600)}).
    true
    4> erlang:memory().
    [{total,127038632},
     {processes,5600320},
     {processes_used,5599952},
     {system,121438312},
     {atom,223561},
     {atom_used,220445},
     {binary,105770576},
     {code,4908097},
     {ets,308416}]
    5> X = ets:lookup(foo, foo).
    [{foo,<<"........................................................................................................"...>>}]
    6> erlang:memory().
    [{total,127511632},
     {processes,6082360},
     {processes_used,6081992},
     {system,121429272},
     {atom,223561},
     {atom_used,220445},
     {binary,105761504},
     {code,4908097},
     {ets,308416}]
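The shell session above can be turned into a small script that checks the same claim with many readers at once. The module name share_demo and its run/2 function are hypothetical, written only for this illustration: it stores a Size-byte binary in a fresh ETS table, starts N processes that each fetch it with ets:lookup/2 and keep it alive, and returns how much erlang:memory(binary) grew while they all held their copies.

```erlang
-module(share_demo).
-export([run/2]).

%% Hypothetical helper: returns the growth of erlang:memory(binary) while
%% N reader processes each hold the Size-byte binary fetched from ETS.
run(N, Size) ->
    Tab = ets:new(share_demo, [set, public, {read_concurrency, true}]),
    true = ets:insert(Tab, {big, binary:copy(<<".">>, Size)}),
    Before = erlang:memory(binary),
    Parent = self(),
    Pids = [spawn(fun() -> reader(Parent, Tab) end) || _ <- lists:seq(1, N)],
    %% Wait until every reader has fetched the binary.
    [receive {fetched, Pid} -> ok end || Pid <- Pids],
    After = erlang:memory(binary),
    %% Release the readers and the table.
    [Pid ! stop || Pid <- Pids],
    true = ets:delete(Tab),
    After - Before.

reader(Parent, Tab) ->
    [{big, Bin}] = ets:lookup(Tab, big),
    Parent ! {fetched, self()},
    %% Bin is used after the receive, so it stays referenced until stop.
    receive stop -> byte_size(Bin) end.
```

Calling share_demo:run(20, 100 * 1024 * 1024) should report a growth close to zero rather than roughly 20 times 100 MB, since each reader only holds a reference to the shared off-heap binary.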
    

    You can find a lot more info about how to efficiently work with binaries in Erlang in the link above.
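Applied to the IP lookup from the question, the ETS approach might look like the sketch below. Everything specific here is an assumption for illustration: the module name ip_table, the key data, and the record layout (12-byte entries <<Start:32, End:32, Value:32>> holding non-overlapping IPv4 ranges sorted by Start). The whole binary is stored as a single ETS object, so each lookup copies only a reference to it, and the binary search uses binary:part/3, which returns a sub-binary without copying the data.

```erlang
-module(ip_table).
-export([load/1, lookup/1]).

%% Hypothetical layout: 12-byte records <<Start:32, End:32, Value:32>>,
%% non-overlapping IPv4 ranges sorted by Start.
-define(TAB, ip_table).
-define(REC, 12).

%% Store the whole binary as one ETS object. The calling process owns the
%% table (protected: any process may read, only the owner may write), so
%% call this from a long-lived process.
load(Bin) when is_binary(Bin), byte_size(Bin) rem ?REC =:= 0 ->
    case ets:info(?TAB) of
        undefined ->
            ?TAB = ets:new(?TAB, [named_table, set, protected,
                                  {read_concurrency, true}]);
        _ ->
            ok
    end,
    true = ets:insert(?TAB, {data, Bin}),
    ok.

%% Fetch the binary (only a reference is copied for binaries > 64 bytes)
%% and binary-search it for the range containing the address.
lookup({A, B, C, D}) ->
    <<Ip:32>> = <<A, B, C, D>>,
    [{data, Bin}] = ets:lookup(?TAB, data),
    search(Bin, Ip, 0, byte_size(Bin) div ?REC - 1).

search(_Bin, _Ip, Lo, Hi) when Lo > Hi ->
    not_found;
search(Bin, Ip, Lo, Hi) ->
    Mid = (Lo + Hi) div 2,
    %% binary:part/3 returns a sub-binary; the record bytes are not copied.
    <<Start:32, End:32, Value:32>> = binary:part(Bin, Mid * ?REC, ?REC),
    if
        Ip < Start -> search(Bin, Ip, Lo, Mid - 1);
        Ip > End -> search(Bin, Ip, Mid + 1, Hi);
        true -> {ok, Value}
    end.
```

The table uses {read_concurrency, true} on the assumption that it is read far more often than it is written. Because an ETS table dies with its owner, in a real system load/1 would typically be called from a supervised, long-lived process rather than from a short-lived worker.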