Search code examples
clinked-listoverhead

How is unrolled linked list better than linked list?


I was studying the book "Data Structure and algorithms made easy" and but I got confused while learning "Comparing Linked Lists and Unrolled Linked list"...

what is overhead? Why he is only stating 8 bytes of overhead for 100 elements array?

Para from book


Solution

  • I think the book contains a serious error in the definition of struct LinkedBlock. Let's get back to that later and start with:

    what is overhead?

    The struct ListNode is designed for storing one integer but besides the integer each node has two pointers. So for each node you'll need to allocate 1 integer + 2 pointers. Let's assume 4 byte integer and 4 byte pointer. So each node will require 4 + 2x4 = 12 bytes. So in order to store 1 item of your real data (aka 1 integer) you need to allocate 12 bytes. You have wasted 8 bytes on pointers. These 8 "wasted" bytes are called overhead. They are used for bookkeeping only - not for data.

    But it get worse than that... When allocating dynamic memory (which you normally do when using linked list) there are some additional overhead. The allocator may need a little extra memory for every malloc to store information about the malloc. Another issue is that malloc ed memory may aligned to some fixed block size (e.g. 16 or 32 byte) so if you allocate 20 byte there is no way to use the remaining 12 bytes - they are wasted. This is what the book calls "allocation overhead". The "allocation overhead" is system dependent but the book assumes 8 extra overhead bytes from each malloc.

    So now each malloc 'ed struct ListNode takes up:

    4 bytes for the integer

    8 bytes for 2 pointers

    8 bytes for allocation overhead

    A total of 20 bytes where 4 bytes is for your data and 16 bytes are overhead. So for each integer you need to store, you'll need 20 bytes. And if you want to store 1000 integers, you end up wasting 16kb on overhead in order to store 4kb of data.

    Now back to the struct LinkedBlock. In the book it looks like this:

    struct LinkedBlock {
        struct LinkedBlock *next;
        struct LinkedNode *head;
        int nodeCount;
    };
    

    I'm pretty sure there is a mistake in the book and that it should look like this instead:

    struct LinkedBlock {
        struct LinkedBlock *next;
        int *dataArray;
        int nodeCount;
    };
    

    The way to use this is something like:

    struct LinkedBlock pNode = malloc(sizeof(struct LinkedBlock));
    pNode->dataArray = malloc( 100 * sizeof(int) );
    

    The first malloc requires 4 + 4 + 4 + 8 = 20 bytes. (pointer, pointer, int, allocation overhead)

    The second malloc requires 4 * 100 + 8 = 408 bytes. (100 int, allocation overhead)

    So a total of 428 bytes.

    However, since the malloc'ed data can hold 100 integers (corresponding to 400 bytes), your overhead is only 28 bytes. In other words - in average you use 4.28 bytes for each integer. Compare that to the first method that required 20 bytes for each integer.

    Why he is only stating 8 bytes of overhead for 100 elements array?

    That was because the array was allocated in a single call and each malloc call is assumed to have 8 bytes allocation overhead.