Search code examples
cmemory-management

issue erasing elements in a vector using C


I wanted to go "back to the basics" and try writing a C vector implementation. It is using void* to store the data and i try to mimic the C++ counter part a little bit.

I am having difficulty erasing elements. precisely the size of the vector does not seem to match of whats expected after erasing multiple elements.

Here is the implementation of the erase function:

typedef void* vector_iterator;

vector_iterator vector_begin(vector* vec) {
    return vec->data;
}

vector_iterator vector_end(vector* vec) {
    return ((unsigned char*)vec->data) + ((vec->element_size * (vec->size+1))); // "past the last element"
}

void vector_erase(vector* vec, vector_iterator iterator) {
    assert(iterator >= vector_begin(vec));
    assert(iterator < vector_end(vec));
    assert(((uintptr_t)iterator - (uintptr_t)vector_begin(vec)) % vec->element_size == 0);
    unsigned char* dest = (unsigned char*)iterator;
    unsigned char* src = dest + vec->element_size; // src is the element erased element + 1, since we want to pull all objects forward
    size_t bytes_to_copy= (unsigned char*)vector_end(vec) - (unsigned char*)src - vec->element_size;
    memcpy(dest, src, bytes_to_copy); // copy all elements from (iterator +1) forward
    vec->size--;
}

vector_iterator vector_iterator_offset(vector_iterator iterator,vector* vec, ptrdiff_t offset) {
    return (unsigned char*)iterator + (vec->element_size * offset);
}

The usage of erasing is used like this.

vector_erase(vec, vector_iterator_offset(vector_begin(vec),vec,2)); // erase the second element

When I erase 3 elements, the reported size of the vector is correct, but my loop prints size+1 elements.

    vector* vec = vector_create_capacity(sizeof(char), 10);
    //test for push back
    for(char i = 'A'; i < 'A'+10; ++i) {
        vector_push_back(vec, &i);
    }
    
    //...//
    
    fprintf(stdout, "Size: %zu\n",vector_size(vec));
    fflush(stdout);

    vector_erase(vec, vector_iterator_offset(vector_begin(vec),vec,2)); // erase 'C'
    fprintf(stdout, "Size: %zu\n",vector_size(vec));
    fflush(stdout);

    vector_erase(vec, vector_iterator_offset(vector_begin(vec),vec,2)); // erase 'D'
    fprintf(stdout, "Size: %zu\n",vector_size(vec));
    fflush(stdout);

    vector_erase(vec, vector_iterator_offset(vector_begin(vec),vec,2)); // erase 'E'
    fprintf(stdout, "Size: %zu\n",vector_size(vec));
    fflush(stdout);
    
    it = vector_begin(vec);
    for(;it != vector_end(vec); (it = vector_iterator_offset(it, vec, 1))) {
        
        char data = *(char*)it;
        fprintf(stdout,"%c\n", data);
        fflush(stdout);
    }
    fprintf(stdout,"%zu",vector_size(vec));
    fflush(stdout);
    //8 printed letters instead of 7 with double 'J'?
    vector_destroy(vec);

The output at the end is

A
B
F
G
H
I
J
J

Godbolt demo, because the code is very long

is my vector_end(vec) incorrect or is the erasing incorrect?


Solution

  •  // "past the last element"
    

    But size + 1 points to the byte past the next element after the last. So it's not "past the last element", it is "the byte after one plus last element".

    When you are at the vec->data + vec->element_size * vec->size you already are pointing at the byte past the last element. No +1, the size is already the count of elements in the vector, and arrays index from 0.

    is my vector_end(vec) incorrect or is the erasing incorrect?

    Yes, vector_end is confusing. Just:

    vector_iterator vector_end(vector* vec) {
        return ((char *)vec->data) + vec->element_size * vec->size; // "past the last element"
    }
    

    And then naturally you copy the range between end and start.

     size_t bytes_to_copy = (unsigned char*)vector_end(vec) - (unsigned char*)src;
    

    My godbolt link https://godbolt.org/z/z5TvWzM7T .


    Subjective cosmetics:

    • amazing code
    • typedef pointers are still a red flag for me.
      • I would do typedef struct { vector *parent; void *pos; } vector_iterator instead of passing two parameters each time.
      • or at least typedef struct { void *pos; } vector_iterator; so that I get rudimentary type checking
      • but with void * you get nice != comparison, so I understand it's nice to have
    • if you do not plan to dereference it, just use char * to represent byte, no need to type unsigned.
    • so many ( ), they are not needed in some places
    • #include "assert.h" -> #include <assert.h>
    • does vector_create_capacity needs to allocate memory twice and for vector itself? It could just return itself by value vector vector_create_capacity(...).
    • some places are missing NULL allocation error checks, see gcc -fanalyzer complaining
    • overall great code

    Also, I have lately more and more fun with STC library, you can see its vector implementation here https://github.com/stclib/STC/blob/master/docs/vec_api.md .