(C) realloc array modifies data pointed by items
Hello,
A nice weird bug I feel like sharing ;-) Requires some preliminary explanations:
First, I have a type of strings PString
which hold their size (and a hash value), followed by a flexible array member with the bytes. Here is the type and kind of constructor (the printfl statement at the end is debug):
typedef struct {
size_t size;
uint hash;
char bytes[];
} PString;
// offset from start of pstring struct to start of data bytes:
static const size_t PSTRING_OFFSET = sizeof(size_t) + sizeof(uint);
PString * pstring_struct (string str, size_t size, uint hash) {
// memory zone
char *mem = malloc(PSTRING_OFFSET + size * sizeof(char));
check_mem(mem);
// string data bytes:
memcpy(mem + PSTRING_OFFSET, str, size);
mem[PSTRING_OFFSET + size] = NUL;
// pstring struct:
PString * pstr = (PString *) mem;
pstr->size = size;
pstr->hash = hash;
printfl("*** str:'%s' (%u) --> pstr:'%s' (%u) 0x%X",
str, size, pstr->bytes, pstr->size, pstr); ///////////////////////
return pstr;
}
[Any comment on this construction welcome: I'm not sure at all to do things right, here. It's the first time I use flexible array members, and I could not find exemples of using them in allocated structs.]
Second, those pstrings are stored in a string pool, meaning a set implemented as hash table. As usual, "buckets" for collisions (after hash & modulo) are plain linked lists of cells, each holding a pstring pointer and a pointer to next cell. The only special detail is that the cells themselves are stored in an array, instead of beeing allocated anywhere on the heap [1]. Hope the picture is clear. Here is the definition of Cell
:
typedef struct SCell {
PString * pstr;
struct SCell * next;
} Cell;
All seemed to work fine, including a battery of tests of the pool itself. Now, when testing a pstring routine (search), I noticed a string changed. After some research, I finally guessed the problem is related to pool growing, and endly could reduce the issue exactly around the growing of the array of cells (so, well before redistributing cells into lists). Here is the lines of debug prints around this growing, with copy of the show_pool
routine producing the output (just shows the strings), and the output itself:
static void pool_grow (StringPool * pool, uint n_new) {
...
// Grow arrays:
show_pool(pool); /////////////////////
pool->cells = realloc(pool->cells, pool->n_cells * sizeof(Cell));
check_mem(pool->cells);
show_pool(pool); ////////////////////
...
static void show_pool (StringPool * pool) {
if (pool->n == 0) {
printfl("{}");
return;
}
printf("pool : {\"%s\"", pool->cells[0].pstr->bytes);
PString * pstr;
uint i;
for (i = 1; i < pool->n; i++) {
pstr = pool->cells[i].pstr;
printf(", \"%s\"", pstr->bytes);
}
printl("}");
}
// output:
pool : {"", "abc", "b", "abcXXXabcXXX"}
pool : {"", "abc", "b", "abcXXXabcXXXI"}
As you can see, the last string stored has an additional byte 'I'. Since in the meanwhile I'm just calling realloc, I find myself a bit blocked for further debugging; and thinking hard does not help in throwing light on this mystery. (Note that cells just hold pstring pointers, so how can growing the array of cells alter the string bytes?) Also, I'm bluffed by the fact there seems to be a quite convenient NUL just after the mysterious 'I', since printf halts there.
Thank you. Can you help?
[1] There is no special reason for doing that here, with a string pool. I usually do that to get for free an ordered set or map, and in addition locality of reference. (The only overhead is that the array of cells must grow in addition to the array of buckets, but one can reduce the number of growings by predimensioning.)
Since size
doesn't include the null terminator,
mem[PSTRING_OFFSET + size] = NUL;
is invalid. Every other issue stems from this.