Search code examples
cmallocc-strings

Segmentation fault when trying to access properly allocated array


I'm trying to read a CSV file and am writing a function to parse a line of data into an array of strings, which dynamically changes the size of the array and updates size and str_size accordingly. I have written a properly-working function called find_key() to locate the fseek() location of the line in question. I'm coming across a problem that I think relates to the allocation of the string array: I get a segmentation fault on the line at the bottom of the while loop, where it reads data[data_count][str_pos] = curr. The program breaks when I try to access data[0][0], even though as far as I can tell I've allocated memory properly. Any help would be appreciated!

/**
 * @brief Get a row from the provided CSV file by first item. Dynamically
 *        allocated memory to data array
 * 
 * @param file
 * @param key First item of row
 * @param data Array of strings containing data
 * @param size Size of array
 * @param str_size Size of strings in array
 * @return 0 if successful, -1 if the row cannot be found, or 1 otherwise
 */
int csv_get_row(FILE *file, char *key, char **data, size_t *size, size_t *str_size) {
    if(!file || !key) return 1;

    /* Get the position of the beginning of the line starting with the key */
    long pos = find_key(file, key);
    if(pos == -1) return -1;
    fseek(file, pos, SEEK_SET);

    /* If these parameters aren't useful values, assign default values */
    if(*size < 1) *size = DEFAULT_ARRAY_SIZE;
    if(*str_size < 1) *str_size = DEFAULT_BUFFER_SIZE;

    /* If the memory for the array hasn't been allocated, do so now */
    if(!data) data = (char**) malloc(*size * *str_size);

    /* Get characters one-by-one, keeping track of the current amount of elements and the current buffer position */
    size_t data_count = 0;
    size_t str_pos = 0;
    char curr;
    while(fscanf(file, "%c", &curr)) {
        if(data_count >= *size) data = (char**) realloc(data, (*size *= 2) * *str_size);
        if(str_pos >= *str_size) data = (char**) realloc(data, *size * (*str_size *= 2));

        if(curr == ',') {
            data[data_count][str_pos] = '\0';
            data_count++;
            str_pos = 0;
            continue;
        }

        if(curr == '\n') {
            data[data_count][str_pos] = '\0';
            data_count++;
            break;
        }

        data[data_count][str_pos] = curr;
        str_pos++;
    }

    /* Resize the array to fit */
    *size = data_count;
    data = (char**) realloc(data, *size * *str_size);
    return 0;
}

Solution

  • Assume that *size starts at 1. You set data_count to 0. Then, in the first iteration, you do not have data_count >= *size, so you don't realloc(). You increment data_count to 1, though, so in the next iteration, you grow the data buffer to 2.

    So, at this point, data has length 2 and data_count is 1.

    Let's then say we do not go into the first if statement but the second. There, you increment data_count to two. You then access data[data_count], which is one past the last element. That is a problem.

    That might be the problem you see at the end of the while-loop, but it is far from the only problem. Whenever you malloc() or realloc() data, you are invalidating the pointer the caller has, because you might be freeing the memory at the original location. You never give him a pointer to the new data back; all the newly allocated data will be leaked when you return from the function, and the caller must never access data after the call, lest he wants a segfault.