
Function to read csv and return a 2d array in C


I just started with C and I've been trying to figure out this all day and it's driving me crazy. I'm trying to create a function to read a CSV file like this one:

10190935A;Sonia;Arroyo;Quintana;M;70
99830067Q;Josefa;Cuenca;Orta;M;42
28122337F;Nuria;Garriga;Dura;M;43
03265079E;Manuel;Orts;Robles;H;45

And create a 2D array and return it to use it later in other functions. This is the function:

void cargarPacientes ()
{

    FILE *file = fopen (".\pacientes.csv", "r");
    char buffer[256 * 500];
    char *arrayOfLines[500];
    char *line = buffer;
    size_t buf_siz = sizeof (buffer);
    int i = 0, n;

    while (fgets (line, buf_siz, file)) {
        char *p = strchr (line, '\n');
        if (p) {
            *p = '\0';
        } else {
            p = strchr (line, '\0');
        }
        arrayOfLines[i++] = line;
        buf_siz -= p - line + 1;
        if (p + 1 == buffer + sizeof (buffer)) {
            break;
        }
        line = p + 1;
    }
    fclose (file);
    n = i;
    int y = 0;
    char *pacientes[20][6];
    for (i = 0; i < n; ++i) {
        char *token;
        char *paciente[6];
        int x = 0;
        token = strtok (arrayOfLines[i], ";");
        while (token != NULL) {
            paciente[x] = token;
            pacientes[y][x] = token;
            token = strtok (NULL, ";");
            x++;
        }
        y++;
    }
    // return pacientes;
}

I also tried using structures, but I really don't know how they work. This is the structure:

struct Paciente {
    char dni[9];
    char nombre[20];
    char primerApellido[20];
    char segundoApellido[20];
    char sexo[1];
    int edad;
};

Is there any way to return the array from that function, or is there an easier way to do the same thing? I've also tried this, but I'm having problems; it won't even compile.

    void cargarPacientes(size_t N, size_t M, char *pacientes[N][M]
    void main(){
        char *pacientes[20][6];
        cargarPacientes(20, 6, pacientes);
    }

These are the compiler errors (sorry they are in spanish):

C:\Users\Nozomu\CLionProjects\mayo\main.c(26): error C2466: no se puede asignar una matriz de tamaño constante 0
C:\Users\Nozomu\CLionProjects\mayo\main.c(26): error C2087: 'pacientes': falta el subíndice
C:\Users\Nozomu\CLionProjects\mayo\main.c(88): warning C4048: subíndices de matriz distintos: 'char *(*)[1]' y 'char *[20][6]'

Solution

  • If I understand correctly, you want to read your file and separate each line into a struct Paciente. The easiest way to do that is to allocate a block of memory for some anticipated number of struct Paciente and fill each one with the data read from your file, keeping track of the number of structs filled. When the number filled equals the number allocated, you realloc to increase the number available and keep going.

    This is made easier by the fact that your struct Paciente contains members that are fully defined and don't need any further allocation individually.

    The basic approach is straightforward. You allocate a block of memory in cargarPacientes() to hold each struct read from the file. The function takes a pointer as a parameter and updates the value at that address with the number of structs filled. It returns a pointer to the allocated block, which makes the structs available back in the caller, and the caller gets the count through the pointer it passed.

    You also generally want to pass an open FILE* pointer as a parameter to your function for reading data from. (If you can't successfully open the file back in the caller, then there is no reason to make the function call to fill your struct in the first place). Changing your function call slightly to accommodate the open FILE* pointer and the pointer to the number of struct filled, you could do:

    struct Paciente *cargarPacientes (FILE *fp, size_t *n)
    

    (or after creating a typedef to your struct for convenience [see below], you could do)

    Paciente *cargarPacientes (FILE *fp, size_t *n)
    

    Looking at the setup to read your file, in main() you would want to declare a pointer to struct, a variable to hold the count of the number of struct read, and a FILE* pointer to your file stream, e.g.

    int main (int argc, char **argv) {
    
        Paciente *pt = NULL;
        size_t n = 0;
        /* use filename provided as 1st argument (stdin by default) */
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
        ...
        pt = cargarPacientes (fp, &n);  /* call function assign allocated return */
    

    Other than the validations on the fopen and on the return of cargarPacientes(), that is all you need in main().

    The work will be done in your cargarPacientes() function. To begin, simply declare a buffer large enough to hold each line, your variable to track the number of struct allocated, and then a pointer to the block of memory holding your collection of struct Paciente. (MAXC is defined as a constant of 1024 and NP defined as 2 to allocate storage for 2 struct Paciente initially)

    Paciente *cargarPacientes (FILE *fp, size_t *n)
    {
        char buf[MAXC];     /* buffer to read each line */
        size_t npt = NP;    /* no. Paciente struct allocated */
        Paciente *pt = malloc (npt * sizeof *pt);   /* allocate initial NP struct */
    

    As with any allocation, before you make use of the block you have allocated, always validate the allocation succeeded, e.g.

        if (!pt) {          /* validate every allocation */
            perror ("malloc-pt");
            return NULL;
        }
    

    (note: on error, your function returns NULL instead of the address of an allocated block of memory to indicate failure)

    Now simply read each line and parse the semicolon-separated values into a temporary struct. This lets you validate that every value was parsed before assigning the temporary struct to one of the structs in your allocated block, e.g.

        while (fgets (buf, MAXC, fp)) {     /* read each line into buf */
            Paciente tmp = { .dni = "" };   /* temp struct to hold values */
            /* parse line into separate member values using sscanf */
            if (sscanf (buf, FMT, tmp.dni, tmp.nombre, tmp.primerApellido,
                        tmp.segundoApellido, &tmp.sexo, &tmp.edad) == 6) {
    

    note: FMT is defined as a string literal. Also note that the size of dni has increased from 9 chars to 10 so it can hold the 9-character ID plus the terminating nul character and be treated as a string, and sexo has been declared as a single char instead of an array of char [1], e.g.

    #define FMT "%9[^;];%19[^;];%19[^;];%19[^;];%c;%d"
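
    To see what that format string does with a single line of the sample data, here is a minimal sketch. The helper name parsePaciente is hypothetical (it is not part of the code in this answer); it wraps the same sscanf call and reports success only when all six fields convert.

```c
#include <stdio.h>

#define FMT "%9[^;];%19[^;];%19[^;];%19[^;];%c;%d"

typedef struct Paciente {
    char dni[10];               /* 9 chars + terminating '\0' */
    char nombre[20];
    char primerApellido[20];
    char segundoApellido[20];
    char sexo;
    int edad;
} Paciente;

/* Hypothetical helper: parse one semicolon-separated line into *out.
 * Returns 1 if all 6 fields were converted, 0 otherwise (*out untouched). */
int parsePaciente (const char *line, Paciente *out)
{
    Paciente tmp = { .dni = "" };

    if (sscanf (line, FMT, tmp.dni, tmp.nombre, tmp.primerApellido,
                tmp.segundoApellido, &tmp.sexo, &tmp.edad) != 6)
        return 0;

    *out = tmp;     /* only store the record once every field parsed */
    return 1;
}
```

    Calling parsePaciente ("10190935A;Sonia;Arroyo;Quintana;M;70\n", &p) fills p and returns 1. A line with missing fields returns 0 and leaves p untouched. An empty field (two semicolons in a row) also fails, because a %[^;] conversion must match at least one character.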
    

    If you successfully parse the line of data into your temporary struct, you next check if the number of struct you have filled equals the number allocated, and if so, realloc the amount of memory available. (you can add as little as 1 additional struct [inefficient] or you can scale the amount of memory allocated by some factor - here we just double the number of struct allocated beginning from 2)

                /* check if used == allocated to check if realloc needed */
                if (*n == npt) {
                    /* always realloc using temporary pointer */
                    void *ptmp = realloc (pt, 2 * npt * sizeof *pt);
                    if (!ptmp) {    /* validate every realloc */
                        perror ("realloc-pt");
                        break;
                    }
                    pt = ptmp;      /* assign newly sized block to pt */
                    npt *= 2;       /* update no. of struct allocated */
                }
    

    (note: you must realloc using a temporary pointer because if realloc fails it returns NULL which if you assign to your original pointer creates a memory leak due to the loss of the address of the original block of memory that can now no longer be freed)
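
    Here is the same grow-by-doubling pattern isolated on a plain int array, where the bookkeeping is easier to see. This is only a sketch; push_int and its parameters are made-up names for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: append value to a growable int array.
 * *arr may start as NULL with *used == *cap == 0.
 * Returns 1 on success, 0 if realloc fails (the old block stays valid). */
int push_int (int **arr, size_t *used, size_t *cap, int value)
{
    if (*used == *cap) {                        /* block is full, grow it */
        size_t newcap = *cap ? 2 * *cap : 2;    /* double, starting from 2 */
        void *tmp = realloc (*arr, newcap * sizeof **arr);
        if (!tmp)           /* on failure *arr is unchanged and still freeable */
            return 0;
        *arr = tmp;         /* only now overwrite the original pointer */
        *cap = newcap;
    }
    (*arr)[(*used)++] = value;
    return 1;
}
```

    Starting from an empty array, five calls leave used at 5 and cap at 8 (2, then 4, then 8). The caller frees the block with free() when done.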

    All that remains is assigning your temporary struct to the allocated block of memory and updating the number filled, e.g.

                pt[(*n)++] = tmp;   /* assign struct to next struct */
            }
        }
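
    Plain struct assignment is enough here because every member of Paciente is stored inside the struct itself, so the character arrays are copied along with it. That is the key difference from the char *pacientes[20][6] attempt in the question, whose pointers referred to a local buffer that stops existing when the function returns. A small sketch with a reduced, hypothetical struct (Registro and guardar are illustrative names only):

```c
#include <string.h>

/* Hypothetical reduced record: the name lives inside the struct. */
typedef struct {
    char nombre[20];
    int edad;
} Registro;

/* Fill a temporary record, then copy it into *slot by assignment. */
void guardar (Registro *slot, const char *nombre, int edad)
{
    Registro tmp = { .nombre = "" };    /* all members zeroed */

    /* copy at most 19 chars so the final '\0' from the initializer remains */
    strncpy (tmp.nombre, nombre, sizeof tmp.nombre - 1);
    tmp.edad = edad;

    *slot = tmp;    /* copies the whole struct, array contents included */
}
```

    After guardar (&r[0], "Sonia", 70) and guardar (&r[1], "Josefa", 42), each element of r holds its own copy of the name, even though tmp was reused and destroyed on each call.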
    

    That's it, return the pointer to your allocated block and you are done:

        return pt;  /* return pointer to allocated block of mem containing pt */
    }
    

    To avoid sprinkling magic numbers throughout your code, constants for 2, 10, 20 and 1024 are defined using a global enum. You could accomplish the same thing using an individual #define for each; the global enum is just convenient for defining multiple integer constants in a single line.

    enum { NP = 2, DNI = 10, NAME = 20, MAXC = 1024 };
    
    #define FMT "%9[^;];%19[^;];%19[^;];%19[^;];%c;%d"
    

    Now you no longer have individual numbers in your struct definition, and changing the constant and the FMT string is all that is required if you need to change the size of any member of your struct. (sscanf has no way to take a field width from a constant or variable at run time, so the widths must always be written as literal digits in the format string.)

    typedef struct Paciente {
        char dni[DNI];
        char nombre[NAME];
        char primerApellido[NAME];
        char segundoApellido[NAME];
        char sexo;
        int edad;
    } Paciente;
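
    If keeping the widths in FMT in sync with the array sizes by hand bothers you, one option is to build the format string with the preprocessor. This only works with #define constants, not enum values, because the stringizing operator # acts on macro text. The names below (DNI_LEN, NAME_LEN, STR, XSTR, FMT2, parseDos) are assumptions for this sketch and do not appear in the program in this answer.

```c
#include <stdio.h>

#define DNI_LEN   9     /* max chars in dni, excluding the '\0' */
#define NAME_LEN 19     /* max chars in a name, excluding the '\0' */

#define STR(x)  #x          /* turn the argument text into a string literal */
#define XSTR(x) STR(x)      /* expand the macro first, then stringize */

/* adjacent string literals are joined: "%9[^;];%19[^;]" */
#define FMT2 "%" XSTR(DNI_LEN) "[^;];%" XSTR(NAME_LEN) "[^;]"

/* Hypothetical helper: parse only the first two fields of a line.
 * dni must have room for DNI_LEN + 1 chars, nombre for NAME_LEN + 1.
 * Returns 1 if both fields were converted, 0 otherwise. */
int parseDos (const char *line, char *dni, char *nombre)
{
    return sscanf (line, FMT2, dni, nombre) == 2;
}
```

    With this approach the buffers are declared as char dni[DNI_LEN + 1] and char nombre[NAME_LEN + 1], so changing one #define updates both the storage and the field width.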
    

    To avoid hardcoding the filename, we take the filename to read from as the first argument to your program (or read from stdin if no argument is provided). This avoids having to recompile your program every time the name of your input file changes.

    Putting it all together you could do:

    #include <stdio.h>
    #include <stdlib.h>
    
    enum { NP = 2, DNI = 10, NAME = 20, MAXC = 1024 };
    
    #define FMT "%9[^;];%19[^;];%19[^;];%19[^;];%c;%d"
    
    typedef struct Paciente {
        char dni[DNI];
        char nombre[NAME];
        char primerApellido[NAME];
        char segundoApellido[NAME];
        char sexo;
        int edad;
    } Paciente;
    
    Paciente *cargarPacientes (FILE *fp, size_t *n)
    {
        char buf[MAXC];     /* buffer to read each line */
        size_t npt = NP;    /* no. Paciente struct allocated */
        Paciente *pt = malloc (npt * sizeof *pt);   /* allocate initial NP struct */
    
        if (!pt) {          /* validate every allocation */
            perror ("malloc-pt");
            return NULL;
        }
    
        while (fgets (buf, MAXC, fp)) {     /* read each line into buf */
            Paciente tmp = { .dni = "" };   /* temp struct to hold values */
            /* parse line into separate member values using sscanf */
            if (sscanf (buf, FMT, tmp.dni, tmp.nombre, tmp.primerApellido,
                        tmp.segundoApellido, &tmp.sexo, &tmp.edad) == 6) {
                /* check if used == allocated to check if realloc needed */
                if (*n == npt) {
                    /* always realloc using temporary pointer */
                    void *ptmp = realloc (pt, 2 * npt * sizeof *pt);
                    if (!ptmp) {    /* validate every realloc */
                        perror ("realloc-pt");
                        break;
                    }
                    pt = ptmp;      /* assign newly sized block to pt */
                    npt *= 2;       /* update no. of struct allocated */
                }
                pt[(*n)++] = tmp;   /* assign struct to next struct */
            }
        }
    
        return pt;  /* return pointer to allocated block of mem containing pt */
    }
    
    int main (int argc, char **argv) {
    
        Paciente *pt = NULL;
        size_t n = 0;
        /* use filename provided as 1st argument (stdin by default) */
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    
        if (!fp) {  /* validate file open for reading */
            perror ("file open failed");
            return 1;
        }
    
        pt = cargarPacientes (fp, &n);  /* call function assign allocated return */
        if (!pt) {  /* validate the return was not NULL */
            fputs ("cargarPacientes-empty\n", stderr);
            return 1;
        }
    
        if (fp != stdin)   /* close file if not stdin */
            fclose (fp);
    
        for (size_t i = 0; i < n; i++) {    /* output all struct saved in pt */
            printf ("%-9s %-10s %-10s %-10s  %c  %d\n", pt[i].dni, pt[i].nombre,
                    pt[i].primerApellido, pt[i].segundoApellido, pt[i].sexo,
                    pt[i].edad);
        }
    
        free (pt);    /* don't forget to free the memory you have allocated */
    }
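
    Since the question asks about using the data later in other functions: once main() holds pt and n, any function can take that pair as parameters. A minimal sketch follows, where edadMedia is a hypothetical example function and the struct is repeated with literal sizes only to keep the snippet self-contained.

```c
#include <stddef.h>

typedef struct Paciente {
    char dni[10];
    char nombre[20];
    char primerApellido[20];
    char segundoApellido[20];
    char sexo;
    int edad;
} Paciente;

/* Hypothetical example: average age of the n records starting at pt.
 * Returns 0.0 when n is 0 to avoid dividing by zero. */
double edadMedia (const Paciente *pt, size_t n)
{
    long suma = 0;

    for (size_t i = 0; i < n; i++)  /* pt[i] works just like a normal array */
        suma += pt[i].edad;

    return n ? (double) suma / (double) n : 0.0;
}
```

    In the program above you would call it as edadMedia (pt, n) at any point before free (pt).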
    

    Example Use/Output

    With your sample data in the file dat/patiente.csv, the program produces the following output:

    $ ./bin/readpatiente dat/patiente.csv
    10190935A Sonia      Arroyo     Quintana    M  70
    99830067Q Josefa     Cuenca     Orta        M  42
    28122337F Nuria      Garriga    Dura        M  43
    03265079E Manuel     Orts       Robles      H  45
    

    Memory Use/Error Check

    In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

    It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

    For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

    $ valgrind ./bin/readpatiente dat/patiente.csv
    ==1099== Memcheck, a memory error detector
    ==1099== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
    ==1099== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
    ==1099== Command: ./bin/readpatiente dat/patiente.csv
    ==1099==
    10190935A Sonia      Arroyo     Quintana    M  70
    99830067Q Josefa     Cuenca     Orta        M  42
    28122337F Nuria      Garriga    Dura        M  43
    03265079E Manuel     Orts       Robles      H  45
    ==1099==
    ==1099== HEAP SUMMARY:
    ==1099==     in use at exit: 0 bytes in 0 blocks
    ==1099==   total heap usage: 5 allocs, 5 frees, 6,128 bytes allocated
    ==1099==
    ==1099== All heap blocks were freed -- no leaks are possible
    ==1099==
    ==1099== For counts of detected and suppressed errors, rerun with: -v
    ==1099== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    

    Always confirm that you have freed all memory you have allocated and that there are no memory errors.

    This is much simpler than trying to hardcode fixed 2D arrays in an attempt to handle parsing the values from the file. Look things over and let me know if you have further questions.