c, io, stdin, fgets

Reading an unknown number of lines with unknown length from stdin


I'm relatively new to programming in C and am trying to read input from stdin using fgets.

To begin with I thought about reading max 50 lines, max 50 characters each, and had something like:

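int max_length = 50;
char lines[max_length][max_length];
char current_line[max_length];
int idx = 0;

/* stop after max_length lines so the lines array is never overrun */
while (idx < max_length && fgets(current_line, max_length, stdin) != NULL) {
    strcpy(lines[idx], current_line);   /* strcpy is declared in <string.h> */
    idx++;
}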

The snippet above successfully reads the input and stores it into the lines array where I can sort and print it.

My question is how do I deal with an unknown number of lines, with an unknown number of characters on each line? (bearing in mind that I will have to sort the lines and print them out).


Solution

  • Several variations of this problem have already been answered, but the considerations behind the approach deserve a paragraph. The approach is the same regardless of which combination of library or POSIX functions you use to do it.

    Essentially, you dynamically allocate a reasonable number of characters to hold each line. POSIX getline does this for you automatically. With fgets, you read a fixed-size buffer of chars at a time and append them (reallocating storage as necessary) until the '\n' character is read (or EOF is reached).
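
    As a minimal sketch of the fgets approach just described (one way to do it, not the only one), the function below reads fixed-size chunks and appends them to a growing buffer until a '\n' or EOF is seen. The name read_line and the CHUNK size of 64 are assumptions chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 64    /* hypothetical size of each fgets read */

/* Read one line of any length from fp using fgets and realloc.
 * Returns an allocated string without the trailing '\n' (caller frees),
 * or NULL at EOF with nothing read, or on allocation failure. */
char *read_line (FILE *fp)
{
    char chunk[CHUNK];
    char *line = NULL;
    size_t len = 0;

    while (fgets (chunk, sizeof chunk, fp)) {
        size_t n = strlen (chunk);
        /* grow via a temporary pointer so line survives a failed realloc */
        char *tmp = realloc (line, len + n + 1);
        if (!tmp) {
            free (line);
            return NULL;
        }
        line = tmp;
        memcpy (line + len, chunk, n + 1);      /* append chunk with nul */
        len += n;
        if (len && line[len - 1] == '\n') {     /* complete line read */
            line[len - 1] = 0;                  /* trim the newline */
            break;
        }
    }
    return line;
}
```

    Each call returns one line for the caller to free, so a loop such as while ((line = read_line (stdin))) collects lines of any length.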

    If you use getline, you must allocate new memory for each line and copy the filled buffer into it, because getline reuses (and may resize) the buffer you pass it on every call. Otherwise, each new line overwrites the previous one, and when you attempt to free each line you will likely SegFault with a double-free or corruption error as you repeatedly attempt to free the same block of memory.

    You can use strdup to simply copy the buffer. However, since strdup allocates storage, you should validate that the allocation succeeded before assigning a pointer to the new block of memory to your collection of lines.

    To access each line, you need a pointer to the beginning of each (the block of memory holding each line). A pointer to pointer to char is generally used (e.g. char **lines;). Memory allocation is generally handled by allocating some reasonable number of pointers to begin with, keeping track of the number you use, and, when you reach the number you have allocated, calling realloc to double the number of pointers.
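
    The pointer bookkeeping described above can be sketched as a small helper. The names linebuf, linebuf_add and linebuf_free are hypothetical, and the starting size of 8 pointers is an arbitrary assumption. It copies each line with strdup, validates every allocation, and doubles the pointer array when it is full.

```c
#define _POSIX_C_SOURCE 200809L     /* expose strdup under strict -std=c99/c11 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical container: pointer array plus used/allocated counts */
typedef struct {
    char **lines;
    size_t used, avail;
} linebuf;

/* Append a copy of s. Returns 1 on success, 0 on allocation failure,
 * in which case the lines already stored are left intact. */
int linebuf_add (linebuf *lb, const char *s)
{
    if (lb->used == lb->avail) {    /* pointer limit reached, double it */
        size_t navail = lb->avail ? lb->avail * 2 : 8;
        /* realloc to a temporary pointer so lb->lines survives failure */
        void *tmp = realloc (lb->lines, navail * sizeof *lb->lines);
        if (!tmp)
            return 0;
        lb->lines = tmp;
        lb->avail = navail;
    }
    char *copy = strdup (s);        /* strdup allocates, so validate */
    if (!copy)
        return 0;
    lb->lines[lb->used++] = copy;
    return 1;
}

/* free each stored line, then the pointer array itself */
void linebuf_free (linebuf *lb)
{
    for (size_t i = 0; i < lb->used; i++)
        free (lb->lines[i]);
    free (lb->lines);
    lb->lines = NULL;
    lb->used = lb->avail = 0;
}
```

    Start from a zeroed struct (linebuf lb = {0};), call linebuf_add once per line read, and call linebuf_free when done.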

    As with each read, you need to validate each memory allocation (each malloc, calloc, or realloc). You also need to validate the way your program uses the memory you allocate by running it through a memory error checking program (such as valgrind on Linux). These tools are simple to use: just run valgrind ./yourexename.

    Putting those pieces together, you can do something similar to the following. The code reads all lines from the filename given as the first argument to the program (or from stdin by default if no argument is given) and prints the line number and line to stdout (keep that in mind if you run it on a 50,000 line file).

    #define _POSIX_C_SOURCE 200809L /* expose getline/strdup under strict -std=c99/c11 */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define NPTR 8
    
    int main (int argc, char **argv) {
    
        size_t ndx = 0,             /* line index */
            nptrs = NPTR,           /* initial number of pointers */
            n = 0;                  /* line alloc size (0, getline decides) */
        ssize_t nchr = 0;           /* return (no. of chars read by getline) */
        char *line = NULL,          /* buffer to read each line */
            **lines = NULL;         /* pointer to pointer to each line */
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    
        if (!fp) {  /* validate file open for reading */
            fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
            return 1;
        }
    
        /* allocate/validate initial 'nptrs' pointers */
        if (!(lines = calloc (nptrs, sizeof *lines))) {
            fprintf (stderr, "error: memory exhausted - lines.\n");
            return 1;
        }
    
        /* read each line with POSIX getline */
        while ((nchr = getline (&line, &n, fp)) != -1) {
            if (nchr && line[nchr - 1] == '\n') /* check trailing '\n' */
                line[--nchr] = 0;               /* overwrite with nul-char */
            char *buf = strdup (line);          /* allocate/copy line */
            if (!buf) {     /* strdup allocates, so validate */
                fprintf (stderr, "error: strdup allocation failed.\n");
                break;
            }
            lines[ndx++] = buf;     /* assign start address for buf to lines */
            if (ndx == nptrs) {     /* if pointer limit reached, realloc */
                /* always realloc to temporary pointer, to validate success */
                void *tmp = realloc (lines, sizeof *lines * nptrs * 2);
                if (!tmp) {         /* if realloc fails, bail with lines intact */
                    fprintf (stderr, "error: memory exhausted - realloc.\n");
                    break;
                }
                lines = tmp;        /* assign reallocated block to lines */
                /* zero all new memory (optional) */
                memset (lines + nptrs, 0, nptrs * sizeof *lines);
                nptrs *= 2;         /* double number of allocated pointers */
            }
        }
        free (line);                    /* free memory allocated by getline */
    
        if (fp != stdin) fclose (fp);   /* close file if not stdin */
    
        for (size_t i = 0; i < ndx; i++) {
            printf ("line[%3zu] : %s\n", i, lines[i]);
            free (lines[i]);            /* free memory for each line */
        }
        free (lines);                   /* free pointers */
    
        return 0;
    }
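
    Since the question also asks about sorting, here is a hedged sketch of how the stored pointers could be sorted with qsort. It assumes plain byte-wise ordering with strcmp is what is wanted, and the names cmp_lines and sort_lines are hypothetical. Only the pointers are rearranged; the line contents are never copied.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of char*: qsort passes pointers to the
 * elements, so each argument is really a pointer to a char pointer. */
int cmp_lines (const void *a, const void *b)
{
    const char *sa = *(const char * const *) a;
    const char *sb = *(const char * const *) b;
    return strcmp (sa, sb);
}

/* sort the first n pointers in lines into ascending strcmp order */
void sort_lines (char **lines, size_t n)
{
    qsort (lines, n, sizeof *lines, cmp_lines);
}
```

    In the program above, calling sort_lines (lines, ndx); just before the print loop would print the lines in sorted order.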
    

    If you don't have getline or strdup, you can easily implement each. There are multiple examples of each on the site. If you cannot find one, let me know. If you have further questions, let me know as well.
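
    For instance, a stand-in for strdup takes only a few lines. This is a minimal sketch, and the name my_strdup is hypothetical (chosen to avoid clashing with a library strdup where one exists).

```c
#include <stdlib.h>
#include <string.h>

/* Return an allocated copy of s (caller frees), or NULL if malloc fails. */
char *my_strdup (const char *s)
{
    size_t len = strlen (s) + 1;    /* +1 for the nul-terminator */
    char *copy = malloc (len);
    if (copy)                       /* copy only if allocation succeeded */
        memcpy (copy, s, len);
    return copy;
}
```

    As with strdup itself, check the return value for NULL before storing the pointer.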