c file-io fopen fgets

How to open a file of any length in C?


As a school assignment, I'm tasked with writing a program that opens any text file and performs a number of operations on the text. The text must be loaded using a linked list, meaning a chain of structs, each containing a char pointer and a pointer to the next struct. One line per struct.

But I'm having problems actually loading the file. It seems the memory required to hold the text must be allocated before I actually read the text. Hence I have to open the file several times: once to count the number of lines, then twice per line (once to count the characters in the line, then once to read them). It seems absurd to open a file hundreds of times just to read it into memory.

Obviously there are better ways of doing this, I just don't know them :-)

Examples

  • Can the point from which fgetc fetches a character be moved without re-opening the file?
  • Can the number of lines or characters in a file be checked before it is "opened"?
  • Can I somehow read a line or string from a file and save it to memory without allocating a fixed amount of bytes?
  • etc

Solution

  • There is no need to open the file more than once, nor to pass through it more than once.

    Look at the POSIX getline() function. It reads lines into allocated space. You can use it to read the lines, and then copy the results for your linked list.

    There is no need with a linked list to know how many lines there are ahead of time; that's an advantage of lists.
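
    The single-pass idea with getline() could be sketched roughly as below. The names struct line_node, read_lines() and free_lines() are made up for illustration and are not from the assignment; the sketch assumes a POSIX system where getline() is available, and it keeps the trailing newline that getline() returns.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for getline() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical node type: one line of text per node. */
struct line_node {
    char *text;                   /* malloc'd copy of the line */
    struct line_node *next;
};

/* Read every remaining line of fp into a new list, in file order.
 * Returns the head (NULL for an empty file). If an allocation fails,
 * *err is set to 1 and the lines read so far are returned. */
struct line_node *read_lines(FILE *fp, int *err)
{
    struct line_node *head = NULL;
    struct line_node **tail = &head;  /* where the next node gets linked */
    char *buf = NULL;                 /* getline() allocates and grows this */
    size_t cap = 0;
    ssize_t len;

    *err = 0;
    while ((len = getline(&buf, &cap, fp)) != -1) {
        struct line_node *node = malloc(sizeof *node);
        if (node == NULL) {
            *err = 1;
            break;
        }
        node->text = malloc((size_t)len + 1);
        if (node->text == NULL) {
            free(node);
            *err = 1;
            break;
        }
        memcpy(node->text, buf, (size_t)len + 1);  /* copy including '\0' */
        node->next = NULL;
        *tail = node;
        tail = &node->next;
    }
    free(buf);
    return head;
}

/* Release every node and its text. */
void free_lines(struct line_node *head)
{
    while (head != NULL) {
        struct line_node *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

    A caller would fopen() the file once, pass the stream to read_lines(), work on the list, and finish with free_lines() and fclose(). The number of lines is never needed up front because each node is allocated as its line arrives.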

    So, the code can be done with a single pass. Even if you can't use getline(), you can use fgets() and monitor whether it reads to end of line each time, and if it doesn't you can allocate (and reallocate) space to hold the line as needed (malloc(), realloc() and eventually free() from <stdlib.h>).
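
    Where getline() is not an option, the fgets() plus realloc() approach might look like the following sketch. The helper name read_long_line() and the starting capacity of 64 bytes are arbitrary choices for illustration. It assumes text input without embedded NUL bytes, since it uses strlen() to find how much fgets() stored.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from fp.
 * Returns a malloc'd string (with the trailing '\n' if the line had one),
 * or NULL at end of file or if memory runs out. The caller frees it. */
char *read_long_line(FILE *fp)
{
    size_t cap = 64;              /* arbitrary starting capacity */
    size_t len = 0;               /* characters stored so far */
    char *line = malloc(cap);

    if (line == NULL)
        return NULL;

    for (;;) {
        size_t room = cap - len;
        if (room > INT_MAX)       /* fgets() takes an int size */
            room = INT_MAX;
        if (fgets(line + len, (int)room, fp) == NULL)
            break;                /* end of file or read error */

        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;          /* fgets() reached the end of the line */

        /* No newline yet: the line may be longer than the buffer,
         * so double the buffer and keep reading where we stopped. */
        char *bigger = realloc(line, cap * 2);
        if (bigger == NULL) {
            free(line);
            return NULL;
        }
        line = bigger;
        cap *= 2;
    }

    if (len == 0) {               /* nothing was read at all */
        free(line);
        return NULL;
    }
    return line;                  /* final line without a newline */
}
```

    Calling read_long_line() in a loop until it returns NULL yields the lines one at a time, and each returned string can be stored directly in a list node.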

    Your specific questions are largely immaterial if you adopt the approach I suggest, but:

    • Using fseek() (and in extremis rewind()) will move the read position (for fgetc() and all other functions), unless the 'file' does not support seeking (e.g., a pipe provided as standard input).

    • The number of characters (bytes) in a file can be determined with stat() or fstat() or variants, without reading the file. The number of lines cannot be determined except by reading the file.

    • Since the file could be from zero bytes to gigabytes in size, there isn't a sensible way of doing fixed size allocations. You are pretty much forced into dynamic memory allocation with malloc() et al. (Behind the scenes, getline() uses malloc() and realloc().)
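
    For completeness, the first two points could be illustrated with the sketch below, although the single-pass approach makes neither necessary. The function names file_size_bytes() and count_lines() are invented for this example, and fstat() and fileno() are assumed to be available (POSIX).

```c
#define _POSIX_C_SOURCE 200809L   /* ask for fileno() and fstat() */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Size in bytes of the file behind an open stream, or -1 on failure.
 * No data is read; the size comes from the file's metadata. */
long long file_size_bytes(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Count newline characters from the current position to end of file,
 * then move the read position back to the start without re-opening.
 * Returns -1 if the stream cannot seek (for example, a pipe). */
long count_lines(FILE *fp)
{
    long lines = 0;
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
    }
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1;
    return lines;
}
```

    After count_lines() returns, the next fgetc() on the same stream reads the first character of the file again.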