
Why should I use my own buffer for getline or similar functions?


I was writing a program the other day where I used the getline() function and I realized something that I have never thought of before and haven't been able to find anything about it online.

According to the description of getline from the man page:

DESCRIPTION

The getdelim() function reads a line from stream, delimited by the character delimiter. The getline() function is equivalent to getdelim() with the newline character as the delimiter. The delimiter character is included as part of the line, unless the end of the file is reached.

The caller may provide a pointer to a malloced buffer for the line in *linep, and the capacity of that buffer in *linecapp. These functions expand the buffer as needed, as if via realloc(). If linep points to a NULL pointer, a new buffer will be allocated. In either case, *linep and *linecapp will be updated accordingly.
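For reference, here is a minimal sketch of the NULL-buffer case the description mentions, assuming a POSIX system. The helper names `stream_from_text` and `first_line` are hypothetical and exist only for this illustration. The temporary file stands in for whatever stream is actually being read.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline() in <stdio.h> */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a temporary stream holding `text`,
   rewound for reading, or NULL on failure. */
static FILE *stream_from_text(const char *text) {
    FILE *fp = tmpfile();
    if (fp == NULL) return NULL;
    if (fputs(text, fp) == EOF) { fclose(fp); return NULL; }
    rewind(fp);
    return fp;
}

/* Hypothetical helper: reads the first line of `fp`, starting from a
   NULL buffer so that getline() does the allocation. On success it
   returns the buffer (the caller must free it) and reports the line
   length in *len and the buffer capacity in *cap. Returns NULL on
   end-of-file or error. */
static char *first_line(FILE *fp, size_t *len, size_t *cap) {
    assert(fp != NULL && len != NULL && cap != NULL);
    char *line = NULL;    /* NULL pointer: getline() allocates the buffer */
    size_t capacity = 0;  /* updated by getline() along with `line` */
    ssize_t n = getline(&line, &capacity, fp);
    if (n == -1) {
        free(line);       /* a buffer may have been allocated even on failure */
        return NULL;
    }
    *len = (size_t)n;     /* counts the newline, not the terminating '\0' */
    *cap = capacity;
    return line;
}
```

After a successful call the capacity is at least the line length plus one for the terminating null byte. The exact capacity chosen is up to the implementation.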

Normally when I use this function I always malloc my own buffer and pass it into the getline function, but after reading this I realized that this is not necessary, as one will just be created.

My question is: Is there any reason why I should create my own buffer and then pass it into getline as opposed to just passing NULL and letting getline handle the buffer?

The only reason I could think of is if you want to exercise control over the size of the buffer but this doesn't seem right because it says it will resize the buffer as necessary.

When should I use my own buffer and when should I let getline handle the creation of the buffer?


Solution

  • Q: Is there any reason why I should create my own buffer and then pass it into getline as opposed to just passing NULL and letting getline handle the buffer?
    A: Typically, no. In some select situations, it makes sense to allocate before calling getline().

    1) Many getline() re-allocation schemes are linear. That is, getline() allocates a buffer of N bytes (e.g. 256, 1k, 4k) and, if that is not big enough, tries 2*N, 3*N, 4*N, 5*N, etc. If code regularly expects to need large buffers, allocating a single large buffer before calling getline() prevents getline() from repeatedly reallocating small buffers. A potential, if dubious, efficiency improvement.

      size_t size = 10000;
      char *buf = malloc(size);
      ssize_t numchar = getline(&buf, &size, ...);
    

    2) Should code need or have a working buffer available before calling getline(), it is fine to use it.

      size_t size = 100;
      char *buf = malloc(size);
      ...
      foo(buf, size);
      ...
      // No need for these steps
      // free(buf);
      // size = 0;
      // buf = NULL;
      ...
      ssize_t numchar = getline(&buf, &size, ...);
      ...
      free(buf);
    

    3) Repeated calls. This includes a loop that repeatedly calls getline(). There is no need to free within the loop; wait until the loop is done. @Alan Stokes

      // do not use this
      while (some_condition) {
        size_t size = 0;
        char *buf = NULL;
        ssize_t numchar = getline(&buf, &size, ...);
        foo(numchar, buf,size);
        free(buf);
      }  
    
      // instead, use this model
      size_t size = 0;
      char *buf = NULL;
      while (some_condition) {
        ssize_t numchar = getline(&buf, &size, ...);
        foo(numchar, buf,size);
      }  
      free(buf);
    

    Q2: When should I use my own buffer and when should I let getline handle the creation of the buffer?
    A2: Allocate your own buffer when code certainly needs or benefits from it. Else let getline() do it.