Search code examples
cgetlinebuffer-overflowgets

Replace deprecated gets()


I am using the SLM toolkit by CMU-Cambridge for some baseline language modeling on language data, but when I run one of the built executables, my system detects a buffer overflow when it tries to execute one of the commands.

Based on this StackOverflow question I noticed that __gets_chk+0x179 caused the problem, and I've found two occurrences of gets/fgets in the source code (evallm.c, also available in this GitHub project someone made) but I do not know how to fix them in a proper/secure way.

The relevant parts of the error message:

*** buffer overflow detected ***: /home/CMU-Cam_Toolkit_v2/bin/evallm terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__gets_chk+0x179)[0x7f613bc719e9]
Aborted

The broken code

# declaration of input string variable
char input_string[500];

# occurence 1
...
while (fgets (wlist_entry, sizeof (wlist_entry),context_cues_fp)) { ... } 
...

# occurence 2
...
while (!feof(stdin) && !told_to_quit) {
    printf("evallm : ");
    gets(input_string);
....

The error basically occurred when the input_string I gave to the evallm command was too long. Normally it is to be called from the command line and you can interactively pass arguments. However, I piped all arguments together with the command (as seen in the example of the docs) but apparently sometimes my argument names where taking too much bytes. When I changed the array length of input_string from 500 to 2000 the problem was solved (so I guess the error was due to occurence 2). But I really would like to fix it by replacing gets() by getline() since it seems to be the right way to go. Or is replacing it by fgets() also a solution? If so, what parameters should I use?

However, when trying to replace gets(), I always get compiling errors. I'm not a C-programmer (Python, Java) and I'm not familiar with the syntax of getline(), so I'm struggling to find the right parameters.


Solution

  • In your particular case, you know that input_string is an array of 500 bytes. (Of course, you could replace that 500 with e.g. 2048)

    I am paranoid, adept of defensive programming, and I would zero that buffer before any input, e.g.

    memset(input_string, 0, sizeof(input_string));
    

    So the buffer is cleared, even when fgets has failed. In most cases that is in principle useless. But you have corner cases and the evil is in the details.

    So read documentation of fgets(3) and replace the gets call with

    fgets(input_string, sizeof(input_string), stdin);
    

    (you actually should handle corner cases, e.g. failure of fgets and input line longer than input_string ....)

    Of course, you may want to zero the terminating newline. For that, add

    int input_len = strlen(input_string);
    if (input_len>0) input_string[input_len-1] = '\0`;
    

    (as commented, you might clear the input_string less often, e.g. at start and on fgets failure)

    Notice that getline(3) is POSIX specific and is managing a heap-allocated buffer. Read about C dynamic memory allocation. If you are unfamiliar with C programming, that might be tricky to you. BTW, you could even consider using the Linux specific readline(3)

    The main point is your familiarity with C programming.

    NB: in C, # does not start a comment, but a preprocessor directive.