Search code examples
cpthreadsfopenfreadpthread-join

reading file line by line using pthreads ... exits unexpectedly


I have the following code:

    /*//not important
    FILE * INFILE;
    list_file = optarg;
    if( ( INFILE = fopen( list_file, "a+" ) ) == NULL ) {
        fprintf( stderr, "Can't open input file\n");
        exit(0);
    }
    */

    pthread_mutex_t input_queue;
    pthread_mutex_init(&input_queue, NULL);

    for( i = 0 ; i < number_thread; i++)
    {
        if( pthread_create( &thread_id[i], NULL, &work, NULL) != 0 )
        {
            i--;
            fprintf(stderr, RED "\nError in creating thread\n" NONE);
        }
    }
    for( i = 0 ; i < number_thread; i++)
        if( pthread_join( thread_id[i], NULL) != 0 )
        {
            fprintf(stderr, RED "\nError in joining thread\n" NONE);
        }




    void * work(void * data)
    {
        unsigned long line;
        char buf[512];
        while ( !feof(INFILE) )
        {
            pthread_mutex_lock(&input_queue);
            fgets((char *)&buf, sizeof(buf), INFILE);
            if (buf[strlen (buf) - 1] == '\n')
                buf[strlen (buf) - 1] = '\0';
            line = (unsigned long)buf;
            pthread_mutex_unlock(&input_queue);
            do_work( line );
        }
        fclose(INFILE);
        return NULL;
    }

it reads lines from file but after a while it exits unexpectedly, no error message. I guess I messed up something.

How can I read the file line by line using pthreads but keep as much as possible the code unchanged (I mean not to mess up the whole program) ?


Solution

  • You are closing INFILE in the first thread that encounters EOF. Afterwards other threads will call feof() — and possibly fclose() — on the closed file, which will corrupt the heap and almost certainly lead to a crash. Also, your newline-chopping code can underrun your buffer at EOF, see remark below.

    To fix the problem, protect feof() and fclose() with the same mutex, and set INFILE to NULL. When the mutex is acquired, check for INFILE being NULL and return immediately if so:

    for (;;) {
      pthread_mutex_lock(&input_queue);
      if (!INFILE) {
        pthread_mutex_unlock(&input_queue);
        break;
      }
      if (feof(INFILE)) {
        INFILE = NULL;
        pthread_mutex_unlock(&input_queue);
        break;
      }
    
      fgets(buf, sizeof(buf), INFILE);
      pthread_mutex_unlock(&input_queue);
    
      // ...strip newline, do_work...
    }
    

    Several remarks:

    • your code writes to buf[strlen(buf) - 1] without checking whether strlen(buf) is zero. buf will be empty at EOF, so this is not a theoretical concern, it will happen exactly once in every execution.

    • line is of type unsigned long, but you are assigning it a pointer value. This will fail on platforms where long does not contain a pointer, such as Win64. Declare line and the argument of do_work as char * (or void * if it must accept other pointer types) instead.

    • avoid calling your mutex a "queue"; in multithreaded programming queue refers to a producer-consumer aware FIFO.

    • you don't need to protect individual stdio functions like fgets with mutexes. They are MT-safe, as mandated by POSIX. (However, in my modified code, fgets() does need to be mutex-protected because INFILE can become invalidated while the mutex is not being held.)

    • (char *) &buf doesn't make sense. Since buf is a char array, it already decays to a pointer to its first member, so you can simply send buf to fgets. If you insist on using the address-of operator, the correct expression is &buf[0].

    • as Carl Norum hinted, feof() is probably not what you want, since it only detects EOF condition already encountered by fgets(). The correct way to check for EOF is by testing whether fgets() returns an empty string — before stripping the newline.