Search code examples
c++boost-threadofstream

ofstream shared by mutiple threads - crashes after awhile


this function takes a ofstream as a reference, then builds struct packets and threads off the structs with the ofstream to a trie matching class. A stack is returned with n matches in order of match distance. The ofstream, packet, and stack are then passed by reference to a print function that writes the matches to the same output file - waiting for when the ofstream is not in use. The problem is the ofstream crashes after half the matches are written.

The ofstream, packet, and outfile header

void Match_Import_Start(Trie & tri)
{
    stack<Trie::MatchesT> out;
    ofstream myfile;
    Trie::MatchesT matchSet;

    myfile.open(outFile.c_str() );
    myfile << "DESCRIPTION,SUGGESTED.DESCRIPTION,EDIT" << "\n"; //write header
    myfile.close();

    Match_Import (tri, myfile, out, matchSet);

    return;
}

spawn threads from record list

void Match_Import(Trie &tri, ofstream &myfile, stack<Trie::MatchesT> out, Trie::MatchesT matchSet)
{   
    out=parse_CSV_file(timeFile); //read in records

    settingT x;

    x.score=0;

    boost::thread_group tgroup; //http://stackoverflow.com/questions/8744279/create-threads-in-a-loop
    while (!out.empty() ) {
        matchSet=out.top();
        out.pop();

        tgroup.create_thread(boost::bind( MaxDistanceCorrections, boost::ref(tri), matchSet, boost::ref(myfile), boost::ref(x) ) );
    }
    tgroup.join_all();

    return;
}

check for valid return from match

void MaxDistanceCorrections(Trie & tri, Trie::MatchesT matchSet, ofstream &myfile, settingT &x)
{
    if (!matchSet.candidateStack.empty() ) ) {
        matchSet.candidateStack.sort(compareCorrMain);
        PrintCorrections(tri, matchSet, myfile, x);
        return;

    } else {        
        tri.suggest(matchSet); //send out to trie match

         if (matchSet.candidateStack.empty() ) { }// modify match parameters

        MaxDistanceCorrections(tri, matchSet, myfile, x);
    }
}

and print when ofstream is available

void PrintCorrections(Trie &tri, Trie::MatchesT &matchSet, ofstream &myfile, settingT &x)
{   
    while (true) {
        if (!myfile.is_open() ) { 
          myfile.open(outFile.c_str(), ios::out | ios::app);
          break;
        }  
     }

    while (!matchSet.candidateStack.empty() ) {
        Trie::CorrectionT corr=matchSet.candidateStack.back();
        matchSet.candidateStack.pop_back();

        const bool flagGood=scoreSuggest (corr); //score
        if (flagGood ) x.score++;

        myfile << matchSet.testCase << "," << corr.match << "," << corr.editDistance << "\n";

    }
    myfile.close();

    return;
}

Fairly new at mutithreading, these functions worked fine as a single thread.

Should the check for ofstream available be placed within the while loop that spins off the candidate matches? Once the ofstream is available then starting the print sequence should tie-up the ofstream form other threads.

Is there a better way to reserve use of an ofstream shared by multiple threads?


Solution

  • This code exhibits undefined behavior from multiple threads. See N3485 27.2.3 [iostreams.threadsafety]/1:

    Concurrent access to a stream object (27.8, 27.9), stream buffer object (27.6), or C Library stream (27.9.2) by multiple threads may result in a data race (1.10) unless otherwise specified (27.4). [ Note: Data races result in undefined behavior (1.10). —end note ]

    In the general case, streams are not safe to use across threads. You must protect the stream using a lock, such as std::mutex.

    Note that even if streams were safe to access across threads this code would probably not do what you want. Consider this line:

    myfile << matchSet.testCase << "," << corr.match << corr.editDistance << "\n";
    

    which is the same as

    myfile << matchSet.testCase;
    myfile << ",";
    myfile << corr.match;
    myfile << corr.editDistance;
    myfile << "\n";
    

    Note the race condition. Let's say that your implementation's operator<< for streams is synchronized for you by the implementation. You still have a potential race in this outer code. For instance, here is one possible execution of this across 2 threads:

       Thread 1                              Thread 2
    ======================================================================
    myfile << matchSet.testCase;
                                             myfile << matchSet.testCase;
    myfile << ",";
                                             myfile << ",";
    myfile << corr.match;
    myfile << corr.editDistance;
    myfile << "\n";
                                             myfile << corr.match;
                                             myfile << corr.editDistance;
                                             myfile << "\n";
    

    Instead of getting that to write as a single line, you'll get output from each thread mixed up together, resulting in gibberish.