Tags: c++, performance, resume, pause

C++ pause/resume system for a large operation


I have a C++ program that loads a file with a few million lines and processes it. The same operation used to be done by a PHP script, but I switched to C++ to reduce the execution time.

In the old script, I checked whether a file named after the current operation ID exists in a "pause" folder. The file is empty; it only signals that a pause has been requested. Every 5 iterations the script checks for this file, and if it exists, the script waits in a loop until the file is deleted (i.e. resumed):

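foreach ($lines as $index => $line) {
    $isFinished = $index >= $countData - 1;
    if ($index % 5 == 0) {
        // Wait here until the pause file is deleted (a.k.a. resume)
        do {
            clearstatcache(); // file_exists() results are cached by PHP
            $isPaused = file_exists("/home/pauses/" . $content->{'drop-id'});
            if ($isPaused) {
                usleep(500000); // sleep 0.5 s instead of busy-looping
            }
        } while ($isPaused);
    }
    // Start processing the line here
}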

But since disk access is relatively slow, I don't want to follow the same approach. Instead, I was thinking of commands that work like this:

$ kill <pid-of-main>   # the C++ program reports the last index it checked, e.g. 37710
$ ./main 37710         # the program skips the first 37709 lines and continues its job

What do you think of this approach? Is it feasible? Is it fast, or would restarting be time-consuming? Is there a better approach? Thank you.

Edit: To clarify, since this seems a little ambiguous: this task runs in the background and is started by another application. I want to be able to send commands from that management application (through Linux commands) to the background task to pause and resume it.


Solution

  • Jumping to line 37710 of a text file unfortunately requires reading all 37709 lines before it on most operating systems.

    On most operating systems, a text file is just a sequence of bytes with a convention for marking newlines, and the OS does not keep track of where the newlines are.

    So to find the newlines, you have to read every byte.
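    To make that cost concrete for the `./main 37710` idea, here is a minimal sketch of skipping lines with `std::istream::ignore` (the function name `skip_lines` is hypothetical). It avoids copying each line into a string, but it still has to scan every byte up to each newline.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Skips the first `n` lines of `in` and returns how many were actually
// skipped (fewer than `n` if the input ends first). There is no shortcut:
// ignore() still reads every byte up to and including each '\n'.
std::size_t skip_lines(std::istream& in, std::size_t n) {
    std::size_t skipped = 0;
    while (skipped < n && in.peek() != std::char_traits<char>::eof()) {
        // Discard everything up to and including the next newline.
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        ++skipped;
    }
    return skipped;
}
```

    A hypothetical `main` would open the input as an `std::ifstream`, call `skip_lines(file, 37709)` and then continue with its normal `std::getline` loop.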

    However, if your program saved the byte offset it had reached in the file, it could seek directly to that location.
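    A minimal sketch of that idea, assuming a hypothetical checkpoint format that holds a single decimal offset: `tellg()` records where the next unprocessed line starts, and `seekg()` jumps back there on the next run without rereading anything. The helpers `save_offset`, `load_offset` and `resume_at` are illustrative names, not part of any library; for real files, opening the input with `std::ios::binary` keeps the values plain byte offsets.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes the offset of the next unprocessed line (hypothetical format:
// one decimal number per checkpoint).
void save_offset(std::ostream& checkpoint, std::streamoff offset) {
    checkpoint << offset << '\n';
}

// Reads a previously saved offset; returns 0 (start of file) if none exists.
std::streamoff load_offset(std::istream& checkpoint) {
    std::streamoff offset = 0;
    if (!(checkpoint >> offset)) {
        return 0;
    }
    return offset;
}

// Positions `data` at `offset` so the next getline() returns the first
// unprocessed line. This is a single seek, not a scan of earlier lines.
bool resume_at(std::istream& data, std::streamoff offset) {
    data.clear();       // reset eof/fail flags from a previous read
    data.seekg(offset);
    return static_cast<bool>(data);
}
```

    The offset would be taken with `data.tellg()` right after the last fully processed line, so a restart never repeats or skips a line.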

    You can save your program's state (for example, that byte offset) to a file as it shuts down, and have it resume from that state by default when it starts again. This requires catching the signal you use to stop it (e.g. SIGTERM, which `kill` sends by default), having your main logic notice the flag the signal handler sets, and then shutting down cleanly. It is a very C-style operation.
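    Sketch of the signal part, assuming the program is stopped with SIGTERM (plain `kill`) or SIGINT (Ctrl+C). The handler only sets a `volatile std::sig_atomic_t` flag; the processing loop checks it before each line and, when it is set, writes the offset of the first unprocessed line to a checkpoint stream instead of stopping mid-line. The names `g_stop_requested`, `install_stop_handler` and `process_lines` are hypothetical.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Set from the signal handler; a volatile sig_atomic_t is what the standard
// allows a std::signal handler to write safely.
volatile std::sig_atomic_t g_stop_requested = 0;

extern "C" void on_stop_signal(int) { g_stop_requested = 1; }

// SIGTERM is what `kill <pid>` sends by default; SIGINT is Ctrl+C.
void install_stop_handler() {
    std::signal(SIGTERM, on_stop_signal);
    std::signal(SIGINT, on_stop_signal);
}

// Processes lines until the input ends or a stop is requested. On a stop,
// the byte offset of the first unprocessed line is written to `checkpoint`.
// Returns the number of lines processed in this run.
std::size_t process_lines(std::istream& data, std::ostream& checkpoint) {
    std::size_t processed = 0;
    std::string line;
    while (data.peek() != std::char_traits<char>::eof()) {
        if (g_stop_requested) {
            checkpoint << static_cast<std::streamoff>(data.tellg()) << '\n';
            break;
        }
        std::getline(data, line);
        // ... process `line` here ...
        ++processed;
    }
    return processed;
}
```

    In a real program, `main` would open the data file and a checkpoint file, call `install_stop_handler()`, seek to any saved offset and then call `process_lines`; the actual per-line work goes where the placeholder comment is.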


    Another traditional way to make a program remotely controllable is to have it listen on a TCP port (and/or stdin) and accept line-based text commands there.

    To go that way, you would write a REPL (read-eval-print loop) component, then hook it up to whichever input and output you choose.
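    A minimal REPL sketch, assuming one text command per line (the `status` and `quit` commands and the name `run_repl` are made up for illustration). It only depends on `std::istream`/`std::ostream`, so it can be attached to `std::cin`/`std::cout` directly; attaching it to a TCP socket would additionally need a stream buffer that wraps the socket, which is not shown.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// A command handler returns the reply to send back to the caller.
using Handler = std::function<std::string()>;

// Reads one command per line from `in`, runs the matching handler and writes
// its reply to `out`. Stops on "quit" or at end of input.
void run_repl(std::istream& in, std::ostream& out,
              const std::map<std::string, Handler>& handlers) {
    std::string command;
    while (std::getline(in, command)) {
        if (command == "quit") {
            out << "bye" << std::endl;
            break;
        }
        auto it = handlers.find(command);
        if (it == handlers.end()) {
            out << "unknown command: " << command << std::endl;
        } else {
            out << it->second() << std::endl;  // flush so the caller sees it
        }
    }
}
```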

    You could either run the REPL coroutine-style, checking for commands between chunks of processing, or spawn a separate thread for the REPL and have it communicate asynchronously with the processing thread.
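    A sketch of the threaded variant, assuming the management application writes `pause`/`resume` lines to the program's standard input. A hypothetical `PauseGate` class holds the paused flag behind a mutex; the processing thread calls `wait_while_paused()` between lines and blocks on a `std::condition_variable` instead of spinning like the PHP loop. `start_command_thread` and `process_with_pauses` are also illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <istream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>

// Shared between the command thread and the processing thread.
class PauseGate {
public:
    void pause() {
        std::lock_guard<std::mutex> lock(mutex_);
        paused_ = true;
    }
    void resume() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            paused_ = false;
        }
        cv_.notify_all();  // wake a worker blocked in wait_while_paused()
    }
    bool is_paused() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return paused_;
    }
    // Returns at once when not paused; otherwise sleeps until resume().
    void wait_while_paused() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !paused_; });
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    bool paused_ = false;
};

// Command thread: reads "pause"/"resume" lines and flips the gate.
// Other input is ignored; the thread ends at end of input.
std::thread start_command_thread(PauseGate& gate, std::istream& commands) {
    return std::thread([&gate, &commands] {
        std::string cmd;
        while (std::getline(commands, cmd)) {
            if (cmd == "pause") gate.pause();
            else if (cmd == "resume") gate.resume();
        }
    });
}

// Processing loop: checks the gate before handling each line.
void process_with_pauses(PauseGate& gate, std::istream& data,
                         std::atomic<std::size_t>& processed) {
    std::string line;
    while (std::getline(data, line)) {
        gate.wait_while_paused();
        // ... process `line` here ...
        ++processed;
    }
}
```

    In the program itself, the command thread would be started with `start_command_thread(gate, std::cin)` while the processing loop runs on the main thread. The condition variable means a paused worker uses no CPU until `resume()` notifies it.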

    However, this could be beyond your skill. Each step of this (writing a REPL system, having it not block the main work, responding to commands, then attaching it to a TCP port) would take some effort and learning on your part.