I am developing a C++ program that can run all day. It outputs to stdout and I want to compress this output. The uncompressed output can be many GB. A startup Bourne shell script compiles the C++ code and starts up the program like so:
./prog | gzip > output.gz
When I interrupt the script using CTRL-C, the .gz file is always corrupted. When I start the program from a terminal and interrupt it using CTRL-C, the .gz file is also always corrupted. When I start the program from a terminal and terminate it using Linux killall, the .gz file is fine.
On the other hand, in a terminal
cat <large_file> | gzip > cat.gz
can be interrupted using CTRL-C and cat.gz is always fine. So I suspect cat has a signal handler of some sort that I have to implement in my C++ program as well... but looking at a cat implementation online, I found nothing like it. Nevertheless, I implemented this:
void SignalHandler(int aSignum)
{
    exit(0);
}

void Signals()
{
    signal(SIGINT, SignalHandler);
    signal(SIGKILL, SignalHandler);  // has no effect: SIGKILL cannot be caught
    signal(SIGTERM, SignalHandler);
}
...and even something in the shell script, but nothing helps. After CTRL-C, the .gz file is corrupted.
Questions:
Opening the resulting file using zcat gives some output, but then:
gzip: file.gz: unexpected end of file
Opening it in Ubuntu's Archive Manager just gives a popup saying "An error occurred while extracting files."
Tried flushing; no change in the problem was observed.
More info about the issue: Missing end (EOCDR) signature
Fix archive (-F) - assume mostly intact archive
zip warning: bad archive - missing end signature
zip warning: (If downloaded, was binary mode used? If not, the
zip warning: archive may be scrambled and not recoverable)
zip warning: Can't use -F to fix (try -FF)
zip error: Zip file structure invalid (file.gz)
maot@HP-Pavilion-dv7:~/temp$ zip -FF file.gz --out file2.gz
Fix archive (-FF) - salvage what can
zip warning: Missing end (EOCDR) signature - either this archive
is not readable or the end is damaged
Is this a single-disk archive? (y/n): y
Assuming single-disk archive
Scanning for entries...
zip warning: zip file empty
maot@HP-Pavilion-dv7:~/temp$ ls -lh file2.gz
-rw------- 1 maot maot 22 feb 15 15:18 file2.gz
maot@HP-Pavilion-dv7:~/temp$
Thanks @Maxim Egorushkin, but it does not work. Interrupting the script with CTRL-C kills prog before the signal handler of the script is executed. Hence, I cannot send it a signal, it's already gone... and without any output from SignalHandler. When prog is started from the command line, the output of SignalHandler is observed. Prog:
#include <csignal>
#include <cstdio>   // fflush
#include <cstdlib>  // exit
#include <iostream>
#include <unistd.h>

void SignalHandler(int aSignum)
{
    std::cout << "prog: Interrupt signal " << aSignum << " received.\n";
    fflush(nullptr);
    exit(0);
}

int main()
{
    // Try to catch every signal; SIGKILL and SIGSTOP cannot be caught,
    // so signal() fails for those two.
    for (int sig = 1; sig <= 31; sig++)
    {
        std::cout << " sig " << sig;
        signal(sig, SignalHandler);
    }
    while (true)
    {
        std::cout << "prog: Sleep ";
        fflush(nullptr);
        usleep(1e4);
    }
}
Script:
#!/bin/sh
onerror()
{
    echo "onerror(): Started."
    ps -jef | grep prog
    killall -s SIGINT prog
    exit
}
g++ -Wall prog.cpp -o prog
trap onerror 2
prog | gzip > file.gz
Result:
maot@HP-Pavilion-dv7:~/temp$ test.sh
^Conerror(): Started.
maot 16733 16721 16721 5781 0 16:17 pts/1 00:00:00 grep prog
prog: no process found
maot@HP-Pavilion-dv7:~/temp$
Implementation of the answer of Maxim Egorushkin. Script:
#!/bin/sh
g++ -Wall prog.cpp -o prog
prog | setsid gzip > file.gz & wait
Prog:
#include <csignal>
#include <cstdlib>  // exit
#include <iostream>
#include <unistd.h>

void SignalHandler(int aSignum)
{
    std::cout << "prog: Interrupt signal " << aSignum << " received.\n";
    exit(0);
}

int main()
{
    signal(SIGINT, SignalHandler);
    while (true)
    {
        std::cout << "prog: Sleep ";
        usleep(1e4);
    }
}
When you press Ctrl+C the shell sends SIGINT to the last process in the pipeline, which is gzip here. gzip terminates and the next time prog writes into stdout it receives SIGPIPE.
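To make the SIGPIPE part concrete, here is a minimal sketch of what a writer sees once the reading end of its pipe is gone. The helper name ErrnoAfterWriteToClosedPipe is made up for illustration. It ignores SIGPIPE on purpose, so that write() fails with EPIPE instead of the signal killing the process, which is the default action.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical helper: returns the errno observed when writing to a pipe
// whose read end has already been closed. SIGPIPE is ignored for the
// duration of the call so that the process survives the write.
int ErrnoAfterWriteToClosedPipe()
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    // With SIGPIPE ignored, write() reports EPIPE instead of killing us.
    auto previous = signal(SIGPIPE, SIG_IGN);

    close(fds[0]);  // the reader (gzip's role in the pipeline) goes away
    const char byte = 'x';
    const ssize_t written = write(fds[1], &byte, 1);
    const int saved_errno = (written == -1) ? errno : 0;

    close(fds[1]);
    signal(SIGPIPE, previous);  // restore the earlier disposition
    return saved_errno;
}
```

With the default SIGPIPE disposition the same write would terminate the writer, which is what happens to prog once gzip is gone.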
You need to send SIGINT to prog for it to flush its stdout and exit (provided you installed the signal handler as you did), so that gzip receives all of its output and then terminates.
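One caveat about the handler itself: exit() and std::cout are not async-signal-safe, so calling them from inside a signal handler is not guaranteed to behave. A more conservative pattern, sketched below under the assumption that prog can poll a flag in its main loop, only sets a flag in the handler and does the flushing in normal code. The names g_stop_requested, RequestStop, InstallStopHandlers and RunUntilStopped are hypothetical and not from your program.

```cpp
#include <signal.h>
#include <iostream>

// Set by the handler; volatile sig_atomic_t may be written from a handler.
static volatile sig_atomic_t g_stop_requested = 0;

// The handler does nothing except record that a stop was requested.
extern "C" void RequestStop(int) { g_stop_requested = 1; }

// Hypothetical helper: routes SIGINT and SIGTERM to RequestStop.
bool InstallStopHandlers()
{
    struct sigaction action {};
    action.sa_handler = RequestStop;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGINT, &action, nullptr) == 0 &&
           sigaction(SIGTERM, &action, nullptr) == 0;
}

// Hypothetical main loop: writes records until a stop is requested, then
// flushes the stream in ordinary (non-handler) context.
// Returns the number of records written.
long RunUntilStopped(std::ostream& out)
{
    long records = 0;
    while (!g_stop_requested)
    {
        out << "record " << records << '\n';
        ++records;
    }
    out.flush();
    return records;
}
```

In main() this would be used as InstallStopHandlers() followed by RunUntilStopped(std::cout) and a normal return, so the usual end-of-program flushing still happens.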
You can run your pipeline as follows:
prog | setsid gzip > file.gz & wait
It uses the shell's job control feature to start the pipeline in the background (that & symbol). Then it waits for the job to terminate. On Ctrl+C, SIGINT is sent to the foreground process, which is the shell in wait, and to all processes in the same terminal process group (unlike when the pipeline is in the foreground and SIGINT is sent only to the last process in the pipeline). prog is in that group. But gzip is started with setsid to place it into another group, so that it doesn't receive SIGINT but rather terminates when its stdin is closed after prog terminates.
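For reference, the setsid utility is a thin wrapper around the setsid() system call. The sketch below is only an illustration of what that call does to a process, not of how gzip is launched. The helper name ChildLeavesParentGroup is made up. It forks a child, calls setsid() in it, and checks that the child ends up leading a process group different from the parent's.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that calls setsid() and reports, via
// its exit status, whether it now leads its own process group, distinct
// from the parent's.
bool ChildLeavesParentGroup()
{
    const pid_t parent_group = getpgrp();
    const pid_t child = fork();
    if (child == -1)
        return false;

    if (child == 0)
    {
        // A freshly forked child is not a process group leader, so setsid()
        // is expected to succeed here.
        const bool detached = setsid() != -1 &&
                              getpgrp() == getpid() &&
                              getpgrp() != parent_group;
        _exit(detached ? 0 : 1);
    }

    int status = 0;
    if (waitpid(child, &status, 0) != child)
        return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

A process that has moved to its own session this way is no longer in the terminal's foreground process group, which is why gzip is left alone on Ctrl+C and only stops at end of input.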