
PHP exec() with the sort command: output file appears truncated, but the input file is not fully read up to EOF


When I use PHP to run

exec('sort /var/www/website/file_in.txt -o /var/www/website/file_out.txt');

the file written by the sort command is truncated:

input file size = 2,442,541 bytes

output file size = 1,146,881 bytes


I also noticed that PHP's filesize(file_in.txt) returns 1,146,881, not the correct size shown in a terminal session. I did call clearstatcache() before calling filesize().

Interestingly, filesize() reports the size of file_in as the same value that file_out is truncated to.
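To make that comparison repeatable, the check can be wrapped in a small helper. This is only a sketch: fresh_filesize() is a hypothetical name, not something from the original script. It clears the cached stat data for one path and then reads the size, so each call reflects what PHP sees at that moment.

```php
<?php
// Hypothetical helper (not part of the original script): report a file's
// size as PHP sees it right now, bypassing the stat cache for that path.
function fresh_filesize(string $path): int
{
    // Drop cached stat data (and the realpath cache entry) for this path only.
    clearstatcache(true, $path);

    // filesize() returns false (and raises a warning) when the file cannot be read.
    $size = @filesize($path);
    if ($size === false) {
        throw new RuntimeException("Cannot read the size of $path");
    }
    return $size;
}
```

Logging fresh_filesize() for the input file immediately before the exec() call, and for both files right after it, would show whether PHP already sees the short input before sort even starts.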

I am running a 64-bit PHP build on Linux x86_64, so I thought that rules out the 32-bit filesize() limitation, which affects files larger than 2 GB.


When I run the sort command in a terminal session as user www-data, the output file is the same size as the input file, no truncation.


I tried writing a shell script to call from exec(), hoping it would bypass a possible PHP buffer limit, but it produces the same truncated output file.
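One thing that may help narrow this down is capturing what sort itself reports. The sketch below is an assumption about how the call could be wrapped, not the original code: run_sort() is a hypothetical name, stderr is redirected into the captured output, and the arguments are reordered to `sort -o OUT -- IN`, which should behave the same as the original command with GNU sort.

```php
<?php
// Hypothetical wrapper (not part of the original script): run sort through
// exec() and return its exit status together with anything it printed.
function run_sort(string $in, string $out): array
{
    // escapeshellarg() quotes the paths; 2>&1 folds error messages into $lines.
    $cmd = sprintf('sort -o %s -- %s 2>&1', escapeshellarg($out), escapeshellarg($in));

    $lines = [];
    $status = -1;
    exec($cmd, $lines, $status);

    // A status of 0 means sort reported success; anything else is a failure.
    return ['status' => $status, 'messages' => $lines];
}
```

A non-zero status or any message text would point at sort failing, while a status of 0 with a short output file would suggest the input really was short at the moment sort read it.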


I have top running in a separate terminal to watch CPU and memory usage, but since the command does not truncate the output when run in a terminal, this appears to be a PHP issue.


Is there any kind of obscure configuration .ini setting that I should check to solve this issue?

Thanks


Additional Info: I realized the output file is not being truncated, but rather, the input file is not being fully read in until EOF.


Solution

  • This appears to have been caused by a lack of available memory at this point in the PHP script's execution.

    Through trial and error, I tried sort's --buffer-size option with increasing values (4K, 5K, 10K, 40K), but none of them did the trick.

    I was watching top to see how CPU and memory usage were reported.

    I did not think this was a problem, so I did not describe it in my original question, but before calling sort through exec(), I was calling pdftotext through exec(). While that process was running, the server's CPU usage spiked to 98%. Memory probably also spiked, but top's refresh rate did not capture it.

    I figured I could add a sleep(5) before the sort command was called, to pause the PHP script's execution and give the CPU and memory spike some time to return to normal. That solved the problem: sort now reads the entire input file and outputs all its contents. It also fixed the incorrect filesize() result.

    In a production environment, I will spin-up a server with more capacity, and try to eliminate the sleep(5) delay. I can't wait until I get to startup level "ramen noodles" :)
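A fixed sleep(5) is a guess at how long to wait. Since filesize() reported the input file as exactly the truncated size, one alternative worth trying is to wait until the size of the input file stops changing before calling sort. This is a sketch under that assumption, not a confirmed diagnosis: wait_for_stable_size() and its parameters are hypothetical names, and it would not help if the cause really is memory pressure alone.

```php
<?php
// Hypothetical helper (not part of the original script): poll a file until
// its size has been unchanged for $checks consecutive readings, then return it.
function wait_for_stable_size(
    string $path,
    int $checks = 3,
    int $intervalMs = 200,
    int $timeoutMs = 10000
): int {
    $deadline = microtime(true) + $timeoutMs / 1000;
    $last = -1;   // size seen on the previous reading (-1 = none yet)
    $stable = 0;  // how many consecutive readings matched $last

    while (microtime(true) < $deadline) {
        clearstatcache(true, $path);
        $size = @filesize($path);
        if ($size === false) {
            $size = -1; // file missing or unreadable for now
        }

        if ($size >= 0 && $size === $last) {
            $stable++;
            if ($stable >= $checks) {
                return $size;
            }
        } else {
            // Size changed (or first reading): start counting again.
            $stable = 0;
            $last = $size;
        }
        usleep($intervalMs * 1000);
    }
    throw new RuntimeException("Size of $path did not settle within $timeoutMs ms");
}
```

Calling wait_for_stable_size('/var/www/website/file_in.txt') in place of sleep(5) would return as soon as the file settles, and would throw instead of silently sorting a partial file if it never does.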