Tags: php, shell, utf-8, shell-exec, hunspell

Call a program via shell_exec with UTF-8 text input


Prerequisites: Hunspell and PHP 5.

Test code from bash:

user@host ~/ $ echo 'sagadījās' | hunspell -d lv_LV,en_US
Hunspell 1.2.14
+ sagadīties

- works properly.

Test code (test.php):

$encoding = "lv_LV.utf-8";

setlocale(LC_CTYPE, $encoding); // test
putenv('LANG='.$encoding); // and another test

$raw_response = shell_exec("LANG=$encoding; echo 'sagadījās' | hunspell -d lv_LV,en_US");

echo $raw_response;

returns

Hunspell 1.2.14
& sagad 5 0: tagad, sagad?ties, sagaudo, sagand?, sagar?o
*
*

Screenshot (could not post code with invalid characters): [image: Hunspell php invalid characters]

It seems that shell_exec cannot handle UTF-8 correctly. Or is some additional encoding/decoding needed?

EDIT: I had to use en_US.utf-8 to get valid data.
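
One plausible reason for the EDIT above (an assumption, the question does not confirm it) is that the lv_LV locale was not installed on the host while en_US was. A minimal sketch of picking the first available locale instead of hard-coding one follows; the candidate list is hypothetical and should be adjusted to the system:

```php
<?php
// Candidate locales in order of preference. This list is an assumption,
// not something taken from the question.
$candidates = array('lv_LV.utf-8', 'lv_LV.UTF-8', 'en_US.utf-8', 'en_US.UTF-8', 'C.UTF-8');

// setlocale() accepts an array and applies the first locale the system
// knows; it returns that locale's name, or false if none is installed.
$active = setlocale(LC_CTYPE, $candidates);

if ($active === false) {
    echo "None of the candidate locales is installed\n";
} else {
    echo "Using locale: $active\n";
    // Pass the same locale on to child processes started afterwards.
    putenv('LANG=' . $active);
}
```

On most Linux systems, `locale -a` in a shell lists the locales that are actually installed.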


Solution

  • Try this code:

    <?php
    
      // The word we are checking
      $subject = 'sagadījās';
    
      // We want file pointers for all 3 std streams
      $descriptors = array (
        0 => array("pipe", "r"),  // STDIN
        1 => array("pipe", "w"),  // STDOUT
        2 => array("pipe", "w")   // STDERR
      );
    
      // Environment for the child process. Passing an array here replaces
      // the inherited environment instead of adding to it.
      $env = array(
        'LANG' => 'lv_LV.utf-8'
      );
    
      // Try and start the process
      if (!is_resource($process = proc_open('hunspell -d lv_LV,en_US', $descriptors, $pipes, NULL, $env))) {
        die("Could not start Hunspell!");
      }
    
      // Put pipes into sensibly named variables
      $stdIn = &$pipes[0];
      $stdOut = &$pipes[1];
      $stdErr = &$pipes[2];
      unset($pipes);
    
      // Write the data to the process and close the pipe
      fwrite($stdIn, $subject);
      fclose($stdIn);
    
      // Display raw output
      echo "STDOUT:\n";
      while (!feof($stdOut)) echo fgets($stdOut);
      fclose($stdOut);
    
      // Display raw errors
      echo "\n\nSTDERR:\n";
      while (!feof($stdErr)) echo fgets($stdErr);
      fclose($stdErr);
    
      // Close the process pointer
      proc_close($process);
    
    ?>
    

    Don't forget to verify that the encoding of the file (and therefore the encoding of the data you are passing) actually is UTF-8 ;-)
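
One way to do that check from PHP itself, as a rough sketch rather than part of the original answer: the helper name `isValidUtf8` is hypothetical, and it relies on PCRE's `/u` modifier, which rejects malformed UTF-8 subjects.

```php
<?php
// Hypothetical helper (not from the original answer): returns true when
// $text is a well-formed UTF-8 byte sequence.
function isValidUtf8($text)
{
    // With the /u modifier PCRE validates the subject as UTF-8 before
    // matching. preg_match() returns false on malformed input; the empty
    // pattern matches any valid string, giving 1.
    return preg_match('//u', $text) === 1;
}

// The word we are checking, as in the script above
$subject = 'sagadījās';

if (isValidUtf8($subject)) {
    echo "Subject is valid UTF-8\n";
} else {
    echo "Subject is NOT valid UTF-8; re-save the file as UTF-8\n";
}
```

This only confirms that the bytes form valid UTF-8. It cannot tell whether the text was double-encoded or converted from another charset earlier, so checking the file encoding in the editor is still worthwhile.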