
PHP - issue with trim()


Context:

I'm reading a .csv file into an array $CSV, using its first line as the keys, to get a multidimensional array of the file.
The keys that contain multiple words keep a leading and a trailing space. The file is encoded in Windows-1252, which I convert to UTF-8.
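Converting from Windows-1252 to UTF-8 only changes the non-ASCII characters. ASCII bytes, including the space (0x20), are identical in both encodings, so the conversion step by itself cannot add or alter spaces. A minimal sketch with a made-up header value:

```php
<?php
// Made-up header value as raw Windows-1252 bytes: "é" is the single byte 0xE9.
$cp1252 = "Date de cr\xE9ation";

// Convert to UTF-8, as the question's code does.
$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');

// "é" becomes the two bytes c3 a9; every ASCII byte, spaces included, is unchanged.
echo bin2hex($cp1252), PHP_EOL;
echo bin2hex($utf8), PHP_EOL;
var_dump($utf8 === "Date de cr\u{e9}ation"); // bool(true)
```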

Process:

$keys = mb_convert_encoding($keys, 'UTF-8', 'Windows-1252');
$keys = trim(str_replace('"', ' ', $keys));
$keys = explode(';', $keys);

Results:

Here are the first two keys; the second one keeps its spaces.

Initial process (key => value):

[Commande] => C......
[ Date de création ] => 01/01/1970

Using urlencode(substr($keys[$j], 0, 1)) as value:

[Commande] => C
[ Date de création ] => +

Using rawurlencode(substr($keys[$j], 0, 1)) as value:

[Commande] => C
[ Date de création ] => %20

Functions I found in other SO questions, such as preg_replace('/\xc2\xa0/', '', $keys), don't change anything: the first character still encodes as %20.
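The %20 from rawurlencode() indicates that the first character is a plain ASCII space (byte 0x20) rather than a non-breaking space, which would explain why the \xc2\xa0 pattern has no effect. A quick check, using a hypothetical copy of the second key, dumps the bytes with bin2hex():

```php
<?php
// Hypothetical copy of the second key as printed above (this script is saved as UTF-8).
$key = ' Date de création ';

// Raw bytes of the first and last characters: an ordinary ASCII space is 0x20,
// whereas a UTF-8 non-breaking space would be the two bytes c2 a0.
echo bin2hex(substr($key, 0, 1)), PHP_EOL; // 20
echo bin2hex(substr($key, -1)), PHP_EOL;   // 20

// The non-breaking-space pattern therefore has nothing to match.
var_dump(preg_match('/\xc2\xa0/', $key)); // int(0)
```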

I could ignore this issue or work around it, but I don't understand why I can't trim() these strings.

Full sample code:

$file = file(__DIR__ . '/path/to/' . $csv_file);
// Keys
$keys = mb_convert_encoding($file[0], 'UTF-8', 'Windows-1252');
$keys = trim(str_replace('"', ' ', $keys));
$keys = explode(';', $keys);

$CSV = [];

for ($i = 1; $i < count($file); $i += 1) {
    $values = explode(';', $file[$i]);
    for ($j = 0; $j < count($values); $j += 1) {
        // Convert and clean each value of the row
        $values[$j] = mb_convert_encoding($values[$j], 'UTF-8', 'Windows-1252');
        $values[$j] = trim(str_replace('"', ' ', $values[$j]));
    }
    // Combine once per row, after every value has been cleaned
    $CSV[] = array_combine($keys, $values);
}
die('<pre>' . print_r($CSV, true) . '</pre>');

Solution

  • $keys = trim(str_replace('"', ' ', $keys));
    $keys = explode(';', $keys);
    

    Presumably you're starting with this line:

    Commande;"Date de création";"Something something"
    

    You're then turning it into this line (you're introducing the spaces here):

    Commande; Date de création ; Something something 
    

    Which you're then trimming (removing the spaces at the start and end of the line):

    Commande; Date de création ; Something something
    

    And then you're exploding the line:

    array('Commande', ' Date de création ', ' Something something')
    

    1. You need to trim each individual value after you have exploded the line, not before:

      $keys = array_map('trim', $keys);
      
    2. You should use CSV-parsing functions to parse CSVs, not re-invent the wheel:

      $keys = str_getcsv($file[0], ';');
      
    3. You should parse the entire CSV file using fgetcsv for more efficiency:

      function read_and_convert_line($fh) {
          $values = fgetcsv($fh, 0, ';');
          if (!$values) {
              return false;
          }
      
          return array_map(
              function ($value) { return mb_convert_encoding($value, 'UTF-8', 'Windows-1252'); }, 
              $values
          );
      }
      
      $fh = fopen(__DIR__ . '/path/to/' . $csv_file, 'r');
      $headers = read_and_convert_line($fh);
      $data = [];
      
      while ($row = read_and_convert_line($fh)) {
          $data[] = array_combine($headers, $row);
      }
      
      fclose($fh);
      print_r($data);
      

      This should eliminate the need for trim entirely.
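The walk-through above can be reproduced on a single string. This minimal sketch assumes the header line looks like the one guessed at in the answer and compares the original trim-then-explode order with fixes 1 and 2:

```php
<?php
// Assumed header line, matching the answer's guess.
$line = 'Commande;"Date de création";"Something something"';

// Original order: quotes become spaces, the WHOLE line is trimmed, then split.
// Only the spaces at the very start and end of the line are removed.
$buggy = explode(';', trim(str_replace('"', ' ', $line)));

// Fix 1: split first, then trim each individual field.
$fixed = array_map('trim', explode(';', str_replace('"', ' ', $line)));

// Fix 2: let str_getcsv() remove the enclosing quotes, so no spaces appear at all.
$parsed = str_getcsv($line, ';');

print_r($buggy);
print_r($fixed);
print_r($parsed);
```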
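To try the fgetcsv() approach without a file on disk, the sketch below feeds made-up Windows-1252 data through a php://memory stream. The helper is repeated from the answer so the snippet runs on its own; the data row (C001, 01/01/1970) is hypothetical sample data.

```php
<?php
// Helper from the answer: read one CSV row and convert each field to UTF-8.
function read_and_convert_line($fh) {
    $values = fgetcsv($fh, 0, ';');
    if (!$values) {
        return false; // EOF
    }
    return array_map(
        function ($value) { return mb_convert_encoding($value, 'UTF-8', 'Windows-1252'); },
        $values
    );
}

// Hypothetical file contents held in memory; "\xE9" is "é" in Windows-1252.
$csv = "Commande;\"Date de cr\xE9ation\"\nC001;01/01/1970\n";
$fh = fopen('php://memory', 'r+');
fwrite($fh, $csv);
rewind($fh);

$headers = read_and_convert_line($fh);
$data = [];
while ($row = read_and_convert_line($fh)) {
    $data[] = array_combine($headers, $row);
}
fclose($fh);

print_r($data);
```

The header comes back without its quotes and without surrounding spaces, so no trim() call is involved.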