There seems to be an encoding issue or bug in PHP with fputcsv() and fgetcsv().
The following PHP code:
$row_before = ['A', json_encode(['a', '\\', 'b']), 'B'];
print "\nBEFORE:\n";
var_export($row_before);
print "\n";
$fh = fopen($file = 'php://temp', 'rb+');
fputcsv($fh, $row_before);
rewind($fh);
$row_after = fgetcsv($fh);
print "\nAFTER:\n";
var_export($row_after);
print "\n\n";
fclose($fh);
Gives me this output:
BEFORE:
array (
0 => 'A',
1 => '["a","\\\\","b"]',
2 => 'B',
)
AFTER:
array (
0 => 'A',
1 => '["a","\\\\',
2 => 'b""]"',
3 => 'B',
)
So clearly, the data is damaged on the way. Originally there were just 3 cells in the row, afterwards there are 4 cells in the row. The middle cell is split thanks to the backslash that is also used as an escape character.
See also https://3v4l.org/nc1oE Or here, with explicit values for delimiter, enclosure, escape_char: https://3v4l.org/Svt7m
Is there any way I can sanitize / escape my data before writing to CSV, to guarantee that the data read from the file will be exactly the same?
Is CSV a fully reversible format?
EDIT: The goal would be a mechanism to properly write and read ANY data as csv, so that after one round trip the data is still the same.
EDIT: I realize that I do not really understand the $escape_char parameter. See also fgetcsv/fputcsv $escape parameter fundamentally broken Maybe an answer to this would also bring us closer to a solution.
The culprit is that fputcsv() uses an escape character, which is a non-standard extension to CSV. (Well, as far as RFC 7111 can be regarded as standard.) Basically, this escape character would have to be disabled, but passing an empty string as $escape to fputcsv() doesn't work. Usually, passing a NUL character should give the desired results, however, see https://3v4l.org/MlluN.