I need to read data from a .csv file encoded as ISO-8859-1 and put its content into a PostgreSQL database encoded as UTF-8. I'm getting two different errors, depending on which row of the file I'm processing.
I'm reading the file with the fgetcsv() function:
while (($line = fgetcsv($handle, 0, ';', '"')) !== false) {
    // ... process $line ...
}
The first error, "Undefined offset", is raised on a line where I call a function like this:
$foo = my_function($file_line[$index]);
The second error, "invalid byte sequence for encoding UTF8", occurs when I try to insert the data into my PostgreSQL table.
The file contains complex data, including date fields, number fields, and multi-line text fields with special characters and accents. Every line has all fields, even when some of them are empty.
The "Undefined offset" error happened because the array did not contain the requested index, even though it should have, since every line in the file has all fields.
The real problem was in how I called fgetcsv(): I hadn't set an escape character, so it used the default one, a backslash (\), and one of the text fields ended with a backslash. fgetcsv() treated that backslash as escaping the field's closing quote, so the field did not end there and the following delimiter was read as part of the field. This merged fields together and shifted the rest of the array.
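To see the effect in isolation, the sketch below parses one made-up record with str_getcsv(), which applies the same parsing rules as fgetcsv(), once with the backslash escape and once with escaping disabled. The sample record is invented for illustration, and the empty escape string requires PHP 7.4 or later.

```php
<?php
// A made-up record whose first field ends with a backslash.
$record = '"ends with backslash\\";"second";"third"';

// Default escape character (backslash): the \" sequence is not treated as
// the closing quote, so the first two fields run together.
$withEscape = str_getcsv($record, ';', '"', '\\');

// Escaping disabled (empty string, PHP >= 7.4): three fields, as expected.
$withoutEscape = str_getcsv($record, ';', '"', '');

echo count($withEscape), "\n";     // 2
echo count($withoutEscape), "\n";  // 3
```

With the default escape, the first element swallows the second field, which is exactly how a later `$line[$index]` ends up pointing past the end of the array.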
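This was solved by disabling the escape character. Since PHP 7.4, passing an empty string as the escape argument turns the escape mechanism off; on older versions, setting an unusual character that never appears in the data works as a substitute: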
while (($line = fgetcsv($handle, 0, ';', '"', '')) !== false) {
    // ... process $line ...
}
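Putting the fix into a complete reading loop might look like the sketch below. read_csv_rows() and its $expectedColumns parameter are hypothetical names, and the field-count check is an addition rather than part of the original code: it reports a misparsed record with its position instead of failing later with "Undefined offset".

```php
<?php
// Hypothetical helper: reads a ';'-separated, '"'-enclosed file with the
// escape mechanism disabled ('' requires PHP >= 7.4) and checks that every
// record has the expected number of fields.
function read_csv_rows(string $path, int $expectedColumns): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = [];
    $recordNo = 0; // counts records, not physical lines (fields may span lines)
    try {
        while (($line = fgetcsv($handle, 0, ';', '"', '')) !== false) {
            $recordNo++;
            // fgetcsv() returns [null] for a blank line; skip it.
            if ($line === [null]) {
                continue;
            }
            // Fail early with a clear message instead of "Undefined offset" later.
            if (count($line) !== $expectedColumns) {
                throw new UnexpectedValueException(sprintf(
                    'Record %d has %d fields, expected %d',
                    $recordNo, count($line), $expectedColumns
                ));
            }
            $rows[] = $line;
        }
    } finally {
        fclose($handle);
    }
    return $rows;
}

// Usage with a small temporary file (sample data made up for illustration):
// a field ending in a backslash and a multi-line field.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "\"a\\\";\"b\";\"c\"\n\"multi\nline\";\"x\";\"y\"\n");
$rows = read_csv_rows($path, 3);
unlink($path);
```

Because a quoted field may contain line breaks, the counter tracks records rather than physical lines, so the reported number may differ from the line number shown in a text editor.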
The "invalid byte sequence for encoding UTF8" error was solved by removing every character outside an allowed set from the data and then converting it to UTF-8 with these commands:
$field_content = preg_replace('/[^\x{0020}-\x{007E}\x{00c0}-\x{00fd}\x{000a}\x{0009}]/', '',$field_content);
$field_content = utf8_encode($field_content);
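On PHP 8.2 and later, utf8_encode() is deprecated. Assuming the mbstring extension is available, mb_convert_encoding() with 'ISO-8859-1' as the source encoding performs the same Latin-1 to UTF-8 conversion. The sketch below wraps both steps in a hypothetical helper, latin1_field_to_utf8(), keeping the same whitelist regex; mb_check_encoding() can then confirm the result before it is sent to PostgreSQL.

```php
<?php
// Hypothetical helper; assumes the mbstring extension is enabled.
function latin1_field_to_utf8(string $field): string
{
    // Same whitelist as above, applied to the raw ISO-8859-1 bytes (no /u
    // modifier): printable ASCII, Latin-1 letters 0xC0-0xFD, newline, tab.
    $field = preg_replace('/[^\x{0020}-\x{007E}\x{00c0}-\x{00fd}\x{000a}\x{0009}]/', '', $field);

    // Replacement for the deprecated utf8_encode(): ISO-8859-1 -> UTF-8.
    return mb_convert_encoding($field, 'UTF-8', 'ISO-8859-1');
}

// "Jos\xE9" is "José" in ISO-8859-1; \x01 is a control byte that gets stripped.
$converted = latin1_field_to_utf8("Jos\xE9\x01");
var_dump(mb_check_encoding($converted, 'UTF-8')); // bool(true)
```

The conversion step on its own already yields valid UTF-8, since every ISO-8859-1 byte maps to a Unicode code point; the regex only controls which characters end up in the database.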