Here is the current format of the CSV file I am parsing:
"Street","City","Country"
"House # 3, Street "23, H, Block". Building 32", "CityName", "Country"
Here you can see that 23, H, Block is surrounded by unescaped double quotation marks and contains commas. When I parse this file using the code below:
// the last parameter is the escape character; it is set to '"' so that a \
// in a field doesn't break the data
// looping on the return value avoids passing false to count() at end of file
while (($row = fgetcsv($file, null, ",", '"', '"')) !== false) {
    // fgetcsv() returns [null] for a blank line; don't pass those on
    if ($row !== [null]) {
        $sorted[] = array_merge($rows, $row);
    }
}
The parser splits 23, H and Block into separate elements, although they should stay in one field.
This is what happens:
array:2 [▼
0 => array:3 [▼
0 => "Street"
1 => "City"
2 => "Country"
]
1 => array:5 [▼
0 => "House # 3, Street 23"
1 => " H"
2 => " Block". Building 32""
3 => "CityName"
4 => "Country"
]
]
This is what I want instead:
array:2 [▼
0 => array:3 [▼
0 => "Street"
1 => "City"
2 => "Country"
]
1 => array:3 [▼
0 => "House # 3, Street 23, H, Block. Building 32"
1 => "CityName"
2 => "Country"
]
]
A regex pattern that removes the unwanted quotation marks from the whole CSV file would be helpful.
I believe you should focus on how to correctly split the line/row into tokens instead of removing unwanted double-quote characters from the line.
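The field delimiter has the form "," or ", ", that is, a comma that sits between a closing and an opening double quote, optionally followed by whitespace. Thus the regex to split the line would be: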
(?<="),\s*(?=")
See DEMO with regex explanation
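As a rough sketch of how that split could be applied in PHP (7.4 or later, because of the arrow function): the helper name splitQuotedLine is hypothetical, and the code assumes every field is wrapped in double quotes, no field spans several lines, and no field contains the sequence "," itself. It splits one raw line with preg_split() and then drops all remaining double quotes, both the outer pair and the stray inner ones.

```php
<?php
// Hypothetical helper (not part of the original question or answer):
// split one raw line on the delimiter regex, then strip the quotes.
function splitQuotedLine(string $line): array
{
    // Split only on a comma that sits between a closing and an opening quote.
    $fields = preg_split('/(?<="),\s*(?=")/', trim($line));

    return array_map(
        // Remove every double quote: the outer pair and any unescaped inner ones.
        fn (string $field): string => str_replace('"', '', $field),
        $fields
    );
}

$line = '"House # 3, Street "23, H, Block". Building 32", "CityName", "Country"';
print_r(splitQuotedLine($line));
```

For the sample row this yields three elements: the full street string, CityName and Country. Note that this approach also deletes legitimately escaped quotes ("") inside a field, so it only fits files where inner quotes are noise rather than data.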