php regex csv fgetcsv

remove double quotes from field


Here is the current format of the CSV file I am parsing:

"Street","City","Country"
"House # 3, Street "23, H, Block". Building 32", "CityName", "Country"

Here you can see that 23, H, Block is surrounded by unescaped double quotation marks and contains commas. I am parsing this file using the code below:

// loop until fgetcsv returns false (end of file or read error),
// so that only arrays reach the code below
// the last parameter is provided so that a \ in a field
// doesn't break the data
while (($row = fgetcsv($file, null, ",", '"', '"')) !== false) {
    // fgetcsv returns [null] for a blank line; don't pass those on
    if ($row !== [null]) {
        $sorted[] = array_merge($rows, $row);
    }
}

The parser splits 23, H and Block into separate elements, although they should stay together in one field.

This is what happens

array:2 [
  0 => array:3 [
    0 => "Street"
    1 => "City"
    2 => "Country"
  ]
  1 => array:5 [
    0 => "House # 3, Street 23"
    1 => " H"
    2 => " Block". Building 32""
    3 => "CityName"
    4 => "Country"
  ]
]

What I want instead is this:

array:2 [
  0 => array:3 [
    0 => "Street"
    1 => "City"
    2 => "Country"
  ]
  1 => array:3 [
    0 => "House # 3, Street 23, H, Block. Building 32"
    1 => "CityName"
    2 => "Country"
  ]
]

A regex pattern that removes the unwanted quotation marks from the whole CSV file would be helpful.


Solution

  • I believe you should focus on how to correctly split the line/row into tokens instead of removing unwanted double-quote characters from the line.

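    The field delimiter has the form "," or ", ": a comma that directly follows a closing quote and is followed, after optional whitespace, by an opening quote. The regex to split the line on that delimiter would therefore be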

    (?<="),\s*(?=")
    

    See DEMO with regex explanation
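
As a rough illustration of how that pattern could be applied in PHP, here is a minimal sketch. The helper name splitQuotedLine is hypothetical and not part of the original answer. It assumes every field is wrapped in double quotes, that no field legitimately contains the sequence "," and that no field needs to keep a literal double quote, since all remaining quotes are stripped after the split.

```php
<?php
// Hypothetical helper (not from the original answer): split one raw CSV
// line on the "," / ", " delimiter, then drop every remaining double quote.
function splitQuotedLine(string $line): array
{
    // Split only on commas that sit between a closing and an opening quote.
    $fields = preg_split('/(?<="),\s*(?=")/', trim($line));

    // Assumption: no field needs to keep a literal double quote,
    // so all of them (outer and stray inner ones) are removed.
    return array_map(
        static function (string $field): string {
            return str_replace('"', '', $field);
        },
        $fields
    );
}

$line = '"House # 3, Street "23, H, Block". Building 32", "CityName", "Country"';
print_r(splitQuotedLine($line));
```

To process a whole file this way, each line would have to be read as raw text (for example with fgets) instead of fgetcsv, because fgetcsv has already applied its own quoting rules by the time it returns a row. Fields that span several lines are not handled by this sketch.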