I have to parse a CSV file with PHP. The file is supplied by the client, and I have no control over its format. It is comma delimited and uses double quotes as text qualifiers. However, if a field, such as an address field, contains a comma, the client's system wraps the field in an additional set of double quotes. For example:
"9999X111","X1110000110105","John Doe",""555 Central Park Avenue, #108"","New York NY 10006 ","","","M","0","1","370.20"
As you can see, the 4th field (3rd index) has an extra set of double quotation marks around the entire field. If I send this string through fgetcsv() or str_getcsv(), the field is not handled correctly.
Unwanted Result Array:
[0] => 9999X111
[1] => X1110000110105
[2] => John Doe
[3] => 555 Central Park Avenue
[4] => #108""
[5] => New York NY 10006
If I remove the extra set of double quotation marks manually, the line is processed correctly using either function; however, I wouldn't be able to do this in a production environment.
Preferred Result Array:
[0] => 9999X111
[1] => X1110000110105
[2] => John Doe
[3] => 555 Central Park Avenue, #108
[4] => New York NY 10006
Here is the current code I am using:
$fileCHG = fopen($fileloc['InputFile'], "r");
$cnt = 0;
while(!feof($fileCHG)) {
    $chg[$cnt] = fgetcsv($fileCHG,0,",","\"");
    if($chg[$cnt]=="") { //Unset Any Blank Arrays
        unset($chg[$cnt]);
    }
    $cnt++;
}
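As a side note on the read loop itself, here is a minimal sketch of a more conventional pattern. It relies on two documented behaviours of fgetcsv(): it returns false at end of file, and it returns an array holding a single null field for a blank line. The function name readCsvRows and the in-memory stream are assumptions made for illustration, and this does nothing about the extra-quote problem.

```php
<?php
// Hypothetical helper: reads every non-blank CSV row from an open file handle.
function readCsvRows($handle)
{
    $rows = array();
    // fgetcsv() returns false at end of file, so no separate feof() check is needed.
    while (($row = fgetcsv($handle, 0, ",", "\"", "\\")) !== false) {
        // A blank line comes back as an array holding a single null field.
        if ($row === array(null)) {
            continue;
        }
        $rows[] = $row;
    }
    return $rows;
}

// Usage with an in-memory stream standing in for the client's file.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "\"a\",\"b\"\n\n\"c\",\"d\"\n");
rewind($handle);
$rows = readCsvRows($handle);
fclose($handle);
```

Comparing the row with array(null) is stricter than comparing it with an empty string, which does not match the array that fgetcsv() returns for a blank line.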
I have tried a variety of suggestions from all over Stack Overflow, the PHP manual and elsewhere, and can't seem to get it working. Even if I manually escape the inner set of double quotation marks with a backslash, I still get the incorrect result array. No matter how I adjust either function, my script splits the field at the comma following "Avenue" and ignores the remaining "".
I feel as if this comment on the PHP site may be explaining what's happening, but as a new coder I am unable to visualize what's actually going on.
http://www.php.net/manual/en/function.fgetcsv.php#58124
I have also tried the following suggestions (out of many) to no avail.
fgetcsv is not splitting data properly
str_getcsv not parsing the data correctly
This method could have worked, but it requires every line to have the same number of fields:
Reading CSV file with unescaped enclosures
I am using PHP 5.3.27 on Mac OS X 10.8.
Thank you in advance for taking a look.
I was able to solve the problem by expanding on the comments left by Daniel and Cosades. Instead of passing each line straight to fgetcsv(), I read it with fgets() into a variable ($line). I then use stripos() to find every occurrence of a doubled double quote ("") and check the characters immediately before and after it. A pair with a comma on both sides is a genuine empty field and is left alone, while a pair with a non-comma character on one side belongs to the extra wrapper, so one of its two quotes is removed. The cleaned line is then passed to str_getcsv(). Below is my new code.
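$fileCHG = fopen($fileloc['Charge'], "r");
$chg = array();
$cnt = 0;
while (($line = fgets($fileCHG)) !== false) {
    $line = rtrim($line, "\r\n"); // Drop the line ending so it is not mistaken for field data
    $pos = 0;
    // Visit every doubled double quote ("") in the line
    while (($pos = stripos($line, "\"\"", $pos)) !== false) {
        // Neighbouring characters; an empty string stands for the start or end of the line
        $chrA = ($pos > 0) ? substr($line, $pos - 1, 1) : "";
        $chrB = (string) substr($line, $pos + 2, 1);
        $edgeA = ($chrA === "" || $chrA === ",");
        $edgeB = ($chrB === "" || $chrB === ",");
        if ($edgeA != $edgeB) {
            // Only one side borders a delimiter: extra wrapper, so remove one quote
            $line = substr_replace($line, "", $pos + 1, 1);
            $pos = $pos + 1;
        } else {
            // Empty field or escaped quote inside a field: leave it alone
            $pos = $pos + 2;
        }
    }
    if ($line != "") { // Skip blank lines
        $chg[$cnt] = str_getcsv($line, ",", "\"");
    }
    $cnt++;
}
fclose($fileCHG);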
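For comparison, the same clean-up can be sketched with two regular expressions instead of a character-by-character scan. This is an alternative under stated assumptions, not the code I actually ran: the function name stripExtraWrapperQuotes is hypothetical, and the patterns assume the extra quotes only ever wrap a whole field.

```php
<?php
// Hypothetical helper: collapses the client's extra wrapper quotes on one line.
function stripExtraWrapperQuotes($line)
{
    // Remove the line ending so it is not treated as field data.
    $line = rtrim($line, "\r\n");
    // Opening wrapper: "" preceded by a comma (or line start) and followed by data.
    $line = preg_replace('/(?<![^,])""(?=[^,])/', '"', $line);
    // Closing wrapper: "" preceded by data and followed by a comma (or line end).
    $line = preg_replace('/(?<=[^,])""(?![^,])/', '"', $line);
    return $line;
}

// Usage with the sample line from the question.
$raw = '"9999X111","X1110000110105","John Doe",""555 Central Park Avenue, #108"","New York NY 10006 ","","","M","0","1","370.20"';
$fields = str_getcsv(stripExtraWrapperQuotes($raw), ",", "\"", "\\");
// $fields[3] should now hold the whole address, comma included.
```

Empty fields ("" between two commas) match neither pattern and are left untouched. Like the loop version, this is a heuristic: a legitimately escaped quote that sits directly next to a comma inside a field would be mistaken for a wrapper.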
Thanks for pointing me in the right direction!