I'm working with a PHP array which contains values parsed by a previous scraping process (using Simple HTML DOM Parser). I can print / echo the values of this array, which contain special characters such as é, à, è, etc., without any problem. The problem is the following:
When I use fwrite to save the values in a .csv file, some characters are not saved correctly. For example, Székesfehérvár is displayed correctly in my PHP view in HTML, but is saved as Székesfehérvár in the .csv file which I generate with the PHP script below.
I've already tried several things in the PHP script: I used the iconv and mb_encode methods in different places in the code.
Here's the part of the script that writes the values to a .csv file:
<?php
$data = array(
    array("item1", "item2"),
    array("item1", "item2"),
    array("item1", "item2"),
    array("item1", "item2")
    // ...
);
//filename
$filename = 'myFileName.csv';
//holds the whole content of the .csv as a string
$string_txt = "";
foreach ($data as $line) {
    //starts a new line of the .csv
    $line_txt = "";
    foreach ($line as $item) {
        //each line of the .csv holds the values of the PHP subarray, tab separated
        $line_txt .= $item . "\t";
    }
    //PHP end-of-line constant, marks the end of the current line of the .csv
    $line_txt .= PHP_EOL;
    //append the line to the string which is the global content of the .csv
    $string_txt .= $line_txt;
}
//write the string to a .csv file
$file = fopen($filename, 'w+');
fwrite($file, $string_txt);
fclose($file);
I am currently stuck because I can't save values with accented characters correctly.
The solution (provided by @misorude):
When scraping HTML content from web pages, there is a difference between what is displayed in your debug output and what is really scraped by the script. I had to use html_entity_decode so that PHP works with the true value of the HTML code I scraped, and not with the browser's interpretation of it.
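To make that difference visible, here is a minimal sketch. It assumes the scraped value carries its accents as numeric HTML entities; the sample string and the variable names are made up for illustration and are not taken from the original script. Escaping the string with htmlspecialchars shows what the script really holds, while html_entity_decode turns it into real UTF-8 characters.

```php
<?php
// Hypothetical scraped value: the accents are stored as numeric HTML entities.
$scraped = 'Sz&#233;kesfeh&#233;rv&#225;r';

// Echoing $scraped in an HTML page looks fine, because the browser decodes the entities.
// Escaping the ampersands first makes the page show the raw string instead.
$raw_view = htmlspecialchars($scraped, ENT_QUOTES, 'UTF-8');

// Decode the entities into real UTF-8 characters before writing them to a file.
$decoded = html_entity_decode($scraped, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $raw_view, PHP_EOL;
echo $decoded, PHP_EOL;
```

If the escaped output shows sequences such as `&#233;`, the value still contains entities and needs decoding before it is saved.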
To check that the values are retrieved correctly before storing them somewhere, you could try a console.log in JS to see whether they come through correctly:
PHP
//decoding the numeric HTML entities that represent "Sóstói Stadion"
$b = html_entity_decode("S&#243;st&#243;i Stadion");
Javascript (to test):
<script>
var b = <?php echo json_encode($b) ;?>;
//prints "Sóstói Stadion" correctly
console.log(b);
</script>
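Putting the two parts together, the decoding step could be applied to every item just before the line is written. This is only a sketch under assumptions: the sample rows, the temp-directory path and the use of implode (which, unlike the loop in the question, leaves no trailing tab) are illustrative choices, not the original script.

```php
<?php
// Hypothetical scraped rows; the second value still carries HTML entities.
$data = array(
    array('Székesfehérvár', 'S&#243;st&#243;i Stadion'),
    array('item1', 'item2'),
);

// Assumed output path, placed in the temp directory for this sketch.
$filename = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'myFileName.csv';

$file = fopen($filename, 'w');
foreach ($data as $line) {
    // Turn entities such as &#243; into real UTF-8 characters.
    $decoded = array_map(function ($item) {
        return html_entity_decode($item, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }, $line);
    // Join the values with tabs and end the line, as in the original script.
    fwrite($file, implode("\t", $decoded) . PHP_EOL);
}
fclose($file);
```

Values that are already plain UTF-8 pass through html_entity_decode unchanged, so decoding every item should be safe as long as no value is meant to contain a literal entity.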