Here is the code I use to scrape each name and URL, but every name starts with `~`. I would like to remove the `~`. I've tried using `str_replace`, but it doesn't look right to me, and when I tested it the names still started with `~`.
```php
foreach ($div_category as &$div) {
    $a_list = $div->find("a");
    foreach ($a_list as &$anchor) {
        // put the data into an array and then write the array out to a csv file
        $csv_array = array($anchor->plaintext, $anchor->getAttribute("href"));
        $anchor = str_replace('~', ' ', $anchor);
        fputcsv($csv_out, $csv_array);
    }
}
```
Current result (example):

```
name   url
~john  www.john.com
~bob   www.bob.com
~rob   www.rob.com
```
```php
<?php
$str = "~~~~~~";
$str = str_replace("~", "!", $str);
echo $str;
?>
```
This works for me, so the replacement itself is fine; you are most likely applying it to the wrong value. `$anchor` is a DOM node object, not the name string. Try printing it with `print_r($anchor);` to see which property you should be using.
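One detail worth keeping in mind: `str_replace` never changes its argument; it returns a new string, which has to be captured and then used. A minimal sketch of that, with two hypothetical helpers: `clean_name()` removes every `~`, while `clean_name_leading()` (an assumption about what the data looks like) trims whitespace and then strips only leading tildes, so a `~` inside a name would survive.

```php
<?php
// Hypothetical helper: remove every "~" from a scraped name.
// str_replace() never modifies its argument; it returns a new string.
function clean_name(string $raw): string
{
    return str_replace('~', '', $raw);
}

// Hypothetical alternative: trim surrounding whitespace, then strip only
// leading "~" characters, so a "~" inside a name is kept.
function clean_name_leading(string $raw): string
{
    return ltrim(trim($raw), '~');
}

$raw = '~john';
str_replace('~', '', $raw);                  // result discarded: $raw is unchanged
echo $raw, PHP_EOL;                          // ~john
echo clean_name($raw), PHP_EOL;              // john
echo clean_name_leading(' ~~bob '), PHP_EOL; // bob
```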
EDIT:
```php
foreach ($div_category as &$div) {
    $a_list = $div->find("a");
    foreach ($a_list as &$anchor) {
        // put the data into an array and then write the array out to a csv file
        $csv_array = array($anchor->plaintext, $anchor->getAttribute("href")); // <- line X
        $anchor = str_replace('~', ' ', $anchor);                              // <- line Y
        fputcsv($csv_out, $csv_array);
    }
}
```
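The problem is the order of lines X and Y, marked with arrows: `$csv_array` is built from `$anchor->plaintext` before the `~` is replaced, so the row written to the CSV still contains it. Swap the two lines and it should work.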
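To see why the order of lines X and Y matters, here is a self-contained sketch that replaces the Simple HTML DOM nodes with plain arrays (an assumption made so it runs without the library; `build_csv()` and the sample data are hypothetical). The row array copies the name at the moment it is built, so cleaning the name afterwards has no effect on what `fputcsv` writes.

```php
<?php
// Stand-ins for the scraped anchors (hypothetical data, not DOM nodes).
$anchors = [
    ['text' => '~john', 'href' => 'www.john.com'],
    ['text' => '~bob',  'href' => 'www.bob.com'],
];

// Write one CSV row per anchor and return the CSV text.
// $cleanFirst = true runs the replacement (line Y) before building the row (line X).
function build_csv(array $anchors, bool $cleanFirst): string
{
    $out = fopen('php://memory', 'w+');
    foreach ($anchors as $anchor) {
        if ($cleanFirst) {
            $anchor['text'] = str_replace('~', '', $anchor['text']);
            $row = [$anchor['text'], $anchor['href']];
        } else {
            $row = [$anchor['text'], $anchor['href']];               // copies "~john"
            $anchor['text'] = str_replace('~', '', $anchor['text']); // too late for $row
        }
        fputcsv($out, $row, ',', '"', '\\');
    }
    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}

echo build_csv($anchors, false); // names still start with "~"
echo build_csv($anchors, true);  // names without "~"
```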
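EDIT2: Also, line Y

```php
$anchor = str_replace( '~', ' ', $anchor);
```

should be

```php
$anchor->plaintext = str_replace('~', '', $anchor->plaintext);
```

so that it changes the anchor's text (the value line X reads) instead of overwriting `$anchor` itself, and removes the `~` rather than turning it into a space.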
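For reference, a sketch of the whole loop with both fixes applied. It swaps in PHP's built-in `DOMDocument`/`DOMXPath` for the Simple HTML DOM library used in the question, so it runs without extra dependencies; `textContent` and `getAttribute('href')` play the roles of `plaintext` and `getAttribute("href")`. The sample HTML, the `category` class used to stand in for `$div_category`, and the `scrape_rows()` helper are all hypothetical, and the cleaned text goes into a local `$name` instead of being written back to the node.

```php
<?php
// Hypothetical sample page; in practice this would be the scraped HTML.
$html = <<<HTML
<div class="category"><a href="www.john.com">~john</a><a href="www.bob.com">~bob</a></div>
<div class="category"><a href="www.rob.com">~rob</a></div>
HTML;

// Return one [name, url] pair per <a> inside a div with class "category".
function scrape_rows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    $divs = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " category ")]');
    foreach ($divs as $div) {
        foreach ($xpath->query('.//a', $div) as $anchor) {
            // Clean the text first, then build the row from the cleaned value.
            $name = trim(str_replace('~', '', $anchor->textContent));
            $rows[] = [$name, $anchor->getAttribute('href')];
        }
    }
    return $rows;
}

$rows = scrape_rows($html);
$csv_out = fopen('php://output', 'w');
foreach ($rows as $row) {
    fputcsv($csv_out, $row, ',', '"', '\\');
}
fclose($csv_out);
```

Keeping the cleaning step right next to where the row is built makes it harder to reintroduce the ordering bug from the question.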