Tags: php, string, web-scraping, replace

How do you delete one part in the plaintext I scraped from the site?


Here is the code I use to scrape each name and URL. Every name starts with ~, and I would like to remove that character. I tried str_replace, but it doesn't look right to me, and when I tested it the output was unchanged.

foreach ($div_category as &$div){
    $a_list = $div->find("a");
    foreach ( $a_list as &$anchor){
        //put the data into an array and then write array out to a csv file.
        $csv_array=array($anchor->plaintext, $anchor->getAttribute("href") );
        $anchor = str_replace( '~', ' ', $anchor);
        fputcsv($csv_out, $csv_array);
    }
}

current result example:

name      url
~john     www.john.com
~bob      www.bob.com
~rob      www.rob.com

Solution

    <?php
    $str = "~~~~~~";
    $str = str_replace("~","!",$str);
    echo $str;
    ?>
    

    This works for me, so the replacement itself is fine. The problem must be in how you access the anchor's data. Try printing the anchor with:

    print_r($anchor);


    to see which property you should be using.

    EDIT:

    foreach ($div_category as &$div){
        $a_list = $div->find("a");
        foreach ( $a_list as &$anchor){
            //put the data into an array and then write array out to a csv file.
       ->   $csv_array=array($anchor->plaintext, $anchor->getAttribute("href") ); // line X
       ->   $anchor = str_replace( '~', ' ', $anchor);                            // line Y
            fputcsv($csv_out, $csv_array);
        }
    }
    

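    The problem is the order of lines X and Y, marked with arrows. Line X builds $csv_array before line Y removes the ~, so the array still holds the original text. Swap the two lines so that the replacement runs first.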
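To illustrate why the order matters, here is a minimal sketch that uses plain strings instead of scraped anchors. The variable names `$late_row` and `$early_row` are hypothetical and only serve the demonstration: an array stores the value a variable had when the array was built, so a later replacement does not reach it.

```php
<?php
// Replacement AFTER building the row: the row keeps the old value.
$name = '~john';
$late_row = [$name, 'www.john.com'];   // copies the current value of $name
$name = str_replace('~', '', $name);   // changes $name only, not $late_row
echo $late_row[0], "\n";               // prints "~john"

// Replacement BEFORE building the row: the row gets the cleaned value.
$name = '~john';
$name = str_replace('~', '', $name);
$early_row = [$name, 'www.john.com'];
echo $early_row[0], "\n";              // prints "john"
```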

    EDIT2:

    In addition, line Y:

    $anchor = str_replace( '~', ' ', $anchor);
    

    should be

    $anchor->plaintext = str_replace( '~', '', $anchor->plaintext);
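Putting both edits together, the loop could look roughly like the sketch below. It is not the asker's exact code: the `$anchors` array is hypothetical stand-in data for the scraped anchors (in the question the values come from `$anchor->plaintext` and `$anchor->getAttribute("href")`), and the CSV is written to an in-memory stream rather than a real file so the example runs on its own.

```php
<?php
// Hypothetical stand-in for the scraped anchors from the question.
$anchors = [
    ['plaintext' => '~john', 'href' => 'www.john.com'],
    ['plaintext' => '~bob',  'href' => 'www.bob.com'],
    ['plaintext' => '~rob',  'href' => 'www.rob.com'],
];

// Assumption: an in-memory stream replaces the real CSV file handle.
$csv_out = fopen('php://memory', 'w+');

foreach ($anchors as $anchor) {
    // Strip the ~ from the text first...
    $name = str_replace('~', '', $anchor['plaintext']);
    // ...then build the row from the cleaned value.
    $csv_array = [$name, $anchor['href']];
    // Delimiter, enclosure, and escape are passed explicitly; newer PHP
    // versions may warn when the escape argument is left to its default.
    fputcsv($csv_out, $csv_array, ',', '"', '');
}

// Read back what was written so it can be inspected.
rewind($csv_out);
$csv_text = stream_get_contents($csv_out);
fclose($csv_out);
echo $csv_text;
```

Using a separate `$name` variable leaves the anchor untouched, which avoids overwriting the loop variable as line Y originally did. If only a leading ~ should go and a ~ elsewhere in the name must be kept, `ltrim($anchor['plaintext'], '~')` is an alternative.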