How to find a keyword in a URL from HTML source and store the link and anchor text in an array


I'm a bit stuck on this. I want to loop over a list of URLs that contain links back to my site, capture the HTML code used to produce each link, and also store the anchor text used for the link.

[code removed by marty see below]

The code used for martylinks calls a function I'm still trying to build. This is where I'm having a little trouble, but for you guys I'm sure it's really simple.

This is my find_marty_links function so far:

function find_marty_links($file, $keyword){
    //1: Find link to my site <a href="http://www.***martin***-gardner.co.uk" target="_blank" title="Web Developer">Web Developer</a>
    //2: copy the FULL HTML LINK to array
    //3: copy the REL value? NOFOLLOW : FOLLOW to array
    //4  copy TITLE (if any) to array
    //5  copy Anchor Text to array

    $htmlDoc = new DomDocument();
    $htmlDoc->loadhtml($file);

    $output_array = array();
    foreach($htmlDoc->getElementsByTagName('a') as $link) {

            // STEP 1
        // SEARCH ENTIRE PAGE FOR KEYWORD?
            // FIND A LINK WITH MY KEYWORD?
            preg_match_all('???', $link, $output); //???//

            if(strpos($output) == $keyword){


               // STEP 2
               // COPY THE FULL HTML FOR THAT LINK?
               $full_html_link = preg_match(??);
               $output_array['link_html'] = $full_html_link;

               // STEP 3
               // COPY THE REL VALUE TO ARRAY
               $link_rel = $link->getAttribute('rel');
               $output_array['link_rel'] = $link_rel;

               // STEP 4
               // COPY TITLE TO ARRAY
               $link_title = $link->getAttribute('title');
               $output_array['link_title'] = $link_title;

               // STEP 5
               // COPY ANCHOR TEXT TO ARRAY
               $anchor_exp = expode('>'); //???
               $anchor_txt = $anchor_exp[2];//??
               $output_array['link_anchor'] = $anchor_txt;

            }

    }
}

UPDATE: I need to produce an array like the one below:

$results = array('link_html' => '<a title="test" href="http://site.com" rel="nofollow">anchor text</a>',
                 'link_rel' => 'nofollow',
                 'link_title' => 'test',
                 'link_anchor' => 'anchor text'
                 );

Thanks for any help, lads.

M


Solution

  • Ok here is the updated code:

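    function find_marty_links($file, $keyword){
        // $file is the page's HTML source as a string, not a file path.
        $htmlDoc = new DOMDocument();
        // Collect parser warnings for malformed markup instead of printing them.
        libxml_use_internal_errors(true);
        $htmlDoc->loadHTML($file);
        libxml_clear_errors();
        $links = array();

        foreach($htmlDoc->getElementsByTagName('a') as $link) {
            $url = $link->getAttribute('href');
            $title = $link->getAttribute('title');
            $text = $link->nodeValue; // anchor text
            $rel = $link->getAttribute('rel');

            // Keep the link if the keyword appears in its URL, title or anchor text (case-sensitive).
            if(strpos($url,$keyword) !== false || strpos($title,$keyword) !== false || strpos($text,$keyword) !== false)
            {
                $links[] = array('url' => $url, 'text' => $text, 'title' => $title, 'rel' => $rel);
            }
        }

        // One entry per matching link; empty array if nothing matched.
        return $links;
    }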
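
    The function above returns `url`, `text`, `title` and `rel`, but not the full HTML of the link that the question's UPDATE asks for. A minimal sketch of one way to get that array shape follows. It assumes PHP 7 or later; the function name `find_links_with_html` is hypothetical, and matching only on `href` with a case-insensitive `stripos()` is an assumption rather than something stated in the question. `DOMDocument::saveHTML()` accepts a node argument and serialises just that element.

```php
<?php
// Hypothetical helper (not part of the original answer): returns one array per
// matching link, using the keys requested in the question's UPDATE.
function find_links_with_html(string $html, string $keyword): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = array();
    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        // Assumption: match the keyword against the href only, ignoring case.
        if ($keyword === '' || stripos($href, $keyword) === false) {
            continue;
        }
        $results[] = array(
            'link_html'   => $doc->saveHTML($link),       // the <a> element serialised back to HTML
            'link_rel'    => $link->getAttribute('rel'),   // empty string when no rel is set
            'link_title'  => $link->getAttribute('title'), // empty string when no title is set
            'link_anchor' => trim($link->textContent),
        );
    }
    return $results;
}

// Usage with an inline HTML string; the markup is made up for illustration.
$html = '<p><a title="test" href="http://site.com" rel="nofollow">anchor text</a> '
      . '<a href="http://example.org">other</a></p>';
$results = find_links_with_html($html, 'site.com');
print_r($results);
```

    Because the result is a list of arrays, a page with several links back to the site yields several entries. An empty `link_rel` can be treated as "follow" by the caller if that distinction is needed.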