I'm trying to extract data from the anchor URLs of a webpage, i.e.:
```php
require 'simple_html_dom.php';

$html = file_get_html('http://www.example.com');
foreach ($html->find('a') as $element)
{
    $href = $element->href;
    $name = $surname = $id = 0;
    parse_str($href);
    echo $name;
}
```
The problem is that this doesn't work, and I can't tell why. All the URLs are in the following form:

`name=James&surname=Smith&id=2311245`
The strange thing is that if I execute `echo $href;`, I get the desired output. However, that string won't parse, and `strlen()` reports a length of 43. Yet if I pass the literal `'name=James&surname=Smith&id=2311245'` as the argument to `parse_str()`, it works just fine. What could be the problem?
I'm gonna take a guess that your target page is actually one of the rare pages that correctly encodes `&` as `&amp;` in its links. Example:

```html
<a href="somepage.php?name=James&amp;surname=Smith&amp;id=3211245">
```
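A quick way to check this guess is to reproduce it with a hard-coded string. The sketch below assumes the raw attribute value is `name=James&amp;surname=Smith&amp;id=2311245` (a hypothetical value mirroring the question, not fetched from the real page). Both the length and the parsed key names show the leftover entities.

```php
<?php
// Hypothetical raw href, as it would appear if the page writes & as &amp;.
$raw = 'name=James&amp;surname=Smith&amp;id=2311245';
// The string you actually expect to parse.
$expected = 'name=James&surname=Smith&id=2311245';

echo strlen($expected), "\n"; // 35
echo strlen($raw), "\n";      // 43: each "&amp;" is four characters longer than "&"

// parse_str() splits only on '&', so the "amp;" left over from each entity
// ends up at the start of the following key name.
parse_str($raw, $params);
print_r(array_keys($params)); // keys: name, amp;surname, amp;id
```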
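That would also explain what you're seeing. If you view the output in a browser, `echo $href;` renders each `&amp;` as `&`, so the string looks right, but `strlen()` counts the raw characters: `name=James&surname=Smith&id=2311245` is 35 characters long, and each of the two `&amp;`s adds four more, giving 43.

To parse this string, you first need to unescape the `&amp;`s. You can do this with a simple `str_replace('&amp;', '&', $href)` if you like.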
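Below is a minimal sketch of that fix, again using a hypothetical hard-coded href instead of fetching a page. It also shows `html_entity_decode()` as an alternative to `str_replace()` (a swapped-in option, not part of the suggestion above) that decodes every HTML entity rather than only `&amp;`. It uses the two-argument form of `parse_str()`, because the one-argument form from the question was deprecated in PHP 7.2 and removed in PHP 8.0.

```php
<?php
// Hypothetical raw href containing &amp; entities (not taken from the real page).
$href = 'name=James&amp;surname=Smith&amp;id=2311245';

// Option 1: replace only the &amp; entity.
$query = str_replace('&amp;', '&', $href);

// Option 2: decode all HTML entities (also covers forms like &#38;).
$decoded = html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Two-argument parse_str(): results go into $params instead of local variables.
parse_str($query, $params);

echo $params['name'], ' ', $params['surname'], ' ', $params['id'], "\n";
```

In the original loop, this amounts to running `$element->href` through `str_replace()` (or `html_entity_decode()`) before handing it to `parse_str()`, then reading `$params['name']` instead of `$name`.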