php, parsing, domxpath

Get links between headings with DOMXPath


I have an HTML page with no ids or classes, just some links and headings, like this:

<h1>Link category 1</h1>
<a href="somesite">Somename 1</a>
<a href="somesite">Somename 2</a>
<a href="somesite">Somename 3</a>
<a href="somesite">Somename 4</a>
<h1>Link category 2</h1>
<a href="somesite">Somename 5</a>
<a href="somesite">Somename 6</a>
<a href="somesite">Somename 7</a>
<a href="somesite">Somename 8</a>

And so on.

Currently I am parsing all the links on the page with this code:

$dom = new DOMDocument();
@$dom->loadHTML($content);
$xPath = new DOMXPath($dom);
$elements = $xPath->query("//a");

With that I can get the text from all of the links, but what I want to do is divide them up: first get all the links after the first h1 and do something with them, then get all the links after the second h1 and do something with those. There can be any number of links and any number of headings.

Does anyone have any tips, or possibly an example, of how to go about doing this?


Solution

  • In my case I always know what the headings will be, and there is only a very small chance that a link's text will be the same as a heading's text, so I was able to use this

    $xPath->query("//a | //h1");
    

    to get all the <a> and <h1> elements, and then used an if statement to change the MySQL insert query whenever a new heading was detected.
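
    The approach described above can be sketched roughly as follows. This is only an illustration under some assumptions: the sample markup, the $groups array and the $current variable are made up for the example, the MySQL insert is replaced by collecting link texts into an array, and the node type is checked with nodeName rather than by comparing text against known headings. It also relies on the union query returning nodes in document order, which is what libxml does in practice.

```php
<?php
// Sample markup mirroring the question; real input would come from $content.
$content = <<<HTML
<h1>Link category 1</h1>
<a href="somesite">Somename 1</a>
<a href="somesite">Somename 2</a>
<h1>Link category 2</h1>
<a href="somesite">Somename 5</a>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of suppressing with @
$dom->loadHTML($content);
libxml_clear_errors();

$xPath = new DOMXPath($dom);

// Walk headings and links together; each <a> is filed under the last <h1> seen.
$groups = [];
$current = null; // text of the most recent heading (hypothetical helper variable)
foreach ($xPath->query('//a | //h1') as $node) {
    if ($node->nodeName === 'h1') {
        // A new heading starts a new group.
        $current = trim($node->textContent);
        $groups[$current] = [];
    } elseif ($current !== null) {
        // Links that appear before any heading are skipped.
        $groups[$current][] = trim($node->textContent);
    }
}

print_r($groups);
```

    Note that headings with identical text would overwrite each other in this sketch; if that can happen, keying the groups by a running index instead of the heading text would avoid it.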