JavaScript equivalent of the PHP DOMDocument class


I wrote some PHP code to parse data returned by an API request to "wikipedia.org". I used the DOMDocument class to parse the data, and it worked perfectly. Now I want to do the same in JavaScript. After a little cleanup, the API response is a string like this:

$htmlString = "<ul>
    <li>Item 1</li>
    <li>Item 2</li>
</ul>
<ul>
    <li>Item 3</li>
    <li>Item 4</li>
    <li>Item 5</li>
</ul>";

Note that this is just an example. A response may contain any number of lists, but it is always a series of unordered lists. I needed the text inside the <li> tags, and the following PHP code did the job:

$DOM = new DOMDocument;
$DOM->loadHTML($htmlString);
$lis = $DOM->getElementsByTagName('li');
$items = [];
for ($i = 0; $i < $lis->length; $i++) {
    $items[] = $lis->item($i)->nodeValue;
}

This gives me the array [Item 1,...,Item 5] in the $items variable, as I wanted. Now I want to do the same in JavaScript. That is, I have a string

const htmlString = `<ul>
    <li>Item 1</li>
    <li>Item 2</li>
</ul>
<ul>
    <li>Item 3</li>
    <li>Item 4</li>
    <li>Item 5</li>
</ul>`;

in JavaScript, and I want to get the text inside each of the <li> tags. I searched the web for a JavaScript equivalent of PHP's DOMDocument class and, surprisingly, found nothing. Any ideas on how to do this in (preferably vanilla) JavaScript, similar to the PHP code? If not, how else could it be done in JavaScript (perhaps even with regular expressions)?


Solution

  • Use DOMParser()

    Your code ported to JavaScript, which is very close to the PHP version:

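    // DOMParser is a built-in browser API; "text/html" below makes it parse the string as an HTML document
    let parser = new DOMParser()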
    let doc = parser.parseFromString(`<ul>
        <li>Item 1</li>
        <li>Item 2</li>
    </ul>
    <ul>
        <li>Item 3</li>
        <li>Item 4</li>
        <li>Item 5</li>
    </ul>`, "text/html")

    // Collect the text of every <li> in the parsed document
    let lis = doc.getElementsByTagName('li')
    let items = []
    for (let i = 0; i < lis.length; i++) items.push(lis[i].textContent)

    console.log(items)
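
The question also asks whether regular expressions could do the job. DOMParser is a browser API, so in an environment that does not provide it (plain Node.js, for instance) a regex can serve as a rough fallback for input as simple as the example. The following is only a sketch under stated assumptions: the function name `extractListItems` is made up for illustration, the <li> elements are assumed not to be nested, and their content is assumed to be plain text (inner tags would be returned as raw markup and HTML entities would not be decoded). For anything less regular than that, a real HTML parser is the safer choice.

```javascript
// Regex-based sketch for environments without DOMParser.
// Assumptions: <li> elements are not nested and contain plain text only.
// extractListItems is a hypothetical helper name, not part of any library.
function extractListItems(htmlString) {
  // Lazily match each <li ...>...</li> pair; [\s\S] also spans line breaks.
  const liPattern = /<li\b[^>]*>([\s\S]*?)<\/li>/gi;
  // matchAll yields one match per <li>; capture group 1 holds the inner text.
  return Array.from(htmlString.matchAll(liPattern), (match) => match[1].trim());
}

const htmlString = `<ul>
    <li>Item 1</li>
    <li>Item 2</li>
</ul>
<ul>
    <li>Item 3</li>
    <li>Item 4</li>
    <li>Item 5</li>
</ul>`;

const items = extractListItems(htmlString);
console.log(items); // logs the five strings "Item 1" through "Item 5"
```

Unlike the DOMParser version, this never builds a document tree, which is why it cannot cope with nested lists or malformed markup.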