I have a piece of text similar to the following; it is essentially a string of HTML:
hello
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>
<div>....</div>
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>
<div>....</div>
<div>
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>
</div>
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>
What I would like is to capture the inner text of every span tag (in the example above, that would be "Professional Referee" each time) and store the results in an array.
I think a regex is the way to go. The one I have so far is:
^/(<span)([\a-zA-Z0-9\s]*)(<\/span>)/$
I'm not flash on regex, and an additional issue is that each span tag may have attributes that differ from those on the other tags.
If I can get the full span tags into an array, I think I can manage to strip out the leftover markup myself.
Here is a regex101 link: https://regex101.com/r/9K90pa/1
Can someone point me in the right direction?
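Regex is not the ideal tool for parsing HTML. The DOM API provides DOMParser, which turns the string into a document you can query with ordinary selectors, regardless of which attributes each span carries: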
const html = `hello
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>
<div>....</div>
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>
<div>....</div>
<div>
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>
</div>
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>`;
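// Parse the HTML string into a separate Document object
const doc = new DOMParser().parseFromString(html, "text/html");
// Select every <span>, whatever its attributes, and map each one to its text
const spanTexts = Array.from(doc.querySelectorAll("span"), span => span.textContent);
console.log(spanTexts); // five "Professional Referee" entries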
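If you are running outside a browser (for example in plain Node.js, where DOMParser is not available), or you really do want a regex, the following is a minimal sketch that only works for flat input like the sample above. It assumes spans are never nested and contain only text with no child elements; it does not decode entities such as &amp;, and an attribute value containing ">" would break it. The function name extractSpanTexts is hypothetical.

```javascript
// Hypothetical helper: extractSpanTexts.
// Assumes spans are not nested and hold plain text only (no child tags).
function extractSpanTexts(html) {
  // <span\b  : the opening tag name (\b stops it matching e.g. <spanner>)
  // [^>]*>   : any attributes, up to the end of the opening tag
  // ([^<]*)  : capture the text content, stopping at the next tag
  // <\/span> : the closing tag
  const spanPattern = /<span\b[^>]*>([^<]*)<\/span>/g;
  // matchAll needs the g flag; match[1] is the captured text
  return Array.from(html.matchAll(spanPattern), match => match[1]);
}

const html = `hello
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>
<div>....</div>
<div>
<span dir="auto" class="aDTYNe snByac OvPDhc OIC90c">Professional Referee</span>
</div>`;

console.log(extractSpanTexts(html)); // [ 'Professional Referee', 'Professional Referee' ]
```

The [^>]* part skips whatever attributes a tag happens to have, which covers spans whose attributes differ from one another. For anything beyond this simple, flat case, the DOMParser approach above is the safer choice.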