The wiki pages that I am trying to parse include the following HTML:
<div
class="pi-smart-data-value pi-data-value pi-font pi-item-spacing pi-border-color"
style="width: calc(1 / 1 * 100%)"
data-source="unique_ability"
>
<b>Captain</b><br /><b>"Back off!"</b><br />Push target on hit.
</div>
I would like to parse the content of the div into an array like this:
["Captain", "Back off!", "Push target on hit."]
If I use cheerio's text() method (const uniqueAbilities = $('[data-source="unique_ability"]').text();) I get one long string with no separators: Captain"Back off!"Push target on hit.
If I use the html() method (const uniqueAbilities = $('[data-source="unique_ability"]').html();) I get the HTML content of the node as a string, but I don't know how to split that string into the separate pieces.
How would you parse this HTML into the desired output?
Thanks for the help.
Here is one way to obtain your desired output, though it might not cover every other case:
// get the inner HTML of the element as a string
let data = $('[data-source="unique_ability"]').html();
console.log(data);
// remove all line breaks (\n and \r); replace() with a global regex
// behaves like replaceAll() here and also works on older runtimes
data = data.replace(/[\n\r]+/g, '');
console.log(data);
// split the string wherever an HTML tag occurs
data = data.split(/<[^>]+>/);
console.log(data);
// remove empty and whitespace-only strings from the array
data = data.filter(el => el.trim() !== "");
console.log(data);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div
class="pi-smart-data-value pi-data-value pi-font pi-item-spacing pi-border-color"
style="width: calc(1 / 1 * 100%)"
data-source="unique_ability"
>
<b>Captain</b><br /><b>"Back off!"</b><br />Push target on hit.
</div>
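The steps above can also be wrapped in a small reusable function. This is only a sketch: splitOnTags is a hypothetical helper name (it is not part of cheerio or jQuery), and the &quot; replacement is an assumption in case the HTML serializer escapes double quotes in text. It takes the string you would get from .html(), so it works the same way with either library.

```javascript
// Hypothetical helper (not a cheerio or jQuery API): turns an HTML
// fragment string into an array of its visible text pieces.
function splitOnTags(html) {
  return html
    .replace(/[\n\r]+/g, "") // drop line breaks that come from source formatting
    .split(/<[^>]+>/) // cut the string wherever an HTML tag appears
    // Assumption: quotes may arrive escaped as &quot;, so undo that, then trim.
    .map((piece) => piece.replace(/&quot;/g, '"').trim())
    .filter((piece) => piece !== ""); // discard the empty gaps between adjacent tags
}

// Same markup as in the question, as .html() would return it.
const sample = `
<b>Captain</b><br /><b>"Back off!"</b><br />Push target on hit.
`;

console.log(splitOnTags(sample));
// → [ 'Captain', '"Back off!"', 'Push target on hit.' ]
```

Note that the double quotes around "Back off!" are part of the text and are kept. Like the original snippet, this regex approach is fragile: it will break on a > inside an attribute value or on other HTML entities, so it is best suited to simple markup like the example.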