I am using Puppeteer to load a website and then store the HTML of that site using:
html = await page.evaluate('new XMLSerializer().serializeToString(document.doctype) + document.documentElement.outerHTML');
This works fine and returns the HTML as expected (long story short, I can't use requests on this site).
Now, the HTML contains a chunk that looks like this:
<ul class="styled-radio">
<li>
<input type="radio" name="variant_id" id="variant_id_118018" value="118018">
<label for="variant_id_118018">5</label>
</li>
<li>
<input type="radio" name="variant_id" id="variant_id_118019" value="118019">
<label for="variant_id_118019">6</label>
</li>
<li>
<input type="radio" name="variant_id" id="variant_id_118020" value="118020">
<label for="variant_id_118020">6,5</label>
</li>
... keeps going ...
</ul>
For each variant_id_xxxxxx I need to get the xxxxxx number and the inner text of the matching label, and store them as xxxxxx:innerTextHere.
For example, the first entry in the block above would be 118018:5.
It would also be great to store all the xxxxxx:innerTextHere values in an array called sizes, so the final result for the HTML above would be ["118018:5", "118019:6", "118020:6,5"].
Thanks in advance :)
You can use the Node package Cheerio to achieve this. Please refer to the sample code below.
const cheerio = require('cheerio');

// Sample markup; in practice, pass in the html string returned by page.evaluate.
const data = `
<ul class="styled-radio">
  <li>
    <input type="radio" name="variant_id" id="variant_id_118018" value="118018">
    <label for="variant_id_118018">5</label>
  </li>
  <li>
    <input type="radio" name="variant_id" id="variant_id_118019" value="118019">
    <label for="variant_id_118019">6</label>
  </li>
  <li>
    <input type="radio" name="variant_id" id="variant_id_118020" value="118020">
    <label for="variant_id_118020">6,5</label>
  </li>
  ... keeps going ...
</ul>`;

const result = [];
const $ = cheerio.load(data);

// Select every radio input, then look up the label that points at its id.
const variants = $("input[name='variant_id']");
variants.each((index, { attribs }) => {
  const { id, value } = attribs;
  const label = $("label[for='" + id + "']");
  result.push({
    id,
    value,
    label: label.text()
  });
});

console.log(result);
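The code above collects objects rather than the xxxxxx:innerTextHere strings the question asks for. One way to get that format is to map each entry to a string afterwards. This is only a minimal sketch: toSizes and sampleResult are illustrative names (not part of Cheerio or Puppeteer), and the sample data is assumed to have the same shape as the result array built above.

```javascript
// Hypothetical helper: turns [{ id, value, label }, ...] into "value:label" strings.
// Assumes each entry has string `value` and `label` fields, as in the result array above.
function toSizes(entries) {
  return entries.map(({ value, label }) => `${value}:${label.trim()}`);
}

// Sample data mirroring what the Cheerio loop produces for the HTML in the question.
const sampleResult = [
  { id: 'variant_id_118018', value: '118018', label: '5' },
  { id: 'variant_id_118019', value: '118019', label: '6' },
  { id: 'variant_id_118020', value: '118020', label: '6,5' }
];

const sizes = toSizes(sampleResult);
console.log(sizes);
// [ '118018:5', '118019:6', '118020:6,5' ]
```

In the real script you would call toSizes(result) after the each loop. The values are kept as strings because a label such as 6,5 contains a comma and would be ambiguous otherwise.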