I am creating a 'generic' web scraper that would scrape any page containing a list of entries. I would like the tags it extracts to be driven by the config.
Example with the following config:
{
name : "price",
valueJQueryExpression : ".mt9 > .mt7.b"
},
... I'm parsing the following way:
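const $ = require('cheerio');
// Look up the selector for the "price" field in the config.
let jquery = getQuery("price");
// Run the selector against the page markup.
let keys = $(jquery, html);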
However, I have trickier parsers to handle, e.g. this one:
let location = $('.mt9 > .b', html).not('.mt5').not('.mt7').text().trim()
In such a case I thought of using eval() and passing the full expression in the config. However, this is not recommended due to security issues.
Would you have any recommendation on handling this differently?
You should be able to use the :not() pseudo-class here. Try the following:
$('.mt9 > .b:not(.mt5):not(.mt7)', html).text().trim()
This works the same way as in jQuery: elements matching the selector inside :not() are excluded from the result.
You can see it in action below:
.mt9 > .b:not(.mt5):not(.mt7) {
color: red;
}
<div class="mt9">
<div class="b">This should be red</div>
<div class="b mt7">This should not be red</div>
<div class="b mt5">This should not be red</div>
</div>
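Since the exclusion now lives inside the selector string, the config can stay pure data and eval() is not needed. Below is a minimal sketch of how that might look, assuming the config shape from the question. The "location" entry and the extractFields helper are hypothetical names introduced for illustration, and `$` is expected to be callable as `$(selector, html)`, as in the question.

```javascript
// Hypothetical config: each entry pairs a field name with a plain CSS selector.
// The "location" entry is an assumption added for illustration.
const config = [
  { name: "price", valueJQueryExpression: ".mt9 > .mt7.b" },
  { name: "location", valueJQueryExpression: ".mt9 > .b:not(.mt5):not(.mt7)" },
];

// extractFields is a hypothetical helper, not part of cheerio.
// `$` is passed in and is assumed to be callable as $(selector, html),
// returning an object with a text() method, as in the question.
function extractFields($, html, entries) {
  const result = {};
  for (const { name, valueJQueryExpression } of entries) {
    // The selector is only ever used as a selector string, never evaluated as code.
    result[name] = $(valueJQueryExpression, html).text().trim();
  }
  return result;
}
```

With the `$` from the question this could be called as extractFields($, html, config), which should give back an object with one trimmed text value per config entry.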