javascript, regex, browser, frameworks, browser-extension

Do you know of an open source JavaScript extraction/regexp engine?


We need a DOM parser that can run a set of patterns against a page and store the results. We are looking for open source libraries that we can build on. The library should:

  • be able to select elements by regexp (for example, grab all elements that contain "price" in their class, id, or other attributes such as meta attributes),
  • have plenty of helpers, such as removing comments, iframes, etc.,
  • be pretty fast,
  • be runnable from browser extensions.
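To make the first requirement concrete, here is a minimal sketch of what "select by regexp on any attribute" could look like. It is not tied to any library: the node shape (`type`, `name`, `attribs`, `children`) and the function name `findByAttributePattern` are assumptions made up for illustration.

```javascript
// Hypothetical node shape: { type, name, attribs, children }.
// Returns every tag node with at least one attribute value matching the pattern.
// Pass a pattern WITHOUT the `g` flag, since RegExp#test is stateful with it.
function findByAttributePattern(nodes, pattern) {
  const matches = [];
  for (const node of nodes) {
    if (node.type === "tag") {
      const attribs = node.attribs || {};
      const hit = Object.keys(attribs).some((key) =>
        pattern.test(String(attribs[key]))
      );
      if (hit) matches.push(node);
    }
    // Recurse into children so nested matches are found as well.
    if (node.children) {
      matches.push(...findByAttributePattern(node.children, pattern));
    }
  }
  return matches;
}

// Illustrative data only.
const sampleTree = [
  {
    type: "tag", name: "div", attribs: { id: "main" },
    children: [
      { type: "tag", name: "span", attribs: { class: "product-price" }, children: [] },
      { type: "tag", name: "meta", attribs: { itemprop: "price", content: "9.99" }, children: [] },
      { type: "tag", name: "p", attribs: { class: "description" }, children: [] },
    ],
  },
];

const priceNodes = findByAttributePattern(sampleTree, /price/i);
console.log(priceNodes.map((node) => node.name));
```

With the sample data above, only the `span` and `meta` nodes match, because the check looks at attribute values and not at tag names or text.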

Solution

  • node-htmlparser parses HTML, provides a DOM with a number of utilities (including filtering by functions), and can be run in any context (even in Web Workers).

    I forked it a while back, optimized it for speed, and got some surprisingly good results (read: even faster than native libexpat bindings).

    Nevertheless, I would advise you to use the original version, as it supports browsers out of the box (my fork can be run in browsers using browserify, which adds some overhead).
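As a rough illustration of the "remove comments, iframes" helpers asked for in the question, here is a sketch of function-based filtering over a parsed tree. It assumes nodes shaped like `{ type, name, children }`, with `type` being `"tag"`, `"text"`, or `"comment"`, which is how I recall node-htmlparser's output; treat that shape as an assumption. `stripNodes` and `isCommentOrIframe` are hypothetical names, not part of the library.

```javascript
// Returns a pruned copy of the tree; the input is not mutated.
// `shouldRemove` is any predicate taking a node and returning true to drop it.
function stripNodes(nodes, shouldRemove) {
  return nodes
    .filter((node) => !shouldRemove(node))
    .map((node) =>
      node.children
        ? { ...node, children: stripNodes(node.children, shouldRemove) }
        : node
    );
}

// Example predicate: drop comment nodes and <iframe> elements.
const isCommentOrIframe = (node) =>
  node.type === "comment" || (node.type === "tag" && node.name === "iframe");

// Illustrative data only, modeled on the assumed node shape.
const parsedDom = [
  { type: "comment", data: " tracking " },
  {
    type: "tag", name: "body",
    children: [
      { type: "tag", name: "iframe", children: [] },
      { type: "text", data: "Price: 9.99" },
    ],
  },
];

const cleanedDom = stripNodes(parsedDom, isCommentOrIframe);
console.log(JSON.stringify(cleanedDom));
```

Keeping the predicate separate from the traversal means the same helper can drop scripts, styles, or anything else by swapping in a different function.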