We are in need of a DOM parser, that will be able to run a bunch of patterns and would store the results. For this we are looking for libraries that are open and we can start on,
node-htmlparser can parse HTML, provides a DOM with a number of utils (also supports filtering by functions) and can be run in any context (even in WebWorkers).
I forked it a while back, improved it for better speed and got some insane results (read: even faster than native libexpat bindings).
Nevertheless, I would advice you to use the original version, as it supports browsers out-of-the-box (my fork can be run in browsers using browserify, which adds some overhead).