Can somebody recommend a Node.js module or a JavaScript library (not based on Readability) that can be used to extract content from web pages and RSS feeds?
I found a good PHP library that can do the job - http://fivefilters.org/content-only/ - but I'm looking for a Node.js module that would do the same.
Thank you!
I wrote a Node.js module just for this purpose called 'unfluff':
https://github.com/ageitgey/node-unfluff
Hopefully that will solve your problem.
Unfluff is based on the popular "python-goose" and "goose" (Scala) page extraction libraries, in case you are familiar with those.
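For anyone who wants to try it, here is a minimal sketch of how the module might be called. It assumes unfluff has been installed with `npm install unfluff` and that you have already fetched the page HTML as a string. The `extractArticle` helper and its optional `extractor` parameter are hypothetical names used only for illustration (the parameter makes it possible to swap in a stub); they are not part of unfluff itself.

```javascript
// Hypothetical helper: pulls the title and main text out of an HTML string.
// `extractor` is optional; when omitted, the unfluff module is loaded.
function extractArticle(html, extractor) {
  // Load unfluff lazily so a stand-in extractor can be supplied instead.
  const extract = extractor || require('unfluff');

  // unfluff takes the HTML and an optional language code and returns
  // an object with fields such as `title` and `text`.
  const data = extract(html, 'en');

  return { title: data.title, text: data.text };
}

// Example call (requires unfluff to be installed):
//   const article = extractArticle(pageHtml);
//   console.log(article.title);
//   console.log(article.text);
```

Note that this only covers HTML pages that you have already downloaded; fetching the page is left to whatever HTTP client you prefer.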