jquery python screen-scraping lxml pyquery

Straight lxml or PyQuery?

Does anyone have experience scraping with straight lxml vs. PyQuery? I came across the latter recently and was intrigued. I haven't been able to find many comments about the library yet, so I'm curious how robust it is.

I'm familiar with lxml and generally enjoy it. It would be nice, however, to use jQuery selector syntax.

Is the switch worth it?

Thanks!

Solution

Only you can answer the question of whether it's worth it.

It simply depends on whether you want to use an extra dependency in order to get jQuery's custom CSS selectors.

Here are the things jQuery adds on top of the standard CSS selectors: http://api.jquery.com/category/selectors/jquery-selector-extensions/

And here is where PyQuery implements those selectors, by extending lxml's CSS-to-XPath translation: https://bitbucket.org/olauzanne/pyquery/src/c2bf08a8f4e7/pyquery/cssselectpatch.py

I don't see why it should be any less robust than using plain CSS selectors with lxml. lxml already turns CSS selectors into XPath under the hood; PyQuery simply adds translations for the special jQuery selectors on top of that.