
Searching on class names with a dash ('-')


I'm experimenting with lxml in Python, but I can't figure out how to use the cssselect() function to get all div elements with the class reddit-entry. It seems to dislike the - character; any class name without a - works fine.


Solution

  • That’s a bug in the parser in lxml.cssselect. I took over maintenance of the project and extracted it from lxml. The bug is fixed in the new cssselect: http://packages.python.org/cssselect/

    lxml 2.4 will use the new cssselect, but until then the way to use it is:

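    from cssselect import HTMLTranslator
    # Translate the CSS selector to XPath, then run it on the parsed lxml document.
    xpath_expr = HTMLTranslator().css_to_xpath('div.reddit-entry')
    result = lxml_document.xpath(xpath_expr)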
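
If installing the standalone cssselect is not an option, one possible workaround is to skip the CSS parser and hand lxml an XPath expression directly, since the dash only trips up the selector parsing step. The sketch below is an assumption-laden illustration, not the answerer's method: the helper name `class_xpath` and the sample markup are hypothetical, and the expression is the common XPath 1.0 idiom for matching one token in a class attribute.

```python
def class_xpath(tag, class_name):
    """Build an XPath matching <tag> elements whose class list contains class_name.

    Hypothetical helper for illustration; it does no escaping, so class_name
    must not contain quote characters.
    """
    # Pad the normalized class attribute with spaces so that only whole
    # class tokens match ("reddit-entry", not "reddit-entry-title").
    return (
        "descendant-or-self::%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]"
        % (tag, class_name)
    )


expr = class_xpath("div", "reddit-entry")

try:
    from lxml import html  # third-party; only needed to actually run the query
except ImportError:
    html = None

if html is not None:
    # Sample markup invented for this example.
    doc = html.fromstring(
        '<section><div class="reddit-entry top">a</div>'
        '<div class="other">b</div></section>'
    )
    matches = doc.xpath(expr)
    print([div.text for div in matches])
```

Because the class name is only ever placed inside an XPath string literal, the dash needs no special treatment here.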