Tags: python, jquery, css-selectors, html-parsing

jQuery-like HTML parsing in Python?


Is there a way in Python to parse an HTML document the way jQuery does?

That is, I'd like to use CSS selector syntax to grab an arbitrary set of nodes from the document and read their content, attributes, and so on.


Solution

  • If you are already comfortable with BeautifulSoup, you can simply add soupselect to your libraries.
    soupselect is a CSS selector extension for BeautifulSoup.

    Usage:

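    from urllib.request import urlopen  # Python 3; Python 2 used urllib.urlopen

    from bs4 import BeautifulSoup as Soup
    from soupselect import select

    # Fetch the page and parse it with the standard-library HTML parser.
    soup = Soup(urlopen('http://slashdot.org/'), 'html.parser')
    # Return every <h3> nested inside a <div class="title">.
    select(soup, 'div.title h3')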
    
        [<h3><span><a href='//science.slashdot.org/'>Science</a>:</span></h3>,
         <h3><a href='//slashdot.org/articles/07/02/28/0120220.shtml'>Star Trek</h3>,
        ..]
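
As a possible alternative to the extra dependency, recent BeautifulSoup 4 releases ship their own select() method that accepts CSS selectors. The sketch below assumes bs4 is installed and runs against an inline HTML string rather than a live page. The sample markup and the helper names select_headings and link_targets are made up for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modelled on the 'div.title h3' example above.
HTML = """
<div class="title"><h3><a href="/science">Science</a></h3></div>
<div class="title"><h3><a href="/tech">Tech</a></h3></div>
<div class="other"><h3>Ignored</h3></div>
"""


def select_headings(html):
    """Return every <h3> nested inside a <div class="title">."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select("div.title h3")


def link_targets(html):
    """Return (text, href) pairs for links inside the selected headings."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(), a["href"]) for a in soup.select("div.title h3 a[href]")]


if __name__ == "__main__":
    for heading in select_headings(HTML):
        print(heading.get_text())
    print(link_targets(HTML))
```

Reading content and attributes then works much as it does in jQuery: get_text() plays the role of .text(), and tag["href"] the role of .attr("href").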