Tags: python, screen-scraping, pyquery

Getting attributes in PyQuery?


I'm using PyQuery and want to print a list of links, but can't figure out how to get the href attribute from each link in the PyQuery syntax.

This is my code:

  e = pq(url=results_url)
  links = e('li.moredetails a')
  print len(links)
  for link in links:
    print link.attr('href')

This prints 10, then gives the following error:

AttributeError: 'HtmlElement' object has no attribute 'attr'

What am I doing wrong?


Solution

  • PyQuery wraps lxml, and iterating over a PyQuery object yields plain lxml elements rather than PyQuery objects, so use the ElementTree API to access attributes:

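    from pyquery import PyQuery as pq

    e = pq(url=results_url)
    for link in e('li.moredetails a'):
        # link is an lxml HtmlElement; its attributes live in the .attrib mapping
        print(link.attrib['href'])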
    

    Alternatively, to use the PyQuery API on a found element, wrap it in a pq() call, just as in jQuery you wrap raw DOM elements with $() or jQuery():

        print(pq(link).attr('href'))
    

    or

        print(pq(link).attr['href'])
    

    for a more Pythonic way to access the attributes.

    You could also loop over the result of the .items() method, which yields PyQuery objects instead of lxml elements:

    e = pq(url=results_url)
    for link in e('li.moredetails a').items():
        # each link is a PyQuery object here, so .attr works directly
        print(link.attr['href'])