python  python-3.x  web-scraping  beautifulsoup  keyerror

BeautifulSoup filtered results raise KeyError in a "for i" loop


I am trying to shorten my code and make it more efficient. However, I ran into a KeyError and can't figure out what went wrong. Could someone point out why my expression is not OK? PS: I am at an amateur level.

With this code:

recommended = soup.select('table:has(font:contains("推荐主题")), '
                          'table:has(font:contains("版块主题"))')
for item in recommended:
    for i in item.select(".folder:has(a)"):
        print(i)

I get these elements:

<td class="folder"><a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439293-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439292-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439290-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>

But when I add one more line,

for item in recommended:
    for i in item.select(".folder:has(a)"):
        url_tail = i['href']

I get this KeyError:

    return self.attrs[key]
KeyError: 'href'

What I want to extract are the href links. Thank you all.


Solution

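  • @facelessuser has explained the error nicely and given what would be my first-choice selector. The KeyError arises because .folder:has(a) matches the td cells, which have no href attribute; the href sits on the a element inside each cell. As plan Bs, there appear to be two other attribute = value selectors that could work.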

    Either:

    [href^="thread-"]
    

    Or:

    [title="新窗口打开"]
    

    Either can be used in a list comprehension such as:

    links = [item['href'] for item in soup.select('[href^="thread-"]')]
    

    You may need to call select on item rather than on soup. If the match turns out to be too broad, you can always add the parent class to narrow it:

    .folder [title="新窗口打开"]
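To make the cause of the KeyError concrete, here is a minimal sketch. It assumes a tiny hypothetical table built from two of the rows shown in the question (the real page will have more structure) and uses the built-in html.parser. The `.folder > a[href]` selector is just one illustrative way to reach the anchors, not necessarily the one @facelessuser proposed.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the page, built from rows shown in the question.
html = """
<table>
  <tr><td><font>推荐主题</font></td></tr>
  <tr><td class="folder"><a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td></tr>
  <tr><td class="folder"><a href="thread-10439293-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# .folder:has(a) matches the <td> itself; its only attribute is class,
# so cell['href'] would raise KeyError.
cell = soup.select_one(".folder:has(a)")
has_href = "href" in cell.attrs

# .get() returns None instead of raising when the attribute is absent.
missing = cell.get("href")

# Selecting the anchor inside the cell yields elements that do carry href.
links = [a["href"] for a in soup.select(".folder > a[href]")]

# The plan B selector, narrowed by the parent class.
links_b = [a["href"] for a in soup.select('.folder [href^="thread-"]')]

# The same selector scoped to each table (item) rather than the whole soup.
links_scoped = [
    a["href"]
    for item in soup.select("table")
    for a in item.select('.folder [title="新窗口打开"]')
]
```

All three link lists should come out identical here. On the real page the scoped version matters only if other tables also contain matching anchors.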