I am trying to shorten my code and make it more efficient. However, I ran into a KeyError and can't figure out what went wrong. Could someone point out why my expression doesn't work? PS: I am a beginner.
With this code:
recommended = soup.select('table:has(font:contains("推荐主题")), '
                          'table:has(font:contains("版块主题"))')
for item in recommended:
    for i in item.select(".folder:has(a)"):
        print(i)
I get the following elements:
<td class="folder"><a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439293-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439292-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
<td class="folder"><a href="thread-10439290-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td>
But when I add one more line:
for item in recommended:
    for i in item.select(".folder:has(a)"):
        url_tail = i['href']
I get this KeyError:
return self.attrs[key]
KeyError: 'href'
What I am trying to extract are the href links. Thank you all.
@facelessuser has nicely explained the error (+) and given what would be my first-choice selector. It looks like there are two other attribute = value selectors that could serve as fallbacks.
Either:
[href^="thread-"]
Or:
[title="新窗口打开"]
Either can be used in a list comprehension, such as:
links = [item['href'] for item in soup.select('[href^="thread-"]')]
Your select may need to be called on item rather than soup. If that ends up too broad a match, you can always add the parent class:
.folder [title="新窗口打开"]
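To make the cause of the KeyError concrete, here is a minimal sketch built on two of the rows from the question. It assumes BeautifulSoup 4.7 or later (for the select support via soupsieve) and uses the built-in html.parser; the surrounding table markup and the variable names are my own, for illustration only. The selector .folder:has(a) returns the td elements, which carry no href, so the lookup fails. Selecting the a elements themselves avoids that.

```python
from bs4 import BeautifulSoup

# Two rows copied from the question; the <table>/<tr> wrapper is assumed.
html = """
<table>
<tr><td class="folder"><a href="thread-10439294-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td></tr>
<tr><td class="folder"><a href="thread-10439293-1-1.html" target="_blank" title="新窗口打开"><img src="images/green001/folder_new.gif"/></a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# .folder:has(a) matches the <td>, not the <a> inside it,
# so td['href'] would raise KeyError: 'href'.
td = soup.select_one(".folder:has(a)")
tag_name = td.name            # the matched element is a td
td_href = td.get("href")      # .get() returns None instead of raising

# Select the <a> elements directly and read href from them.
links = [a["href"] for a in soup.select(".folder > a[href]")]

# The two fallback selectors from this answer give the same result here.
plan_b = [a["href"] for a in soup.select('[href^="thread-"]')]
plan_c = [a["href"] for a in soup.select('.folder [title="新窗口打开"]')]

print(tag_name, td_href)
print(links)
```

On the real page the same selectors can be run on each item from recommended instead of on soup, which keeps the match restricted to the two tables of interest.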