I'm trying to get search results from Baidu, but I'm stuck here:
import sys
import urllib
import urllib2
from bs4 import BeautifulSoup
question_word = "Hello"
url = "http://www.baidu.com/s?wd=" + urllib.quote(question_word.decode(sys.stdin.encoding).encode('gbk'))
htmlpage = urllib2.urlopen(url).read()
soup = BeautifulSoup(htmlpage)
for child in soup.findAll("h3", {"class": "t"}):
    print child.contents[0]
This returns all the tags that contain the target URLs, but I do not know how to use .get('href') to list the actual URLs.
I'm new to Python, so I may be confused about some basic concepts. I'd really appreciate the help.
for child in soup.findAll("h3", {"class": "t"}):
    print child.a.get('href')
Use .a to get the first a tag inside the h3 tag; you can then call .get('href') on it to read the URL.
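For illustration, here is a minimal, self-contained sketch of the same idea that runs without a network request. The SAMPLE_HTML string and the extract_result_links helper are hypothetical names made up for this example, and the markup only imitates the h3 class="t" structure from the question; Baidu's real page layout may differ. It also skips any h3 that has no a tag, since child.a is None in that case and calling .get() on it would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded results page.
SAMPLE_HTML = """
<div>
  <h3 class="t"><a href="http://example.com/first">First result</a></h3>
  <h3 class="t">No link here</h3>
  <h3 class="t"><a href="http://example.com/second">Second result</a></h3>
  <h3 class="other"><a href="http://example.com/ignored">Ignored</a></h3>
</div>
"""


def extract_result_links(html):
    """Return the href of the first <a> inside every <h3 class="t">."""
    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for heading in soup.find_all("h3", {"class": "t"}):
        anchor = heading.a  # first <a> descendant, or None if there is none
        if anchor is None:
            continue
        href = anchor.get("href")  # returns None when the attribute is missing
        if href:
            links.append(href)
    return links


if __name__ == "__main__":
    for link in extract_result_links(SAMPLE_HTML):
        print(link)
```

The same helper could be passed the htmlpage string from the question instead of SAMPLE_HTML, assuming the result headings still use that class name.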