What would be the simplest way to get the title of a page in Requests?
r = requests.get('http://www.imdb.com/title/tt0108778/')
# something like r.title? Expected result:
Friends (TV Series 1994–2004) - IMDb
You need an HTML parser to parse the HTML response and get the title tag's text. Example using lxml.html:
>>> import requests
>>> from lxml.html import fromstring
>>> r = requests.get('http://www.imdb.com/title/tt0108778/')
>>> tree = fromstring(r.content)
>>> tree.findtext('.//title')
u'Friends (TV Series 1994\u20132004) - IMDb'
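If you would rather not depend on lxml, the same idea can be done with the standard library's html.parser module. This is a swapped-in technique, not part of the answer above, and a minimal sketch: the names TitleParser and extract_title are hypothetical helpers. It collects the text of the first title element and returns None when there is none. With requests you would pass it the decoded body, e.g. extract_title(r.text).

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &ndash; etc. are decoded
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so <TITLE> matches too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Join the collected chunks; an empty title is reported as None.
        return "".join(self._parts).strip() or None


def extract_title(html):
    """Hypothetical helper: return the <title> text of an HTML string, or None."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

Unlike lxml, html.parser does not build a tree, so this only suits simple extractions like the title; for anything more involved, a real tree (lxml, as above) is easier to query.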
There are certainly other options, for example the mechanize library:
>>> import mechanize
>>> br = mechanize.Browser()
>>> br.open('http://www.imdb.com/title/tt0108778/')
>>> br.title()
'Friends (TV Series 1994\xe2\x80\x932004) - IMDb'
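The two titles differ only in representation: lxml returns a unicode string containing an en dash (U+2013), while mechanize (in this Python 2 session) returns raw bytes. Assuming the page is served as UTF-8, which the bytes suggest, a quick Python 3 check shows they are the same text:

```python
# Title as returned by lxml: a text string with an en dash (U+2013).
lxml_title = "Friends (TV Series 1994\u20132004) - IMDb"

# Title as returned by mechanize above: the UTF-8 encoded bytes of the same text
# (assumption: the page is UTF-8; \xe2\x80\x93 is the UTF-8 encoding of U+2013).
mechanize_title = b"Friends (TV Series 1994\xe2\x80\x932004) - IMDb"

# Decoding the bytes as UTF-8 gives exactly the lxml string, and vice versa.
print(mechanize_title.decode("utf-8") == lxml_title)  # True
print(lxml_title.encode("utf-8") == mechanize_title)  # True
```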
Which option to choose depends on what you are going to do next: parse the page to extract more data, or interact with it (click buttons, submit forms, follow links, etc.).
Besides, instead of going down to HTML parsing, you may want to use an API for IMDb data. Example usage of the IMDbPY package:
>>> from imdb import IMDb
>>> ia = IMDb()
>>> movie = ia.get_movie('0108778')
>>> movie['title']
u'Friends'
>>> movie['series years']
u'1994-2004'