I'm trying to get the redirects of some Wikipedia pages, but what I get from requests' .history is a bit strange to me.
If I do:
>>> request = requests.get("https://en.wikipedia.org/wiki/barcelona", allow_redirects=True)
>>> request.url
u'https://en.wikipedia.org/wiki/Barcelona'
>>> request.history
[<Response [301]>]
As you can see, the redirect is followed correctly: .history records the 301, and the final URL is the same one I get when accessing the page from the browser.
But if I do:
>>> request = requests.get("https://en.wikipedia.org/wiki/Yardymli_Rayon", allow_redirects=True)
>>> request.url
u'https://en.wikipedia.org/wiki/Yardymli_Rayon'
>>> request.history
[]
The .history is empty, but in the browser I see that the URL has actually changed to: https://en.wikipedia.org/wiki/Yardymli_District
Does anyone know how to solve this?
Requests doesn't show the redirect because you're not actually being redirected in the HTTP sense. Wikipedia does some JavaScript trickery (probably HTML5 history modification and pushState) to change the address that's shown in the address bar, but that doesn't apply to Requests, of course.
In other words, both requests and your browser are correct: requests is showing the URL you actually requested (and Wikipedia actually served), while your browser's address bar is showing the 'proper', canonical URL.
You could parse the response and look for the <link rel="canonical"> tag if you want to find out the 'proper' URL from your script, or fetch articles over Wikipedia's API instead.
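
As a rough sketch of the first suggestion, something like the following could pull the canonical URL out of the page using only the standard library's HTML parser. The names CanonicalLinkParser, find_canonical_url and fetch_canonical_url are made up for this illustration, and the User-Agent string is a placeholder you would replace with your own. It assumes the page really does contain a <link rel="canonical"> tag and returns None otherwise.

```python
from html.parser import HTMLParser


class CanonicalLinkParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <link ... />.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical_url(html):
    """Return the canonical URL declared in an HTML string, or None."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


def fetch_canonical_url(url):
    """Download a page with requests and return its canonical URL, or None."""
    import requests  # third-party; imported here so the parser works without it

    # Placeholder User-Agent (an assumption for this sketch); use your own.
    headers = {"User-Agent": "canonical-url-example/0.1"}
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return find_canonical_url(response.text)
```

Calling fetch_canonical_url("https://en.wikipedia.org/wiki/Yardymli_Rayon") should then give you whatever URL the page declares as canonical, which you can compare with the URL you requested to detect this kind of "soft" redirect.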