Search code examples
phppythonwordpressurllib2urlopen

How to By pass WP super cache using python?


I'm trying to collecting data from a frequently updating blog, so I simply use a while loop which includes urllib2.urlopen("http:\example.com") to refresh the page every 5 minutes to collect the data I wanted.

But I notice that I'm not getting the most recent content by doing this, it's different from what I see via browser such as Firefox, and after checking both the source code of Firefox and the same page I get from python, I found that it's WP Super Cache which is preventing me from getting the most recent result.

And I still get the same cache page even if I spoof the headers in my python code. So I wonder is there a way to by pass WP super cache? And why there's no such super cache in Firefox at all?


Solution

  • Have you tried changing the URL with some harmless data? Something like this:

    import time
    urllib2.urlopen("http:\example.com?time=%s" % int(time.time()))
    

    It will actually call http:\example.com?time=1283872559. Most caching systems will bypass the cache if there's a querystring or it's something that isn't expected.