I'm checking a large list of URLs (one URL per line) for HTTP status codes. If a URL returns 302, I want to delete that line from the file, but everything I've tried deletes the contents of the whole file. What am I doing wrong here?
Edit: I had the wrong code pasted, sorry! Also, the f.write(" ") is there because I was trying different methods of deleting the line, since everything I've tried wipes the whole file.
At first I was writing the good URLs to a new file, but it was taking too long (roughly 20k URLs), so I figured deleting from the current file would be quicker. Or should I just stick with writing to a new file instead?
import urllib2, urllib

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        infourl.status = code
        infourl.code = code
        return infourl
    http_error_300 = http_error_302
    http_error_301 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302

opener = urllib2.build_opener(NoRedirectHandler())
urllib2.install_opener(opener)
opener.addheaders.append(('Cookie', 'birthtime=568022401'))

with open('list.txt', 'w+') as f:
    sites = f.readlines()
    for url in sites:
        try:
            connection = urllib2.urlopen(url)
            position = f.tell()
            if connection.getcode() is 302:
                f.write(" ")
            print "pos:", position
            print connection.getcode()
            connection.close()
        except urllib2.HTTPError, e:
            print e.getcode()
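There are a few issues with your code, starting with how the file is opened in the with section: mode 'w+' truncates the file as soon as it is opened, so readlines() returns nothing and the original contents are already gone.
You should read the URLs from the original file, write the ones you want to keep to a new file, and then rename the new file over the old one.
Something like: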
with open('list.txt', 'r') as source, open('list-ok.txt', 'w') as target:
    for url in source:
        if do_something(url):
            target.write(url)
# Rename here "list-ok.txt" to "list.txt"
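
For reference, the copy-then-rename idea can be sketched a bit more fully. This is only a minimal sketch under some assumptions: it is written for Python 3 (os.replace does not exist in Python 2, where os.rename would be the closest equivalent), and both filter_lines and its keep argument are hypothetical names introduced here for illustration. keep stands in for whatever check you run per URL, such as the 302 test from the question.

```python
import os
import tempfile


def filter_lines(path, keep):
    """Rewrite the file at `path`, keeping only lines where keep(line) is true.

    `keep` is a hypothetical callback: it receives each line with surrounding
    whitespace stripped and returns True if the line should stay in the file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as target, open(path, "r") as source:
            for line in source:
                if keep(line.strip()):
                    target.write(line)
        # Swap the filtered copy into place only after it is fully written.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, leave the original untouched and clean up.
        os.remove(tmp_path)
        raise
```

With something like this, the original file is never opened for writing, so an error halfway through the 20k URLs should leave list.txt as it was. The per-URL request is likely to dominate the run time either way, so writing to a second file is unlikely to be the slow part.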