python file urllib2

Why does my code delete entire text file instead of line?


I'm checking a large list of URLs (one URL per line) for HTTP status codes. If one returns a 302, I want to delete that line from the file, but everything I've tried just deletes the whole file. What am I doing wrong here?

Edit: I had the wrong code pasted, sorry! Also, I have f.write(" ") because I was trying different methods of deleting the line, since everything I've tried just empties the whole file.

At first I was writing them to a new file, but it was taking too long (roughly 20k URLs), so I figured deleting from the current file would be quicker. Or should I just stick with writing to a new file instead?

import urllib2, urllib

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        infourl.status = code
        infourl.code = code
        return infourl
    http_error_300 = http_error_302
    http_error_301 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302

opener = urllib2.build_opener(NoRedirectHandler())
urllib2.install_opener(opener)
opener.addheaders.append(('Cookie', 'birthtime=568022401'))

with open('list.txt', 'w+') as f:
    sites = f.readlines()
    for url in sites:
        try:
            connection = urllib2.urlopen(url)
            position = f.tell()
            if connection.getcode() is 302:
               f.write(" ")
            print "pos:", position
            print connection.getcode()
            connection.close()
        except urllib2.HTTPError, e:
            print e.getcode()

Solution

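  • There are a few issues with your code:

    1. Your file is closed as soon as you leave the with block.
    2. You are opening the file with 'w+', which truncates it to zero length before you read anything. That is why the whole file is wiped and readlines() returns an empty list.
    3. It's bad practice to read the whole file into memory when you can iterate over it line by line.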
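
To see why the file ends up empty, here is a small sketch comparing the 'w+' mode used in the question with 'r+'. It is written for Python 3 and uses a scratch file in a temporary directory (a hypothetical stand-in for list.txt), so it does not touch any real data.

```python
import os
import tempfile

# Hypothetical scratch file standing in for list.txt.
path = os.path.join(tempfile.mkdtemp(), "list.txt")
with open(path, "w") as f:
    f.write("http://example.com/a\nhttp://example.com/b\n")

# 'r+' opens for reading and writing and leaves the contents alone.
with open(path, "r+") as f:
    lines_r_plus = f.readlines()

# 'w+' also allows reading and writing, but truncates the file on open,
# so there is nothing left to read.
with open(path, "w+") as f:
    lines_w_plus = f.readlines()

size_after = os.path.getsize(path)
print(len(lines_r_plus), len(lines_w_plus), size_after)  # prints: 2 0 0
```

Even with 'r+', writing a space at the current position would only overwrite bytes in place. It would not remove a line, which is why rewriting to a second file is the usual approach.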

    You should:

    1. Open source file for read
    2. Open target file for write
    3. Iterate over source line by line and if OK write to target
    4. Close both files
    5. Delete source and rename target to the original source name

    Something like:

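    import os

    with open('list.txt', 'r') as source, open('list-ok.txt', 'w') as target:
        for url in source:
            # do_something() is a placeholder: return True to keep the URL
            if do_something(url):
                target.write(url)
    # Both files are closed here, so the swap is safe
    os.remove('list.txt')
    os.rename('list-ok.txt', 'list.txt')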
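
For a fuller picture, the steps above could be sketched roughly as follows. This is an illustration under a few assumptions, not a drop-in replacement for the question's script: it targets Python 3 (urllib.request instead of urllib2), it leaves out the cookie header from the question, and the names get_status, filter_url_file, should_drop and the ".tmp" suffix are made up for this example.

```python
import os
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx status stays visible."""

    def redirect_request(self, req, fp, code, msg, headers):
        # Returning None means "do not follow"; urllib then raises HTTPError.
        return None


def get_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    opener = urllib.request.build_opener(NoRedirectHandler())
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.getcode()
    except urllib.error.HTTPError as e:
        # Unfollowed redirects and other HTTP errors carry their code here.
        return e.code
    except urllib.error.URLError:
        return None


def filter_url_file(path, should_drop):
    """Rewrite path without the lines whose URL makes should_drop() true.

    Returns the number of lines that were dropped.
    """
    tmp_path = path + ".tmp"  # hypothetical temp name next to the source file
    dropped = 0
    with open(path, "r") as source, open(tmp_path, "w") as target:
        for line in source:
            url = line.strip()
            if url and should_drop(url):
                dropped += 1
            else:
                target.write(line)
    # Both files are closed here; os.replace overwrites the original file.
    os.replace(tmp_path, path)
    return dropped
```

It could then be called as filter_url_file("list.txt", lambda url: get_status(url) == 302). Passing the check in as a function keeps the file handling separate from the network code, so the rewrite logic can be tried out on a small file with a fake check before running it against all 20k URLs.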