python, urllib2, urllib

python: urllib object cannot be accessed after the first for loop


The following code does not work as expected: 'line1' does not exist after it runs, although 'line' does. It seems that 'fhand' changes after the first for loop. If we comment out the first for loop, the code works fine.

Could anyone explain why this happens?

import urllib
fhand = urllib.urlopen('http://www.py4inf.com/code/romeo.txt')

# It is the following 2 lines that cause error
for line in fhand:
    print line.strip()

counts = dict()
for line1 in fhand:
    words = line1.split()

    for word in words:
        counts[word] = counts.get(word, 0) + 1

print counts

Solution

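  • urllib.urlopen returns a file-like object that can be iterated only once, and the first loop exhausts it. By the time the second loop runs there are no lines left, so its body never executes and line1 is never assigned.

    Either read the lines into a list first
    (fhand = list(urllib.urlopen('http://www.py4inf.com/code/romeo.txt'))), or do all the work inside the first loop (i.e. use a single loop).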
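
    As a rough sketch of the list approach (written for Python 3, and using an in-memory io.StringIO as a stand-in for the object returned by urlopen so that it runs without network access; the sample text and the helper name count_words are made up for illustration):

```python
import io


def count_words(lines):
    """Count how often each whitespace-separated word occurs in an iterable of lines."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


# Stand-in for the handle returned by urlopen (assumption: sample text,
# not the real contents of romeo.txt).
fhand = io.StringIO("but soft what light\nit is the east\nwhat light\n")

# Read everything once; unlike the handle, the list can be looped over repeatedly.
lines = list(fhand)

# The handle itself is now exhausted, so iterating it again yields nothing.
leftover = list(fhand)

# First pass over the data: print each line.
for line in lines:
    print(line.strip())

# Second pass over the same data: count the words.
counts = count_words(lines)
print(counts)
```

    With a real response object the same pattern applies: call list() on it once, then loop over the resulting list as many times as needed. Note that in Python 3 the response yields bytes rather than str, so each line may need decoding before it is split.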