python, windows, decode, urlopen

Python 3.6 urlopen behaviour is different on Windows


I'm a beginner in Python.

I'm trying to use urlopen in a with statement to fetch a file's content and print all the words in it.

This is the example I'm following:

from urllib.request import urlopen
with urlopen('http://sixty-north.com/c/t.txt') as story:
    story_words = []
    for line in story:
        line_words = line.decode('utf-8').split()
        for word in line_words:
            story_words.append(word)

print(story_words)

The code works fine on the PyFiddle website, but not on my local machine.

The result should look like this:

['It', 'was', 'the', 'best', 'of', 'times', 'it', 'was', 'the', 'worst', 'of', 'times', 'it', 'was', 'the', 'age', 'of', 'wisdom', 'it', 'was', 'the', 'age', 'of', 'foolishness', 'it', 'was', 'the', 'epoch', 'of', 'belief', 'it', 'was', 'the', 'epoch', 'of', 'incredulity', 'it', 'was', 'the', 'season', 'of', 'Light', 'it', 'was', 'the', 'season', 'of', 'Darkness', 'it', 'was', 'the', 'spring', 'of', 'hope', 'it', 'was', 'the', 'winter', 'of', 'despair', 'we', 'had', 'everything', 'before', 'us', 'we', 'had', 'nothing', 'before', 'us', 'we', 'were', 'all', 'going', 'direct', 'to', 'Heaven', 'we', 'were', 'all', 'going', 'direct', 'the', 'other', 'way', 'in', 'short', 'the', 'period', 'was', 'so', 'far', 'like', 'the', 'present', 'period', 'that', 'some', 'of', 'its', 'noisiest', 'authorities', 'insisted', 'on', 'its', 'being', 'received', 'for', 'good', 'or', 'for', 'evil', 'in', 'the', 'superlative', 'degree', 'of', 'comparison', 'only']

I'm using Windows 10 with Python 3.6.

This is the error I get when I try to print the words locally:

Traceback (most recent call last):
  File "", line 4, in
    line_words = line.decode('utf-8').split()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
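For illustration, the same error can be reproduced without any network access, assuming (an assumption, not something the error itself states) that the response body arrived gzip-compressed. Compressing a sample string with Python's gzip module and decoding the result as UTF-8 fails at exactly the same byte and position:

```python
import gzip

# Hypothetical sample: gzip-compress some plain text, as a proxy might do.
compressed = gzip.compress(b"It was the best of times\n")
print(compressed[:3])  # b'\x1f\x8b\x08': gzip magic bytes 1f 8b, method 08 (deflate)

caught = None
try:
    compressed.decode("utf-8")
except UnicodeDecodeError as err:
    caught = err
    # 0x1f (position 0) is valid ASCII, but 0x8b (position 1) is a UTF-8
    # continuation byte, so it cannot start a character.
    print(err)  # 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```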

I also printed the words without decoding, to see how they arrive in both environments (PyFiddle and local):

On PyFiddle:

[b'It', b'was', b'the', b'best', b'of', b'times', b'it', b'was', b'the', b'worst', b'of', b'times', b'it', b'was', b'the', b'age', b'of', b'wisdom', b'it', b'was', b'the', b'age', b'of', b'foolishness', b'it', b'was', b'the', b'epoch', b'of', b'belief', b'it', b'was', b'the', b'epoch', b'of', b'incredulity', b'it', b'was', b'the', b'season', b'of', b'Light', b'it', b'was', b'the', b'season', b'of', b'Darkness', b'it', b'was', b'the', b'spring', b'of', b'hope', b'it', b'was', b'the', b'winter', b'of', b'despair', b'we', b'had', b'everything', b'before', b'us', b'we', b'had', b'nothing', b'before', b'us', b'we', b'were', b'all', b'going', b'direct', b'to', b'Heaven', b'we', b'were', b'all', b'going', b'direct', b'the', b'other', b'way', b'in', b'short', b'the', b'period', b'was', b'so', b'far', b'like', b'the', b'present', b'period', b'that', b'some', b'of', b'its', b'noisiest', b'authorities', b'insisted', b'on', b'its', b'being', b'received', b'for', b'good', b'or', b'for', b'evil', b'in', b'the', b'superlative', b'degree', b'of', b'comparison', b'only']

Local:

[b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03}\x91Qn\x840', b'D\xff\xf7\x14>L?Z\xa9\x97\x08d', b"\xd6\x86\x18\xd9\x06\xb4\xb7\xaf\xb3\xabU\xbb\x15\xed\x17\xc1o\xec\x8c'\x1fNG2\xf2\x02\x1aN2\x91\xf3\x02\xbb\xf078D\xff", b'iF\xaf\x1flY\x96\x130\x89T\xb6\xd2/mXe,\x9d\x0f\xa8\x8c\xe9\x14q\x1b\x15y\xab\xec\xb7\x9f\xdc\x90LZ\x17|\xf2\\xfc\x1c\xbd%\xbd\xfe\xbe\xd3V\xe56wZd\xc5\xcbz\xdc\x1c\xdaI\x86\xad\x89\xf5r\x80J\xca\x84\x1dz\xf3\xd2\xdb\x06L\xa2\xa0\xcd\x9e\xac\xc9', b'8\x10\xc7T+\xcd\xd2Yf\xc5\xe8\xe4B\xefH;\xda?\x92\xb0\x11\x03\xc3\xc5\x91b\xddFVD\x1f\xe5\x15\xca\x92\xeffMhJJ\x95\xafx', b"\x85\xa1\xf9S\xe2%yh\x96\x9e|\xecg\xe1\x91\x8d\xfb\xa3\xa6\xcdc\x1e{\xfcD\xae\xc6\xe6\xc8\x14Qu\xd1\x80\xee#\\x80\xf7\xa8\xc66a-\xa6\xc57\xce\x17\xec\\xa3\xe7\x11\xe1\x167\xd5\xe4!\x8c\xa8f\xc5\xfd\x8dGY\xd6\xa4|\x8f\xbe\xd5\xdb\x17\xc5v'\x1dQ\x02\x00\x00"]
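The local output starts with b'\x1f\x8b\x08', which is the gzip header (magic bytes 1f 8b, compression method 08 = deflate). So locally the body most likely arrived gzip-compressed, and iterating over it split the binary data at arbitrary newline bytes. urllib.request does not undo a Content-Encoding on its own, so one possible fix is to read the whole body and decompress it before decoding. This is a minimal sketch, not the original poster's solution; body_to_words and fetch_story_words are hypothetical names, and UTF-8 is assumed as in the original code:

```python
import gzip
from typing import List, Optional
from urllib.request import urlopen

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def body_to_words(raw: bytes, content_encoding: Optional[str] = None) -> List[str]:
    """Split a response body into words, gunzipping it first if needed."""
    header_says_gzip = (content_encoding or "").lower() in ("gzip", "x-gzip")
    # Also check the magic bytes, in case the Content-Encoding header is missing.
    if header_says_gzip or raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8").split()


def fetch_story_words(url: str) -> List[str]:
    """Hypothetical helper: download url and return the words it contains."""
    with urlopen(url) as story:
        # Read the whole body at once: a compressed stream cannot be
        # split into lines and decoded one line at a time.
        raw = story.read()
        content_encoding = story.headers.get("Content-Encoding")
    return body_to_words(raw, content_encoding)
```

Usage would be print(fetch_story_words('http://sixty-north.com/c/t.txt')). Plain, uncompressed bodies pass through unchanged, so the same code should behave identically on PyFiddle.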

Can anyone tell me what is happening here, and how I can solve this?


Solution

  • The code is working now.

    The issue was caused by the network itself: after switching to a different network, the code worked fine, so it was probably due to a firewall configuration on the old network.

    Alternatively, as Thierry Lathuille suggested, the Requests library works perfectly without changing the network.

    Thanks.
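A likely reason Requests works on the same network is that it transparently decompresses gzip and deflate bodies based on the Content-Encoding header, while urllib.request does not. A minimal sketch of the Requests version, assuming the third-party requests package is installed; fetch_story_words is a hypothetical name and the 10-second timeout is an arbitrary choice:

```python
from typing import List

import requests


def fetch_story_words(url: str) -> List[str]:
    """Hypothetical helper: download url with Requests and return its words."""
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()  # fail loudly on HTTP errors
    # .content holds the body after Requests has undone any gzip/deflate
    # Content-Encoding, so it can be decoded directly (UTF-8 assumed).
    return response.content.decode("utf-8").split()
```

Usage would be print(fetch_story_words('http://sixty-north.com/c/t.txt')).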