How to make new line commands work in a .txt file opened from the internet?


I just started using Python, and I am trying to write a program that prints the lyrics of a song read from a text file on the internet ("www....../lyrics.txt"). My first attempt:

    import urllib.request
    lyrics=urllib.request.urlopen("http://hereIsMyUrl/lyrics.txt")
    text=lyrics.read()
    print(text)

When I ran this code, it didn't print the lyrics as they appear on the website. Instead, it showed the newline sequence '\r\n' literally wherever a line break should have been, and printed all the lyrics as one long, messy string. For example: Some lyrics here\r\nthis should already be the next line\r\nand so on.

I searched the internet for code to replace the '\r\n' sequences with new lines and tried the following:

    import urllib.request
    lyrics=urllib.request.urlopen("http://hereIsMyUrl/lyrics.txt")
    text=lyrics.read()
    text=text.replace("\r\n","\n")
    print(text)

I hoped it would at least replace something, but instead it gave me a runtime error:

    TypeError: expected bytes, bytearray or buffer compatible object

I searched the internet for that error, but I didn't find anything related to opening files from the internet.

I have been stuck at this point for hours and have no idea how to continue. Please help! Thanks in advance!


Solution

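  • Your example does not work because the data returned by read() is a bytes object, not a str. Calling replace() on a bytes object with str arguments such as "\r\n" is what raises the TypeError. You need to decode the data first, using an appropriate encoding (the example below assumes UTF-8). See also the docs for request.urlopen, file.read and bytes operations.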

    A complete working example is given below:

    #!/usr/bin/env python3
    
    import urllib.request
    
    # Example URL
    url = "http://ntl.matrix.com.br/pfilho/oldies_list/top/lyrics/black_or_white.txt"
    
    # Open URL: returns a file-like object, closed automatically by "with".
    # Read raw data, this will return a "bytes object"
    with urllib.request.urlopen(url) as lyrics:
        text = lyrics.read()
    
    # Print raw data
    print(text)
    
    # Print decoded data:
    print(text.decode('utf-8'))
    
    # If you still need newline conversion, you could use the following
    text = text.decode('utf-8')
    text = text.replace('\r\n', '\n')
    print(text)
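
The bytes/str distinction described above can also be reproduced without a network connection. The following is a minimal sketch, not a definitive implementation: the `raw` literal stands in for what `read()` returns, and `fetch_text` with its `fallback_encoding` parameter is a hypothetical helper (not part of the original answer) that uses the charset declared in the response headers, falling back to UTF-8 as an assumption.

```python
import urllib.request

# Stand-in for what response.read() returns: bytes, not str.
raw = b"Some lyrics here\r\nthis should already be the next line\r\nand so on."

# Passing str arguments to bytes.replace() raises TypeError.
try:
    raw.replace("\r\n", "\n")
    mixing_failed = False
except TypeError:
    mixing_failed = True

# Option 1: stay in bytes and pass bytes arguments.
fixed_bytes = raw.replace(b"\r\n", b"\n")

# Option 2: decode to str first, then pass str arguments.
fixed_text = raw.decode("utf-8").replace("\r\n", "\n")

# Option 3: split into a list of lines; splitlines() handles "\r\n" and "\n".
lines = raw.decode("utf-8").splitlines()


def fetch_text(url, fallback_encoding="utf-8"):
    """Hypothetical helper: download a text file and return it as str."""
    with urllib.request.urlopen(url) as response:
        # Charset from the Content-Type header, or None if none is declared.
        charset = response.headers.get_content_charset()
        return response.read().decode(charset or fallback_encoding)
```

With a helper along these lines, `print(fetch_text(url))` should show the lyrics with real line breaks, since `print` displays the line breaks in a str but shows the escaped representation of a bytes object.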