
Converting from Python 2 to Python 3: TypeError: a bytes-like object is required


I was given the following Python 2.x code. To convert it to Python 3.x, I changed import urllib2 to from urllib.request import urlopen, removed the urllib2 reference, and ran the program. The document at the URL was retrieved, but the program failed at the line indicated, throwing the error

TypeError: a bytes-like object is required, not 'str'

The document looks like this: b'9306112 9210128 9202065 \r\n9306114 9204065 9301122 \r\n9306115 \r\n9306116 \r\n9306117 \r\n9306118 \r\n9306119

I tried playing with the return value at that line and the one above (e.g., converting to bytes, splitting on different values), but nothing worked. Any thoughts as to what is happening?

import urllib2


CITATION_URL = "http://storage.googleapis.com/codeskulptor-alg/alg_phys-cite.txt"

def load_graph(graph_url):
    """
    Function that loads a graph given the URL
    for a text representation of the graph

    Returns a dictionary that models a graph
    """
    graph_file = urllib2.urlopen(graph_url)
    graph_text = graph_file.read()
    graph_lines = graph_text.split('\n')  # <--- The Problem
    graph_lines = graph_lines[ : -1]

    print "Loaded graph with", len(graph_lines), "nodes"

    answer_graph = {}
    for line in graph_lines:
        neighbors = line.split(' ')
        node = int(neighbors[0])
        answer_graph[node] = set([])
        for neighbor in neighbors[1 : -1]:
            answer_graph[node].add(int(neighbor))

    return answer_graph

citation_graph = load_graph(CITATION_URL)
print(citation_graph)

Solution

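  • In Python 3, read() on the object returned by urlopen gives you a bytes object, not a str. Calling split('\n') on it passes a str argument to a bytes method, which raises the TypeError. To treat a bytes object like a string, decode it first. For example, if the encoding is UTF-8:

    graph_text = graph_file.read().decode("utf-8")

    After decoding, graph_text is a str rather than a sequence of bytes, so split('\n') works as it did in Python 2.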
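
For reference, here is a minimal sketch of how the whole function might look in Python 3 once the decode step is in place. It is one possible port, not the original author's code: the helper name parse_graph and the encoding parameter are hypothetical additions, UTF-8 is an assumption about the file, and splitlines() plus a bare split() stand in for the original split('\n') and [1 : -1] slicing so that the '\r\n' line endings and trailing spaces are handled without special cases.

```python
from urllib.request import urlopen

CITATION_URL = "http://storage.googleapis.com/codeskulptor-alg/alg_phys-cite.txt"


def parse_graph(graph_text):
    """Build a dict mapping each node to the set of its neighbors.

    Hypothetical helper (not in the original code): it takes text that
    has already been decoded to str, so it can be tried without a download.
    """
    answer_graph = {}
    # splitlines() handles both '\n' and '\r\n' line endings.
    for line in graph_text.splitlines():
        fields = line.split()  # split on any run of whitespace
        if not fields:
            continue  # skip blank lines
        node = int(fields[0])
        answer_graph[node] = {int(neighbor) for neighbor in fields[1:]}
    return answer_graph


def load_graph(graph_url, encoding="utf-8"):
    """Download the text at graph_url and parse it into a graph.

    The encoding default is an assumption; change it if the file is
    not UTF-8.
    """
    with urlopen(graph_url) as graph_file:
        raw = graph_file.read()        # bytes in Python 3
    graph_text = raw.decode(encoding)  # now a str
    graph = parse_graph(graph_text)
    print("Loaded graph with", len(graph), "nodes")
    return graph


# Usage (needs network access):
# citation_graph = load_graph(CITATION_URL)
```

If you would rather keep working with bytes, passing a bytes separator also avoids the error (graph_text.split(b'\n')), but int() and the rest of the parsing are simpler on a decoded str.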