python json urllib

How to read from a url directly as a list verbatim?


I have a URL that returns just a list. For example, https://somepath.com/dev/doc/72 returns simply this (no HTML):

[
  "A/RES/72/1", 
  "A/RES/72/2", 
  "A/RES/72/3", 
  "A/RES/72/4"
]

I want to take the entire contents (including the square brackets) and make this into a list. Doing it by hand, I can copy/paste as a list like this:

docs = [
  "A/RES/72/1", 
  "A/RES/72/2", 
  "A/RES/72/3", 
  "A/RES/72/4"
]
print(docs)
['A/RES/72/1', 'A/RES/72/2', 'A/RES/72/3', 'A/RES/72/4']

I would like to load the content of the URL into a list.

I tried the following

from urllib.request import urlopen
link = "https://somepath.com/dev/doc/72"
f = urlopen(link)
myfile = f.read()
print(myfile)
b'[\n  "A/RES/72/1", \n  "A/RES/72/2", \n  "A/RES/72/3", \n  "A/RES/72/4"\n]\n'

The result is a single bytes object full of newline characters, not a list.

I'm guessing I would have to parse each line and remove the \n character, or use something like file.read().splitlines(), but that seems overly complicated for such a simple input.

I've seen many solutions that parse .csv files, read inputs from each line, etc. But nothing to deal with a list that is already made and just needs to be called. Thanks for any help and pointers.

edit: I tried this:

import urllib.request  # the lib that handles the url stuff
link = "https://somepath.com/dev/doc/72"
a=[]
for line in urllib.request.urlopen(link):
    print(line.decode('utf-8'))
    a.append(line)

a

The print command gives me something close to what I want. But the append command gives me a mess again:

[b'[\n',
 b'  "A/RES/72/1", \n',
 b'  "A/RES/72/2", \n',
 b'  "A/RES/72/3", \n',
 b'  "A/RES/72/4"\n',
 b']\n']

Edit: It turns out the URL is serving JSON. The solution by fuglede below (https://stackoverflow.com/a/60119016/10764078) works:

import requests
docs = requests.get('https://somepath.com/dev/doc/72').json()

I'm going to do some reading on JSON.


Solution

  • Assuming the site is sending JSON, you can get the list directly with requests:

    import requests
    docs = requests.get('https://somepath.com/dev/doc/72').json()
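
Since the question is tagged urllib, here is a rough standard-library sketch of the same idea, in case installing requests is not an option. It assumes the response body is UTF-8 encoded JSON; the helper names parse_doc_list and fetch_doc_list are made up for illustration and do not come from the question or the answer. The sample bytes are the ones printed in the question, so the parsing step can be tried without network access.

```python
import json
from urllib.request import urlopen


def parse_doc_list(raw: bytes) -> list:
    """Decode a raw HTTP body (assumed UTF-8) and parse it as JSON."""
    return json.loads(raw.decode("utf-8"))


def fetch_doc_list(url: str) -> list:
    """Download the URL and return the parsed JSON payload (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return parse_doc_list(response.read())


# The bytes shown in the question, reused here so the example runs offline.
sample = b'[\n  "A/RES/72/1", \n  "A/RES/72/2", \n  "A/RES/72/3", \n  "A/RES/72/4"\n]\n'
docs = parse_doc_list(sample)
print(docs)  # ['A/RES/72/1', 'A/RES/72/2', 'A/RES/72/3', 'A/RES/72/4']
```

Against the real endpoint, fetch_doc_list("https://somepath.com/dev/doc/72") should return the same kind of list, provided the server really does send JSON. The newlines and spaces that looked like a mess are ordinary JSON whitespace, so the parser ignores them and no per-line cleanup is needed.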