From http://svo2.cab.inta-csic.es/theory/fps/ you can get the transmission curves for many filters used in astronomical observations. I would like to get these data by opening the url with the corresponding xml file (for each filter), parse it to astropy's votable that helps to read the table data easily.
I have managed to do this by opening the file converting it to a UTF-8 file and saving in locally as an xml. Then opening the local file works fine, as it is obvious form the following example.
However I do not want to save the file and open it again. When I tried that by doing: votable = parse(xml_file), it raises an OSError: File name too long as it takes all the file as a string.
from urllib.request import urlopen
fltr = 'http://svo2.cab.inta-csic.es/theory/fps/fps.php?ID=2MASS/2MASS.H'
url = urlopen(fltr).read()
xml_file = url.decode('UTF-8')
with open('tmp.xml','w') as out:
out.write(xml_file)
votable = parse('tmp.xml')
data = votable.get_first_table().to_table(use_names_over_ids=True)
print(votable)
print(data["Wavelength"])
The output in this case is:
<VOTABLE>... 1 tables ...</VOTABLE>
Wavelength
AA
----------
12890.0
13150.0
...
18930.0
19140.0
Length = 58 rows
Indeed according to the API documentation, votable.parse
's first argument is either a filename or a readable file-like object. It doesn't specify this exactly, but apparently the file also has to be seekable meaning that it can be read with random access.
The HTTPResponse
object returned by urlopen
is indeed a file-like object with a .read()
method, so in principle it might be possible to pass directly to parse()
, but this is how I found out it has to be seekable:
fltr = 'http://svo2.cab.inta-csic.es/theory/fps/fps.php?ID=2MASS/2MASS.H'
u = urlopen(fltr)
>>> parse(u)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "astropy/io/votable/table.py", line 135, in parse
_debug_python_based_parser=_debug_python_based_parser) as iterator:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "astropy/utils/xml/iterparser.py", line 157, in get_xml_iterator
with _convert_to_fd_or_read_function(source) as fd:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "astropy/utils/xml/iterparser.py", line 63, in _convert_to_fd_or_read_function
with data.get_readable_fileobj(fd, encoding='binary') as new_fd:
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "astropy/utils/data.py", line 210, in get_readable_fileobj
fileobj.seek(0)
io.UnsupportedOperation: seek
So you need to wrap the data in a seekable file-like object. Along the lines that @keflavich wrote you can use io.BytesIO
(io.StringIO
won't work as explained below).
It turns out that there's no reason to explicitly decode the UTF-8 data to unicode. I'll spare the example, but after trying it myself it turns out parse()
works on raw bytes (which I find a bit odd, but okay). So you can read the entire contents of the URL into an io.BytesIO
which is just an in-memory file-like object that supports random access:
>>> u = urlopen(fltr)
>>> s = io.BytesIO(u.read())
>>> v = parse(s)
WARNING: W42: None:2:0: W42: No XML namespace specified [astropy.io.votable.tree]
>>> v.get_first_table().to_table(use_names_over_ids=True)
<Table masked=True length=58>
Wavelength Transmission
AA
float32 float32
---------- ------------
12890.0 0.0
13150.0 0.0
... ...
18930.0 0.0
19140.0 0.0
This is, in general, the way in Python to do something with some data as though it were a file, without writing an actual file to the filesystem.
Note, however, this won't work if the entire file can't fit in memory. In that case you still might need to write it out to disk. But if it's just for some temporary processing and you don't want to litter your disk tmp.xml
like in your example, you can always use the tempfile
module to, among other things, create temporary files that are automatically deleted once they're no longer in use.