Generate many curl commands from a binary file


I have a binary data file called image_info_binary.data, and I'd like to download many FITS images based on the information in the lines of this file. If I load this file into Python with the pickle module and print a single element, I get this:

import pickle
with open('image_info_binary.data', 'rb') as f:
    img_info = pickle.load(f)
print(img_info[0])

Outputs this string:

Object #: 2000073.0
Counter #: 2
Scan ID: 0245
Frame #: 167
Band #: 3
Image Link: http://....fits... #long url

There are about 50,000 of these elements, each with different object #, counter #, fits image URL, etc. I would like to go through each of these elements and download each FITS image as: {int(object number)}_{three digit counter}_w{band}.fits.

For example, I would want the downloaded image of the above example to be 2000073_002_w3.fits.

What is the best way to do this? If I were downloading just one image, I could simply run curl -o 2000073_002_w3.fits "url". I'm not sure whether generating many of these curl commands is the best approach. A single command in the terminal would be great, but I could also use Python (though I suspect spawning a subprocess per image would be slow). Thank you!


Solution

  • You can build each filename and URL by iterating over the elements and splitting each one into key/value pairs.

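    for img in img_info:
        # Parse the "Key: value" lines of one element into a dict.
        attr = dict()
        for line in img.strip().split('\n'):
            key, value = line.split(': ', 1)
            attr[key] = value.strip()
        # 'Object #' looks like '2000073.0' and 'Counter #' like '2', so
        # convert both to int; zero-padding with :03 needs a number.
        filename = '{0}_{1:03}_w{2}.fits'.format(
            int(float(attr['Object #'])), int(attr['Counter #']),
            attr['Band #'])
        url = attr['Image Link']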
    

    Inside the loop, you can then print these, pass them to subprocess.run(['curl', '-o', filename, url], check=True), or download them natively in Python.
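As a rough sketch of the last two options, the snippet below wraps the parsing in a function, builds a shell-safe curl line, and downloads with the standard library instead of a subprocess. The helper names (parse_entry, curl_command, download, download_all) and the max_workers=8 default are illustrative choices, not anything from the question. It also assumes every element is a string in exactly the format shown above and that the server tolerates a handful of parallel requests.

```python
import shlex
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def parse_entry(entry):
    """Turn one 'Key: value' text block into (filename, url)."""
    attr = {}
    for line in entry.strip().splitlines():
        key, value = line.split(': ', 1)
        attr[key.strip()] = value.strip()
    filename = '{0}_{1:03d}_w{2}.fits'.format(
        int(float(attr['Object #'])),  # '2000073.0' -> 2000073
        int(attr['Counter #']),        # '2' -> 2, padded to '002'
        attr['Band #'])
    return filename, attr['Image Link']


def curl_command(filename, url):
    """Build a curl command line, quoting characters such as '?' and '&'."""
    return 'curl -o {0} {1}'.format(shlex.quote(filename), shlex.quote(url))


def download(filename, url):
    """Fetch one image with urllib and stream it to disk."""
    with urllib.request.urlopen(url) as response, open(filename, 'wb') as out:
        shutil.copyfileobj(response, out)
    return filename


def download_all(entries, max_workers=8):
    """Download every entry, a few at a time, using a thread pool."""
    pairs = [parse_entry(entry) for entry in entries]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda pair: download(*pair), pairs))
```

With img_info loaded as in the question, download_all(img_info) would fetch everything from Python. If you would rather stay in the terminal, printing curl_command(*parse_entry(entry)) for each entry into a file gives a script you can run with sh.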