Reading an online .tbl data file in python

Like the Title says, I am trying to read an online data file that is in .tbl format. Here is the link to the data: https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/cosmos_morph_cassata_1.1.tbl

I tried the following code

cosmos= pd.read_table('https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/cosmos_morph_cassata_1.1.tbl')

Running this didn't give me any errors however when I wrote print (cosmos.column), it didn't give me a list of individuals columns instead python put everything together and gave me the output that looks like:

Index(['|            ID|            RA|           DEC|  MAG_AUTO_ACS|       R_PETRO|        R_HALF|    CONC_PETRO|     ASYMMETRY|          GINI|           M20|   Axial Ratio|     AUTOCLASS|   CLASSWEIGHT|'], dtype='object').

My main goal is to print the columns of that table individually and then print cosmos['RA']. Anyone know how this can be done?

Solution

Your file has four header rows and different delimiters in header (|) and data (whitespace). You can read the data by using skiprows argument of read_table.

import requests
import pandas as pd

filename = 'cosmos_morph_cassata_1.1.tbl'
url = 'https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/' + filename
n_header = 4

## Download large file to disc, so we can reuse it...
table_file = requests.get(url)
open(filename, 'wb').write(table_file.content)


## Skip the first 4 header rows and use whitespace as delimiter
cosmos = pd.read_table(filename, skiprows=n_header, header=None, delim_whitespace=True)

## create header from first line of file
with open(filename) as f:
    header_line = f.readline()
    ## trim whitespaces and split by '|'
    header_columns = header_line.replace(' ', '').split('|')[1:-1]

cosmos.columns = header_columns