Like the Title says, I am trying to read an online data file that is in .tbl format. Here is the link to the data: https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/cosmos_morph_cassata_1.1.tbl
I tried the following code
cosmos= pd.read_table('https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/cosmos_morph_cassata_1.1.tbl')
Running this didn't give me any errors however when I wrote print (cosmos.column)
, it didn't give me a list of individuals columns instead python put everything together and gave me the output that looks like:
Index(['| ID| RA| DEC| MAG_AUTO_ACS| R_PETRO| R_HALF| CONC_PETRO| ASYMMETRY| GINI| M20| Axial Ratio| AUTOCLASS| CLASSWEIGHT|'], dtype='object').
My main goal is to print the columns of that table individually and then print cosmos['RA']
. Anyone know how this can be done?
Your file has four header rows and different delimiters in header (|
) and data (whitespace). You can read the data by using skiprows
argument of read_table
.
import requests
import pandas as pd
filename = 'cosmos_morph_cassata_1.1.tbl'
url = 'https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/' + filename
n_header = 4
## Download large file to disc, so we can reuse it...
table_file = requests.get(url)
open(filename, 'wb').write(table_file.content)
## Skip the first 4 header rows and use whitespace as delimiter
cosmos = pd.read_table(filename, skiprows=n_header, header=None, delim_whitespace=True)
## create header from first line of file
with open(filename) as f:
header_line = f.readline()
## trim whitespaces and split by '|'
header_columns = header_line.replace(' ', '').split('|')[1:-1]
cosmos.columns = header_columns