Tags: python, arrays, numpy, fits

How to change the dtype of a numpy recarray when one of the columns is an array?


In previous posts I've seen that the dtype of a recarray can be changed with astype. However, I can't get this to work with a recarray that has an array in one of its columns.

My recarray comes from a FITS file record:

> from astropy.io import fits
> f = fits.open('myfile.fits')
> tbdata = f[1].data
> tbdata
# FITS_rec([ (0.27591679999999996, array([570, 576, 566, ..., 571, 571, 569], dtype=int16)),
#   (0.55175680000000005, array([575, 563, 565, ..., 572, 577, 582], dtype=int16)),
#   ...,
#   (2999.2083967999997, array([574, 570, 575, ..., 560, 551, 555], dtype=int16)),
#   (2999.4842367999995, array([575, 583, 578, ..., 559, 565, 568], dtype=int16)], 
#   dtype=[('TIME', '>f8'), ('AC', '>i4', (2,))])

I need to convert the AC column from int to float, so I tried:

> tbdata = tbdata.astype([('TIME', '>f8'), ('AC', '>f4', (2,))])

and although the dtype does appear to have changed

> tbdata.dtype
# dtype([('TIME', '>f8'), ('AC', '>f4', (2,))])

a look at the data in AC shows that the values are still integers. For instance, a sum exceeds the int16 range and wraps around to negative values (all the AC values are positive):

> tbdata['AC'][0:55].sum()
# _VLF(array([31112, 31128, 31164, ..., 31203, 31232, 31262], dtype=int16), dtype=object)
> tbdata['AC'][0:65].sum()
# _VLF(array([-28766, -28759, -28702, ..., -28659, -28638, -28583], dtype=int16), dtype=object)
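The negative totals are consistent with plain int16 wraparound: summing a slice of rows adds the per-row int16 arrays element-wise, and the result keeps the int16 type. A minimal numpy-only sketch, using a hypothetical object array rows as a stand-in for tbdata['AC'] (values chosen only to force an overflow), shows the effect and one way around it, namely casting each row before summing:

```python
import numpy as np

# Hypothetical stand-in for tbdata['AC']: an object array whose cells
# are equal-length int16 arrays.
rows = np.empty(3, dtype=object)
for i in range(3):
    rows[i] = np.full(4, 12000, dtype=np.int16)

# Summing the cells adds the int16 arrays element-wise; the running
# total passes 32767 and wraps around (36000 becomes -29536).
wrapped = rows.sum()

# Casting each cell to float64 first keeps the true total of 36000.
correct = sum(r.astype(np.float64) for r in rows)

print(wrapped)
print(correct)
```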

Is there any way to effectively change the array data type?
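For comparison, astype does convert the values when a field is a true fixed-shape subarray, as in the plain numpy sketch below with a hypothetical array arr. In the FITS case, the ('>i4', (2,)) entry together with the _VLF results suggests that AC is a variable-length array column: astropy stores such columns as pairs of 32-bit descriptors (element count and heap offset), with the int16 data kept separately in the table's heap. If that is the case here, astype converts the descriptors rather than the data, which would explain why the values stay int16.

```python
import numpy as np

# Hypothetical structured array with a *fixed-shape* subarray field,
# mimicking the TIME/AC layout but with no FITS heap behind it.
arr = np.array(
    [(0.27, [570, 576]), (0.55, [575, 563])],
    dtype=[('TIME', '>f8'), ('AC', '>i2', (2,))],
)

# astype converts field by field, so the AC values really become floats.
converted = arr.astype([('TIME', '>f8'), ('AC', '>f4', (2,))])

print(converted['AC'].dtype)
print(converted['AC'])
```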


Solution

  • I can reproduce this issue with a recarray read from a FITS file. A workaround is to read the file as an astropy Table and then convert it into a pandas DataFrame:

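    import numpy as np
    import pandas as pd
    from astropy.table import Table
    
    t = Table.read('file.fits')
    # Build the DataFrame column by column: TIME as a native-endian float
    # array, AC as one per-row integer array in each cell
    df = pd.DataFrame({'TIME': np.asarray(t['TIME'], dtype=float),
                       'AC': list(t['AC'])})
    # AC is an object column holding arrays, so cast each cell individually
    df['AC'] = df['AC'].apply(lambda a: np.asarray(a, dtype=float))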
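If pandas is not needed, a similar result might be obtained by converting each cell of the AC column directly. The sketch below assumes that iterating over tbdata['AC'] yields one 1-D int16 array per row, as the _VLF output in the question suggests; ac_rows is a hypothetical stand-in so the example runs without a FITS file:

```python
import numpy as np

# Hypothetical stand-in for tbdata['AC']: one int16 array per row.
ac_rows = [
    np.array([570, 576, 566], dtype=np.int16),
    np.array([575, 563, 565], dtype=np.int16),
]

# Convert every row to float64; this also works when row lengths differ.
ac_float = [np.asarray(row, dtype=np.float64) for row in ac_rows]

# If all rows share the same length, stack them into a 2-D float array
# so that column-wise sums no longer risk int16 overflow.
if len({len(row) for row in ac_float}) == 1:
    ac_2d = np.vstack(ac_float)
    totals = ac_2d.sum(axis=0)
    print(totals)
```

Keeping a list of per-row arrays handles genuinely variable-length rows; stacking into a 2-D array is only possible when every row has the same length, but it allows vectorized operations afterwards.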