python-3.x astropy

Missing (empty) values in a .dat file lead to an error when reading the file


I have a .dat file that I am trying to analyze. This is the code I use to read it:

catalog=ascii.read("table6.dat",Reader=ascii.NoHeader,guess=False,fast_reader=False,delimiter='\s')

The problem is that the file contains missing (empty) values, which prevents me from reading it and analyzing the data.

Output:

astropy.io.ascii.core.InconsistentTableError: Number of header columns (23) inconsistent with data columns (24) at data line 3

Changing the delimiter from '\s' to '\n' gives me this:

                                                                 col1                                                                
-------------------------------------------------------------------------------------------------------------------------------------
  1  33 Psc           28  00 05 20.1  -05 42 27   93.73  -65.93  111    -6.6    -13    89  (44)    -3   45 -101   -16.7   37.4   24.6
  2  ADS 48A          38  00 05 41.2   45 48 35  114.64  -16.32   11    -9.0    886  -207 (737)    -4   10   -3   -33.6  -31.1  -15.4
  3   5 Cet          352  00 08 12.0  -02 26 52   98.32  -63.23  140    -0.4      6    -4  (77)    -9   62 -125    -2.1   -4.1   -1.4
  4  BD Cet         1833  00 22 46.7  -09 13 49  100.84  -70.86   71    -4.8      3   -51 (409)    -4   23  -67     8.1  -15.9   -0.9
  5  13 Cet A       3196  00 35 14.8  -03 35 34  112.87  -66.15   21    10.6    410   -21 (409)    -3    8  -19   -36.0  -19.3  -12.7
  6  FF And               00 42 47.3   35 32 50  120.95  -27.29   24    -0.5    250    90 (380)   -11   18  -11   -26.3  -11.6    8.6
  7  zeta And       4502  00 47 20.3   24 16 02  121.73  -38.60   31   -23.7   -100   -83 (737)   -13   21  -19    26.5  -14.0    5.2
  8  CF Tuc         5303  00 52 58.3  -74 39 07  302.81  -42.48   54     0.5     19    28 (409)    22  -33  -36    -6.6    1.0   -5.5
  9  BD+25 161      6286  01 04 07.1   26 35 13  126.44  -36.20   55   -20.0    -12   -18 (737)   -26   36  -32    13.7  -13.5    7.7
 10  AY Cet         7672  01 16 36.2  -02 30 01  137.72  -64.65   67   -30.1   -108   -59 (409)   -21   19  -60    46.6   -2.7   15.6
                                                                                                                                  ...
196  IM Peg       216489  22 53 02.3   16 50 28   86.36  -37.48   50   -12.8    -19   -24 (737)     3   40  -30     6.3  -11.9    6.0
197  AZ Psc       217188  22 58 52.7   00 18 58   73.71  -51.46  260   -20.5     39    16 (409)    45  156 -203   -54.2  -12.3    5.5
198  TZ PsA       217344  23 00 27.7  -33 44 34   10.64  -65.25   46    36.9    -44  -132 (409)    19    4  -42    32.1  -21.4  -28.2
199  KU Peg       218153  23 05 29.3   26 00 33   95.03  -31.05  950   -80.4     51    -9 (737)   -71  811 -490  -171.4 -159.1  -78.5
200  KZ And       218738  23 09 57.4   47 57 30  105.90  -11.53   23    -6.9    157    -5 (737)    -6   22   -5   -12.7  -12.2   -5.5
201  RT And               23 11 10.0   53 01 33  108.06   -6.92   95    20.0    -12   -18 (737)   -29   90  -11     1.5   20.8   -7.9
202  SZ Psc       219113  23 13 23.8   02 40 32   80.66  -51.96  125    12.0     12    29 (737)    13   76  -98   -13.5   17.2   -3.5
203  EZ Peg               23 16 53.4   25 43 09   97.58  -32.45   83   -27.2    -70    13 (409)    -9   69  -45    24.8  -10.9   28.1
204  lambda And   222107  23 37 33.9   46 27 29  109.90  -14.53   23     6.8    162  -421 (737)    -8   21   -6    -1.8   -6.7  -49.2
205  KT Peg       222317  23 39 31.0   28 14 47  104.22  -32.00   25    -3.1    299   226 (737)    -5   21  -13   -41.9   -6.0   13.8
206  II Peg       224085  23 55 04.0   28 38 01  108.22  -32.62   29   -18.1    574    27 (737)    -8   24  -16   -66.5  -48.1   -3.8

But then each line ends up in a single column (col1), so column names cannot be assigned to the individual fields. Among the rows shown, the third column has a missing value in rows 6, 201, and 203. The problem could be solved if placeholder values could be assigned to these empty fields.

I can't find any documentation relating to this...


Solution

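  • The problem is that there is fundamentally no way for the table parser to unambiguously know where the column boundaries are in your data file. Splitting on whitespace cannot work here: some values contain spaces themselves (for example "33 Psc"), and an empty field leaves nothing behind to mark its place, so different rows split into different numbers of columns. Your table data are in fixed-width format, meaning that each column lives within certain character bounds in each line. You need to specify those bounds in some way.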

    This is documented here with examples: https://docs.astropy.org/en/latest/io/ascii/fixed_width_gallery.html#fixed-width-gallery

    If you can modify the file, the easiest way is to add a header whose second line of dashes tells the parser where the column boundaries are, and then read the file with format='fixed_width_two_line'. For example:

    Col1   Col2    Col3 Col4
    ---- --------- ---- ----
     1.2   "hello"    1    a
     2.4 's worlds    2    2
    

    If you cannot modify the file itself, then you can explicitly specify the column starts and stops, as shown in the second example in this section: https://docs.astropy.org/en/latest/io/ascii/fixed_width_gallery.html#fixedwidthnoheader