python numpy genfromtxt

NumPy genfromtxt TypeError: data type not understood


I would like to read in this file (test.txt):

01.06.2015;00:00:00;0.000;0;-9.999;0;8;0.00;18951;(SPECTRUM)ZERO(/SPECTRUM)
01.06.2015;00:01:00;0.000;0;-9.999;0;8;0.00;18954;(SPECTRUM)ZERO(/SPECTRUM)
01.06.2015;00:02:00;0.000;0;-9.999;0;8;0.00;18960;(SPECTRUM)ZERO(/SPECTRUM)
01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;(SPECTRUM);;;;;;;;;;;;;;1;1;;;1;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;1;;;;;;;;;;;;(/SPECTRUM)
01.06.2015;09:24:00;0.000;0;-9.999;0;29;0.00;19010;(SPECTRUM)ZERO(/SPECTRUM)

I tried the NumPy function genfromtxt() (see the code excerpt below):

import numpy as np
col_names = ["date", "time", "rain_intensity", "weather_code_1", "radar_ref", "weather_code_2", "val6", "rain_accum", "val8", "val9"]
types = ["object", "object", "float", "uint8", "float", "uint8", "uint8", "float", "uint8","|S10"]
# Read in the file with np.genfromtxt
mydata = np.genfromtxt("test.txt", delimiter=";", names=col_names, dtype=types)

When I execute the code I get the following error:

raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #4 (got 79 columns instead of 10)

I think the problem comes from the last column (val9), which contains many ; characters.
The field delimiter and the characters inside the last column are the same, so genfromtxt() splits that column into many extra columns.

How can I read in the file without an error? Is there a way to skip the last column, or to replace the ; only in the last column?


Solution

  • usecols can be used to ignore excess delimiters, e.g.

    In [546]: np.genfromtxt([b'1,2,3',b'1,2,3,,,,,,'], dtype=None,
        delimiter=',', usecols=np.arange(3))
    Out[546]: 
    array([[1, 2, 3],
           [1, 2, 3]])
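
Applied to the file from the question, the same idea might look like the sketch below. It makes a few assumptions that are not in the original post: the sample rows are inlined through io.StringIO instead of being read from test.txt, the spectrum in the second row is abbreviated, the field widths and integer types in the dtype are illustrative choices (values such as 18951 do not fit in uint8, so a wider integer is used), and encoding="utf-8" is passed so the date and time columns come back as str. Only the first nine columns are kept, so the trailing spectrum column is never parsed.

```python
import io

import numpy as np

# A few rows in the format shown in the question; the spectrum in the
# second row is shortened here for readability (assumption).
sample = io.StringIO(
    "01.06.2015;00:00:00;0.000;0;-9.999;0;8;0.00;18951;(SPECTRUM)ZERO(/SPECTRUM)\n"
    "01.06.2015;09:23:00;0.327;61;25.831;39;29;0.18;19006;(SPECTRUM);;;1;1;;;1;;;(/SPECTRUM)\n"
    "01.06.2015;09:24:00;0.000;0;-9.999;0;29;0.00;19010;(SPECTRUM)ZERO(/SPECTRUM)\n"
)

# Structured dtype for the first nine columns only. The names follow the
# question; the concrete types are illustrative choices.
dtype = [
    ("date", "U10"),
    ("time", "U8"),
    ("rain_intensity", "f8"),
    ("weather_code_1", "i8"),
    ("radar_ref", "f8"),
    ("weather_code_2", "i8"),
    ("val6", "i8"),
    ("rain_accum", "f8"),
    ("val8", "i8"),
]

# usecols limits parsing to columns 0..8, so rows with any number of
# extra ';' separators after the ninth field no longer raise an error.
data = np.genfromtxt(
    sample,
    delimiter=";",
    dtype=dtype,
    usecols=range(9),
    encoding="utf-8",
)

print(data["time"])
print(data["val8"])
```

To read the real file, replace sample with "test.txt". The last column (val9) is dropped in this sketch; keeping it would need a separate preprocessing step.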