
genfromtxt with different datatypes per column


I am trying to import data from text files with a varying number of columns. I know that in all files the first column will always be an int and the subsequent columns will be floats. How can I specify this explicitly using dtypes?

    dtypes=[int,float,float,float...] #this number will change depending on the number of columns in the file
    
    data=np.genfromtxt(file,dtype=dtypes,delimiter='\t',skip_header=11) #read in the data
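The list in the snippet above only has to be built at runtime: `np.genfromtxt` accepts a plain sequence of types for `dtype`, and the fields then get default names `f0`, `f1`, and so on. Below is a minimal sketch of a two-pass read. It assumes the files are tab-delimited, the header length is known, and the first line after the header is a regular data row (not blank or a comment). The helper `read_int_then_floats` and the demo file are hypothetical.

```python
import os
import tempfile

import numpy as np


def read_int_then_floats(fname, delimiter='\t', skip_header=0):
    """Hypothetical helper: first column int, all later columns float."""
    # Pass 1: skip the header and count the fields on the first data line.
    # Assumes that line is an ordinary data row (not blank or a comment).
    with open(fname) as fh:
        for _ in range(skip_header):
            fh.readline()
        first = fh.readline()
    ncols = len(first.rstrip('\r\n').split(delimiter))

    # genfromtxt accepts a plain sequence of types; fields are named f0, f1, ...
    dtypes = [int] + [float] * (ncols - 1)

    # Pass 2: read the whole file with the explicit per-column types.
    return np.genfromtxt(fname, dtype=dtypes, delimiter=delimiter,
                         skip_header=skip_header)


# Hypothetical demo file: two header lines, then tab-separated data.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('header line 1\nheader line 2\n')
    tmp.write('1\t1.4\t5e23\n2\t2.3\t5e-12\n3\t5.7\t-1.3e-2\n')

data = read_int_then_floats(tmp.name, skip_header=2)
os.remove(tmp.name)

print(data.dtype)
print(data['f0'], data['f1'])
```

For the files in the question this would be called as `read_int_then_floats(file, skip_header=11)`. Named fields can be requested by passing `names=` to `genfromtxt` if `f0`, `f1`, ... are not convenient.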

Thanks


Solution

  • You could first read everything as floats and convert the array into a structured array after you know how many columns you have:

    ##input.txt:
    ##    1 1.4 5e23
    ##    2 2.3 5e-12
    ##    3 5.7 -1.3e-2
    
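    import numpy as np
    
    # Read every column as float (the default dtype). For the tab-delimited
    # files in the question, add delimiter='\t', skip_header=11.
    # atleast_2d keeps the array 2-D even if the file has a single data row.
    data = np.atleast_2d(np.genfromtxt('input.txt'))
    print(data)
    print('-'*50)
    
    # Number of columns in this particular file
    colw = data.shape[1]
    
    # First column int, all remaining columns float
    dtypes = [('col0', int)]+[('col{}'.format(i+1),float) for i in range(colw-1)]
    print(dtypes)
    print('-'*50)
    
    # Each row becomes a tuple; the float values in col0 are cast to int
    converted_data = np.array([tuple(r) for r in data], dtype = dtypes)
    
    print(converted_data)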
    

    This gives the following output:

    [[  1.00000000e+00   1.40000000e+00   5.00000000e+23]
     [  2.00000000e+00   2.30000000e+00   5.00000000e-12]
     [  3.00000000e+00   5.70000000e+00  -1.30000000e-02]]
    --------------------------------------------------
    [('col0', <class 'int'>), ('col1', <class 'float'>), ('col2', <class 'float'>)]
    --------------------------------------------------
    [(1,  1.4,   5.00000000e+23) (2,  2.3,   5.00000000e-12)
     (3,  5.7,  -1.30000000e-02)]
    

    Tested on Python 3.5
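On NumPy 1.16 or later, the row-by-row tuple conversion can probably be replaced by `numpy.lib.recfunctions.unstructured_to_structured`, which maps the last axis of a plain array onto the fields of a structured dtype. A sketch using the same values as `input.txt`, hard-coded here (an assumption for illustration) so the snippet runs on its own:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same values np.genfromtxt('input.txt') returns for the example file.
data = np.array([[1, 1.4, 5e23],
                 [2, 2.3, 5e-12],
                 [3, 5.7, -1.3e-2]])

colw = data.shape[1]
dtypes = [('col0', int)] + [('col{}'.format(i + 1), float) for i in range(colw - 1)]

# Map the last axis onto the fields in one call instead of building tuples
# row by row; the default casting='unsafe' turns 1.0, 2.0, 3.0 into ints.
converted_data = rfn.unstructured_to_structured(data, dtype=np.dtype(dtypes))
print(converted_data)
```

This avoids the Python-level loop over rows, which may matter for large files; the field names and types are the same as in the `dtypes` list built from the column count.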