Tags: python, numpy, parsing, file-processing

Converting a list of ints or floats read from a file to a numpy array


I have a bunch of int values that I need to read from a file and store in a numpy array. This is how I am doing it:

el_connect = np.zeros([NEls,3],dtype=int)
for i in range(0,NEls):
    connct = file.readline().strip().split()
    for j in range(0,len(connct)):
        el_connect[i,j] = int(connct[j])

Is there a better way to do this that eliminates the second for loop?

Other questions I have regarding this are:

  1. How can I deal with the scenario where some columns are ints and other columns are floats, given that numpy arrays cannot handle multiple data types?

  2. Also, how can I raise an exception if the format of the file is not what I expected? Just a few examples would do.


Solution

  • You could get rid of both loops by using np.genfromtxt(), assuming whitespace-separated values (which I'm inferring from the code above).

    In []:
    from io import StringIO
    import numpy as np

    data = '''1 2 2.2 3
    3 4 4.1 2'''
    
    np.genfromtxt(StringIO(data))     # Replace `StringIO(data)` with filename
    
    Out[]:
    array([[1. , 2. , 2.2, 3. ],
           [3. , 4. , 4.1, 2. ]])
    

    np.genfromtxt() uses np.float64 for the whole array by default, so a mixed set of ints and floats all come back as floats. If you want to describe the column types explicitly, pass one dtype per column, which gives you a structured array:

    np.genfromtxt(StringIO(data), dtype=[np.int32, np.int32, np.float64, np.int32])
    
    Out[]:
    array([(1, 2, 2.2, 3), (3, 4, 4.1, 2)],
          dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<f8'), ('f3', '<i4')])
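
  • For the second question (raising an exception when the file is not in the expected format), here is a minimal sketch, assuming the input contains only the whitespace-separated integer table. It swaps in np.loadtxt() instead of np.genfromtxt(), because loadtxt raises ValueError on a token it cannot convert or on rows of differing length, whereas genfromtxt by default fills unparseable entries with a placeholder such as nan. The helper name read_connectivity and its n_rows / n_cols parameters are made up for illustration.

```python
from io import StringIO

import numpy as np


def read_connectivity(source, n_rows, n_cols=3):
    """Read an integer table; raise ValueError if the layout is unexpected.

    The function name and parameters are hypothetical, chosen for this sketch.
    """
    # np.loadtxt raises ValueError for non-integer tokens or ragged rows.
    # ndmin=2 keeps a single-row file two-dimensional.
    table = np.loadtxt(source, dtype=int, ndmin=2)
    # Explicit check that the table has the shape the caller expects.
    if table.shape != (n_rows, n_cols):
        raise ValueError(
            f"expected shape {(n_rows, n_cols)}, got {table.shape}"
        )
    return table


good = StringIO("1 2 3\n4 5 6")      # Replace with a filename
el_connect = read_connectivity(good, n_rows=2)

bad = StringIO("1 2 3\n4 x 6")       # 'x' is not an integer
try:
    read_connectivity(bad, n_rows=2)
except ValueError as exc:
    print(f"bad file: {exc}")
```

    If the table is preceded by header lines, np.loadtxt() takes a skiprows argument to skip them; how many lines to skip depends on your file.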