
Creating numpy array with empty columns using genfromtxt


I am importing data using numpy.genfromtxt, and I would like to add a field of values derived from some of the existing fields in the dataset. As this is a structured array, the simplest and most efficient way of adding a new column seems to be numpy.lib.recfunctions.append_fields(). I found a good description of this library here.

Is there a way of doing this without copying the array, perhaps by forcing genfromtxt to create an empty column to which I can append derived values?
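
For reference, a minimal sketch of the append_fields() approach described above. It uses a small hypothetical array in place of the imported data, and the field names a, b, c, and derived are assumptions made for illustration. The result is a new array, which is the copy in question:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical stand-in for the structured array returned by genfromtxt.
data = np.array(
    [(1, 11, 1.1), (2, 22, 2.2)],
    dtype=[('a', 'i8'), ('b', 'i8'), ('c', 'f8')],
)

# append_fields builds and returns a new array with the extra field;
# usemask=False asks for a plain ndarray instead of a masked array.
out = rfn.append_fields(data, 'derived', data['a'] + data['b'], usemask=False)

print(out.dtype.names)
print(np.shares_memory(out, data))
```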


Solution

  • Here's a simple example that uses a generator to add a field to the data as genfromtxt reads it.

    Our example data file will be data.txt with the contents:

    1,11,1.1
    2,22,2.2
    3,33,3.3
    

    Reading it directly gives:

    In [19]: np.genfromtxt('data.txt',delimiter=',')
    Out[19]:
    array([[  1. ,  11. ,   1.1],
           [  2. ,  22. ,   2.2],
           [  3. ,  33. ,   3.3]])
    

    If we make a generator such as:

    def genfield():
        # Close the file once every line has been read
        with open('data.txt') as f:
            for line in f:
                yield '0,' + line
    

    which prepends a comma-delimited 0 to each line of the file, then:

    In [22]: np.genfromtxt(genfield(),delimiter=',')
    Out[22]:
    array([[  0. ,   1. ,  11. ,   1.1],
           [  0. ,   2. ,  22. ,   2.2],
           [  0. ,   3. ,  33. ,   3.3]])
    

    You can do the same thing inline with a generator expression:

    In [26]: np.genfromtxt(('0,'+line for line in open('data.txt')),delimiter=',')
    Out[26]:
    array([[  0. ,   1. ,  11. ,   1.1],
           [  0. ,   2. ,  22. ,   2.2],
           [  0. ,   3. ,  33. ,   3.3]])
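
The placeholder column can then be filled with derived values after loading. The following is a minimal sketch, assuming a structured dtype is passed to genfromtxt; the field names (derived, a, b, c), their types, and the derived formula are hypothetical, and an in-memory string stands in for data.txt so the snippet is self-contained:

```python
import io
import numpy as np

# Stand-in for the contents of data.txt.
raw = "1,11,1.1\n2,22,2.2\n3,33,3.3\n"

# Prepend a placeholder 0 to every line, as in the generator expression above.
lines = ('0,' + line for line in io.StringIO(raw))

# Hypothetical field names and types, chosen for illustration only.
dtype = [('derived', 'f8'), ('a', 'i8'), ('b', 'i8'), ('c', 'f8')]
data = np.genfromtxt(lines, delimiter=',', dtype=dtype)

# Write the derived values into the existing field; the structured
# array itself is not copied or reallocated by this assignment.
data['derived'] = data['a'] + data['b']

print(data['derived'])
```

Assigning to data['derived'] writes into memory the array already owns, so no call to append_fields() is needed for this one column.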