python arrays numpy concatenation genfromtxt

Creating numpy array with empty columns using genfromtxt

I am importing data using numpy.genfromtxt, and I would like to add a field of values derived from some of the fields already in the dataset. As this is a structured array, it seems that the simplest, most efficient way of adding a new column is numpy.lib.recfunctions.append_fields(). I found a good description of this library HERE.

Is there a way of doing this without copying the array, perhaps by forcing genfromtxt to create an empty column to which I can append derived values?
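For reference, the append_fields approach described above might look roughly like the sketch below. The field names ('a', 'b', 'c', 'total') and the in-memory sample data are assumptions made up for illustration; the point is only that append_fields returns a new array rather than extending the original.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical sample data standing in for the real file.
raw = "1,11,1.1\n2,22,2.2\n3,33,3.3\n"

# Passing names makes genfromtxt return a structured array.
arr = np.genfromtxt(io.StringIO(raw), delimiter=',', names=['a', 'b', 'c'])

# Derive a new column from existing fields.
derived = arr['a'] + arr['b']

# append_fields builds and returns a new array; arr itself is left unchanged.
# usemask=False asks for a plain ndarray instead of a masked array.
out = rfn.append_fields(arr, 'total', derived, usemask=False)

print(out.dtype.names)
print(np.shares_memory(arr, out))
```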

Solution

Here's a simple example that uses a generator to add a placeholder field to the data as genfromtxt reads it.

Our example data file will be data.txt with the contents:

1,11,1.1
2,22,2.2
3,33,3.3

In [19]: np.genfromtxt('data.txt',delimiter=',')
Out[19]:
array([[  1. ,  11. ,   1.1],
       [  2. ,  22. ,   2.2],
       [  3. ,  33. ,   3.3]])

If we make a generator such as:

def genfield():
    # Prepend a placeholder column ("0,") to every line of the file.
    with open('data.txt') as f:
        for line in f:
            yield '0,' + line

which prepends a comma-delimited 0 to each line of the file, then:

In [22]: np.genfromtxt(genfield(),delimiter=',')
Out[22]:
array([[  0. ,   1. ,  11. ,   1.1],
       [  0. ,   2. ,  22. ,   2.2],
       [  0. ,   3. ,  33. ,   3.3]])

You can do the same thing with a generator expression:

In [26]: np.genfromtxt(('0,'+line for line in open('data.txt')),delimiter=',')
Out[26]:
array([[  0. ,   1. ,  11. ,   1.1],
       [  0. ,   2. ,  22. ,   2.2],
       [  0. ,   3. ,  33. ,   3.3]])
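
To tie this back to the structured-array case in the question, here is a minimal sketch of filling the placeholder column in place once the data is loaded. It assumes you pass field names to genfromtxt; the names 'derived', 'a', 'b', 'c', the helper with_placeholder, and the in-memory sample data are all made up for illustration.

```python
import io

import numpy as np

# Hypothetical sample data standing in for data.txt.
raw = "1,11,1.1\n2,22,2.2\n3,33,3.3\n"

def with_placeholder(lines, placeholder='0'):
    # Prepend a placeholder value to each line, as genfield() does above.
    for line in lines:
        yield placeholder + ',' + line

# names gives a structured array with one extra field for the placeholder.
arr = np.genfromtxt(with_placeholder(io.StringIO(raw)),
                    delimiter=',',
                    names=['derived', 'a', 'b', 'c'])

# Assigning to a field writes into the existing array; no new array is built.
arr['derived'] = arr['a'] + arr['b']

print(arr.dtype.names)
print(arr['derived'])
```

Because the extra field already exists when genfromtxt allocates the array, the assignment only overwrites the placeholder zeros.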