python numpy genfromtxt

Manipulating a numpy array


I have a CSV file with approximately 350 lines and 50 columns, of which I want to access four. Using genfromtxt, I'm able to load them. Once I have those columns, however, I want to add a new column computed from the existing ones (e.g. newcol=abs(col1-col2)). When I try this, I get the error: too many indices for array.

Here is my code:

import numpy as np
thedata = np.genfromtxt(
    'match_roughgraphs.csv',
    skip_header=0,
    skip_footer=0,
    delimiter=',',
    usecols=(3,4,29,30),
    names=['hubblera','hubbledec','sloanra','sloandec'])

for row in thedata:
    print(row)

b=np.empty((350,1))
b=np.absolute(thedata[:,0]-thedata[:,1]) #returns too many indices error

print(thedata[0,0]) #also returns too many indices error

print(thedata[0]) #prints out first row

Based on the last two lines above, which I tried as a test, I'm assuming genfromtxt() is loading the CSV file so that all the data end up in one column, with the commas treated as string characters rather than as delimiters. Any suggestions on how to fix this?


Solution

  • Your code does not work because numpy.genfromtxt returns a 1D array of tuples here, or more specifically a structured ndarray. See the related question "numpy.genfromtxt produces array of what looks like tuples, not a 2D array - why?". You can either change the arguments or convert those tuples to get thedata as a 2D array. The names argument is what makes genfromtxt return a structured ndarray; remove it and, with the default dtype, you will get a plain 2D array. Since you have already named the columns, you can simply do

    b=np.absolute(thedata['hubblera']-thedata['hubbledec'])
    

    Also, thedata[0,0] raises an error because thedata is not a 2D array; try thedata[0][0] instead.
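
    To make the difference concrete, here is a minimal sketch of both approaches. The CSV text is made-up sample data standing in for match_roughgraphs.csv (which I don't have), loaded through io.StringIO; the usecols argument is dropped because the sample only has the four columns of interest. It shows the structured array you get with names, the plain 2D array you get without it, and one way (np.column_stack, my choice rather than anything from the question) to append the new column.

```python
import io
import numpy as np

# Hypothetical stand-in for match_roughgraphs.csv: three rows, four columns.
csv_text = "10.0,2.0,10.5,2.5\n20.0,4.0,19.0,4.5\n30.0,6.0,30.25,5.0\n"
names = ['hubblera', 'hubbledec', 'sloanra', 'sloandec']

# With names=..., genfromtxt returns a 1D structured array (one record per row).
structured = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=names)
print(structured.shape)        # (3,)
print(structured.dtype.names)  # ('hubblera', 'hubbledec', 'sloanra', 'sloandec')

# Columns are selected by field name, not by a second index.
newcol = np.absolute(structured['hubblera'] - structured['hubbledec'])
print(newcol.tolist())         # [8.0, 16.0, 24.0]

# A single value: index the row first, then the field within that record.
print(structured[0][0])        # 10.0
try:
    structured[0, 0]           # two indices on a 1D array
except IndexError as err:
    print("IndexError:", err)

# Without names, the same data loads as an ordinary 2D float array.
plain = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
print(plain.shape)             # (3, 4)
newcol_2d = np.absolute(plain[:, 0] - plain[:, 1])

# Append the computed column to the 2D array as a fifth column.
with_newcol = np.column_stack([plain, newcol_2d])
print(with_newcol.shape)       # (3, 5)
```

    If you want to keep the column names, stay with the structured array and index by field; if you prefer thedata[:,0]-style slicing, drop names and work with the 2D array.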