I have a .csv file with 24 columns x 514 rows of data. Each of these columns represents a different parameter, and I wish to study the trends between different parameters.
I am using genfromtxt to import the data as a numpy array so that I can plot the values of two particular columns (e.g. column 9 against column 11). Here is what I have so far:
import matplotlib.pyplot as plt
import numpy as np
data = np.genfromtxt('output_burnin.csv', delimiter=',')
impactparameter=data[:,11]
planetradius=data[:,9]
plt.plot(planetradius,impactparameter,'bo')
plt.title('Impact Parameter vs. Planet Radius')
plt.xlabel('R$_P$/R$_{Jup}$')  # braces so the whole word is subscripted, not just its first letter
plt.ylabel('b/R$_{star}$')
plt.show()
With this code I encounter an error at line 12:
impactparameter=data[:,11]
IndexError: too many indices
What could the problem be here?
Also, I have been trying to figure out how to give each column a header in the .csv file. So instead of counting the column number, I can just call the name of that particular column when I do the plotting. Is there a way to do this?
I am a complete newbie in Python, so any help would be much appreciated. Thanks!
Also, I have been trying to figure out how to give each column a header in the .csv file. So instead of counting the column number, I can just call the name of that particular column when I do the plotting. Is there a way to do this?
To give the columns of your array names, you need to make it a structured array.
Here's a simple example:
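import numpy as np
a = np.zeros(5, dtype='f4, f4, f4')  # five records, each with three float32 fields
a.dtype.names = ('col1', 'col2', 'col3')  # rename the default fields f0, f1, f2
print(a[0])  # the first row (record): a tuple of three zeros
print(a['col1'])  # the first column: an array of five zeros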
If you have the column names at the beginning of your CSV file and set names=True in np.genfromtxt, then NumPy will automatically create a structured array for you with the correct names.
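Here is a minimal sketch of that approach. I don't have your file, so the two column names and the values below are made up for illustration, and the data is read from an in-memory string instead of output_burnin.csv. Note that a structured array is one-dimensional (one record per row), so you select a column by name; two-index slicing such as data[:, 11] raises an IndexError on it.

```python
import io
import numpy as np

# Hypothetical stand-in for the real file: a header row, then numeric rows.
# The column names and values are invented for this example.
csv_text = """planet_radius,impact_parameter
1.10,0.25
1.25,0.40
0.95,0.10
"""

# names=True tells genfromtxt to take the field names from the first row.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True)

print(data.dtype.names)  # the field names read from the header row
print(data.shape)        # one entry per data row; fields are not a second axis

# Select whole columns by name instead of by position.
planet_radius = data['planet_radius']
impact_parameter = data['impact_parameter']
```

The two arrays can then be passed to plt.plot exactly as in your script. With your real file you would pass the filename instead of the io.StringIO object, after adding a header row to the CSV.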