Both NumPy and SciPy have a number of useful functions for performing operations on data (e.g. integration, Fourier transforms, baseline correction). However, I haven't seen documentation regarding the general form for inputting X-Y data into these functions. Say I have a spectrum of wavelength and absorbance values, or stress and strain data from a mechanical properties test.
Does one generally:
Use two 1-D Numpy arrays, one for X, and one for Y?
Use one 2-D Numpy array, with X on one axis, and Y on the other?
Use a single structured array?
How does this change when you have XY-Z data?
What is the most general data structure for XY data that allows me to input my data directly into most of these functions without redefining how I store my data?
Check the documentation for each package, and for each class or function you use. SciPy is a collection of packages, written by different people, and often serving as interfaces to even older Fortran or C packages, so the input format is constrained by those sources. It also depends on what is suitable for the problem.
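As a small illustration of how much the calling conventions vary, here is a sketch using a made-up, evenly spaced "spectrum" stored as two 1-D arrays. It assumes a recent SciPy (`scipy.integrate.trapezoid` was added in SciPy 1.6). Note that `trapezoid` wants `y` before `x`, `np.interp` wants the query points first, and the FFT never sees `x` at all, only its spacing:

```python
import numpy as np
from scipy import integrate

# Hypothetical spectrum: evenly spaced wavelengths (x) and absorbances (y).
x = np.linspace(0.0, 1.0, 101)
y = 2.0 * x  # a simple linear "signal" so the results are easy to check

# scipy.integrate.trapezoid takes y first, then x.
area = integrate.trapezoid(y, x)

# np.interp takes the query point(s) first, then the known x and y arrays.
y_mid = np.interp(0.5, x, y)

# The FFT only sees y; x enters indirectly through the sample spacing d.
dx = x[1] - x[0]
spectrum = np.fft.rfft(y)
freqs = np.fft.rfftfreq(len(y), d=dx)
```

Two separate 1-D arrays fit all three calls; only the argument order changes.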
Often it is convenient to generate values on a regular grid. For example, use np.meshgrid or np.mgrid with arange or linspace values to define a 2d space. The result can be three 2d arrays: the x and y values, and z as a function of those.
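A minimal sketch of the regular-grid case (the ranges and the function for z are arbitrary choices for illustration). `np.meshgrid` with `indexing="ij"` and `np.mgrid` with a complex step (`5j` meaning "5 points, endpoint included") produce the same grid:

```python
import numpy as np

# 1-D coordinate vectors for a regular grid (values chosen only for illustration).
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 2.0, 3)

# meshgrid with indexing="ij" gives 2-D arrays of shape (len(x), len(y)).
X, Y = np.meshgrid(x, y, indexing="ij")

# mgrid builds the same grid directly from slice notation.
Xm, Ym = np.mgrid[0.0:1.0:5j, 0.0:2.0:3j]

# z as a function of the x and y grids: three 2-D arrays of identical shape.
Z = np.sin(X) * np.cos(Y)
```

With the default `indexing="xy"`, `meshgrid` returns the transposed shape `(len(y), len(x))`, which is worth keeping in mind when passing grids between functions.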
But real-world data is often available as scattered points. Each point is then an x, y location with a z value. You can't cast those as 2d arrays, at least not without interpolation, so three 1d arrays is the appropriate representation. Or an (n, 3) matrix, one column for each of the variables. Or, if the values have different dtypes (say integer for x and y, float for z), a structured array with 3 fields.
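The three scattered-point representations side by side, as a sketch with made-up values. The `(n, 3)` matrix forces a single dtype (here everything becomes float64), while the structured array keeps integer `x`, `y` and float `z`:

```python
import numpy as np

# Hypothetical scattered measurements: integer x, y positions with a float z.
x = np.array([0, 3, 1, 4], dtype=np.int64)
y = np.array([2, 0, 5, 1], dtype=np.int64)
z = np.array([0.5, 1.25, 2.0, 3.75])

# Option 1: the three 1-D arrays themselves.

# Option 2: an (n, 3) matrix, one column per variable; the ints are upcast to float.
points = np.column_stack([x, y, z])

# Option 3: a structured array that keeps a separate dtype per field.
dt = np.dtype([("x", np.int64), ("y", np.int64), ("z", np.float64)])
rec = np.empty(len(x), dtype=dt)
rec["x"] = x
rec["y"] = y
rec["z"] = z
```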
Often data is loaded from csv files, the columns representing those x, y, z values, maybe with string labels and multiple z values. With a mix of data types they are often loaded with genfromtxt, resulting in a 1d structured array.
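A sketch of that loading step, using an in-memory string as a stand-in for a real csv file (the column names and values are invented). `names=True` takes the field names from the header row, and `dtype=None` lets `genfromtxt` infer a dtype per column:

```python
import io
import numpy as np

# Stand-in for a csv file: a string label column plus x, y, z columns.
csv_text = """label,x,y,z
a,0,0,1.5
b,1,0,2.5
c,0,1,3.5
"""

# names=True reads field names from the header; dtype=None infers each column's type.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",",
                     names=True, dtype=None, encoding="utf-8")
```

The result is a 1-D structured array with one record per row, so `data["z"]` gives the z column as an ordinary float array.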
It's easy to map from structured arrays to multiple arrays with uniform dtype. Sometimes you do this by just indexing with the field name; other cases might require a view.
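Both approaches in a short sketch. The `view` trick only works here because every field is float64 and the fields are packed with no padding; `numpy.lib.recfunctions.structured_to_unstructured` (available since NumPy 1.16) is shown as an alternative that also handles mixed dtypes by casting to a common type:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A structured array whose fields all share one dtype (float64).
rec = np.array([(0.0, 1.0, 2.0), (3.0, 4.0, 5.0)],
               dtype=[("x", "f8"), ("y", "f8"), ("z", "f8")])

# Indexing by field name returns a 1-D array for that column.
x = rec["x"]

# A view reinterprets the same memory as plain float64 values, reshaped to (n, 3).
mat = rec.view(np.float64).reshape(len(rec), -1)

# structured_to_unstructured does the same conversion, copying only if it must.
mat2 = rfn.structured_to_unstructured(rec)
```

Since `mat` is a view, writing to it also changes `rec`.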
To delve into this more you might need to expand on the data type(s), and the packages that you need to use.
The interpolate.griddata documentation (http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata) illustrates the use of both point data and grid data.
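A compact sketch along the same lines, with invented data: scattered samples of `z = x + 2*y` go in as an `(n, 2)` array of points plus `n` values, and come out on a regular `mgrid` grid. The corners are added to the points so the target grid lies inside their convex hull (outside it, `method="linear"` returns NaN), and because the test function is linear, linear interpolation reproduces it up to rounding:

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical scattered samples; the four corners are included so the unit
# square lies inside the convex hull of the points.
rng = np.random.default_rng(0)
pts = np.vstack([rng.random((50, 2)), [[0, 0], [0, 1], [1, 0], [1, 1]]])
vals = pts[:, 0] + 2 * pts[:, 1]

# Target regular grid as two 2-D arrays.
grid_x, grid_y = np.mgrid[0.1:0.9:9j, 0.1:0.9:9j]

# Point data in, grid data out.
grid_z = griddata(pts, vals, (grid_x, grid_y), method="linear")
```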