python, python-2.7, numpy, scientific-computing

How to store X-Y data in order to use the functions in Numpy?


Both Numpy and Scipy have a number of useful functions for performing operations on data (e.g. integration, Fourier transforms, baseline correction). However, I haven't seen documentation on the general form for inputting X-Y data into these functions. Say I have a spectrum of wavelength and absorbance values, or stress and strain data from a mechanical properties test.

Does one generally:

  1. Use two 1-D Numpy arrays, one for X, and one for Y?

  2. Use one 2-D Numpy array, with X on one axis, and Y on the other?

  3. Use a single structured array?

How does this change when you have XY-Z data?

What is the most general data structure for XY data that allows me to input my data directly into most of these functions without redefining how I store my data?


Solution

  • Check the documentation for each package and for each class or function you plan to use. scipy is a collection of packages, written by different people, and often serving as interfaces to even older Fortran or C packages. So the input format is constrained by those sources, and it also depends on what is suitable for the problem.

    Often it is convenient to generate values on a regular grid. For example, use np.meshgrid or np.mgrid with arange or linspace values to define a 2d space. The result can be three 2d arrays: the x and y values, and z as a function of those.
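
    A minimal sketch of the regular-grid case. The axis ranges and the function used for z are made up for illustration; only np.linspace and np.meshgrid come from the text above.

```python
import numpy as np

# 1d coordinate vectors along each axis (illustrative ranges)
x = np.linspace(0.0, 1.0, 5)   # 5 points in x
y = np.linspace(0.0, 2.0, 3)   # 3 points in y

# meshgrid expands them into two 2d arrays of matching shape.
# With the default indexing="xy" the shape is (len(y), len(x)).
X, Y = np.meshgrid(x, y)

# z evaluated on the grid; the formula is an arbitrary example
Z = X**2 + Y

print(X.shape, Y.shape, Z.shape)
```

    All three arrays share the same shape, so X[i, j], Y[i, j] and Z[i, j] describe one grid point.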

    But real-world data is often available as scatter points. Each point is then an (x, y) location with a z value. You can't cast those as 2d grid arrays, at least not without interpolation. So three 1d arrays is the appropriate representation. Or an (n, 3) array, with one column for each of the variables. Or, if the values have different dtypes (say integer for x and y, float for z), a structured array with 3 fields.
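
    The three scatter-point layouts can be sketched side by side. The sample values and the field names "x", "y", "z" are illustrative assumptions.

```python
import numpy as np

# Layout 1: three separate 1d arrays (sample values are made up)
x = np.array([0, 1, 2])
y = np.array([5, 6, 7])
z = np.array([0.1, 0.2, 0.3])

# Layout 2: one (n, 3) array, one column per variable.
# The integer columns are promoted to float to get a single dtype.
pts = np.column_stack((x, y, z))

# Layout 3: a 1d structured array that keeps a dtype per field
rec = np.empty(len(x), dtype=[("x", np.int64), ("y", np.int64), ("z", np.float64)])
rec["x"] = x
rec["y"] = y
rec["z"] = z

print(pts.shape, pts.dtype)
print(rec.shape, rec.dtype.names)
```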

    Often data is loaded from csv files - the columns representing those x,y,z values, maybe with string labels, and multiple z values. With a mix of data types they are often loaded with genfromtxt, resulting in a 1d structured array.
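
    As a rough sketch of that loading step, the snippet below reads a small in-memory CSV with genfromtxt. The column names and values are invented, and io.StringIO stands in for a real file path.

```python
import io
import numpy as np

# Stand-in for a CSV file on disk; the contents are made up
csv_text = """label,x,y,z
a,0,0,0.5
b,1,0,1.25
c,1,2,2.0
"""

# names=True takes field names from the header row;
# dtype=None lets genfromtxt pick a dtype per column.
data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8",
)

print(data.shape)        # 1d: one record per row
print(data.dtype.names)  # one field per column
print(data["z"])         # a plain 1d float array
```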

    It's easy to map from structured arrays to multiple arrays with uniform dtype. Sometimes you do this by just indexing with the field name; other cases might require a view.
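
    A short sketch of both routes, using made-up sample records. The use of numpy.lib.recfunctions.structured_to_unstructured for the mixed-dtype case is my suggestion rather than something named above, and it needs a reasonably recent NumPy (1.16 or later).

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Mixed-dtype records: integer x and y, float z (sample values)
pts = np.array(
    [(0, 0, 1.5), (1, 0, 2.5), (1, 2, 4.0)],
    dtype=[("x", "i4"), ("y", "i4"), ("z", "f8")],
)

# Indexing by field name gives ordinary 1d arrays
x = pts["x"]
z = pts["z"]

# Convert all fields to one (n, 3) array with a common dtype
xyz = rfn.structured_to_unstructured(pts)

# When every field already has the same dtype, a view is enough
xy = np.array([(0.0, 1.0), (2.0, 3.0)], dtype=[("x", "f8"), ("y", "f8")])
xy2d = xy.view(np.float64).reshape(len(xy), -1)

print(xyz.shape, xyz.dtype)
print(xy2d)
```

    The view shares memory with the structured array, while structured_to_unstructured may copy when the field dtypes differ.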

    To delve into this more you might need to expand on the data type(s), and the packages that you need to use.

    The documentation for scipy.interpolate.griddata illustrates the use of both point data and grid data: http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata
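
    A minimal sketch of going from scattered points to a grid with griddata, assuming scipy is installed. The random points and the linear test function are invented for illustration.

```python
import numpy as np
from scipy.interpolate import griddata

# Scattered (x, y) locations as an (n, 2) array, with z values at each point
rng = np.random.default_rng(0)
points = rng.random((200, 2))
values = points[:, 0] + 2.0 * points[:, 1]   # arbitrary test function

# Regular target grid: two 2d arrays from mgrid (20 by 30 points)
grid_x, grid_y = np.mgrid[0:1:20j, 0:1:30j]

# Interpolate the scattered z values onto the grid
grid_z = griddata(points, values, (grid_x, grid_y), method="linear")

# Grid nodes outside the convex hull of the points come back as NaN
inside = ~np.isnan(grid_z)
print(grid_z.shape, inside.sum())
```

    Here the point data is passed as an (n, 2) array plus a 1d array of values, and the grid data as a tuple of 2d arrays, so one call shows both conventions.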