Tags: python, numpy, pandas, data-structures, data-analysis

Best data structure to use in Python to store a 3-dimensional cube of named data


I would like some feedback on my choice of data structure. I have a 2D X-Y grid of current values for a specific voltage value. I have several voltage steps and have organized the data into a cube of X-Y-Voltage. I illustrated the axes here: https://i.sstatic.net/iS7tW.jpg.

I currently use NumPy arrays in Python dictionaries for the different kinds of transistors I am sweeping. I'm not sure if this is the best way to do this. I've looked at pandas, but I'm also not sure whether this is a good job for pandas. I was hoping someone could help me out, so I could learn to be Pythonic! The code to generate some test data and the end structure is below.

Thank you!

import numpy as np

#make test data

test__transistor_data0 = {"SNMOS":np.random.randn(3,256,256),"SPMOS":np.random.randn(4,256,256), "WPMOS":np.random.randn(6,256,256),"WNMOS":np.random.randn(6,256,256)}
test__transistor_data1 = {"SNMOS":np.random.randn(3,256,256), "SPMOS":np.random.randn(4,256,256), "WPMOS":np.random.randn(6,256,256), "WNMOS":np.random.randn(6,256,256)}
test__transistor_data2 = {"SNMOS":np.random.randn(3,256,256), "SPMOS":np.random.randn(4,256,256), "WPMOS":np.random.randn(6,256,256), "WNMOS":np.random.randn(6,256,256)}
test__transistor_data3 = {"SNMOS":np.random.randn(3,256,256), "SPMOS":np.random.randn(4,256,256), "WPMOS":np.random.randn(6,256,256), "WNMOS":np.random.randn(6,256,256)}


quadrant_data = {"ne":test__transistor_data0,"nw":test__transistor_data1,"sw":test__transistor_data2,"se":test__transistor_data3} 

Solution

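  • It may be worth checking out xarray, which is like (and partially based on) pandas, but designed for N-dimensional data. The session below was recorded when the package was still named xray; with current releases, import it as xarray (conventionally, import xarray as xr).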

    Its two fundamental containers are a DataArray, which is a labeled N-dimensional array, and a Dataset, which is a container of DataArrays.

    In [29]: s1 = xray.DataArray(np.random.randn(3,256,256), dims=['voltage', 'x', 'y'])
    
    In [30]: s2 = xray.DataArray(np.random.randn(3,256,256), dims=['voltage', 'x', 'y'])
    
    In [32]: ds = xray.Dataset({'SNMOS': s1, 'SPMOS': s2})
    
    In [33]: ds
    Out[33]: 
    <xray.Dataset>
    Dimensions:  (voltage: 3, x: 256, y: 256)
    Coordinates:
      * voltage  (voltage) int64 0 1 2
      * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
      * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
    Data variables:
        SPMOS    (voltage, x, y) float64 -1.363 2.446 0.3585 -0.8243 -0.814 ...
        SNMOS    (voltage, x, y) float64 1.07 2.327 -1.435 0.4011 0.2379 2.07 ...
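
With a current xarray release, the same containers can carry real voltage labels instead of positional indices. The sketch below is only an illustration: the voltage values are made up, and it borrows the step counts from the question's test data (3 for SNMOS, 4 for SPMOS) to show what happens when two cubes with different numbers of voltage steps are merged on a shared voltage dimension with an outer join.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical voltage steps in volts; substitute the real sweep values.
snmos_volts = [0.0, 0.5, 1.0]
spmos_volts = [0.0, 0.5, 1.0, 1.5]

snmos = xr.DataArray(
    rng.standard_normal((3, 256, 256)),
    dims=["voltage", "x", "y"],
    coords={"voltage": snmos_volts},
    name="SNMOS",
)
spmos = xr.DataArray(
    rng.standard_normal((4, 256, 256)),
    dims=["voltage", "x", "y"],
    coords={"voltage": spmos_volts},
    name="SPMOS",
)

# An outer join keeps the union of the voltage labels; steps that a
# transistor was not measured at are filled with NaN.
ds = xr.merge([snmos, spmos], join="outer")

print(ds.sizes["voltage"])
print(bool(ds["SNMOS"].sel(voltage=1.5).isnull().all()))
```

Here SNMOS has no measurement at 1.5 V, so that slice is padded with NaN. If the padding is unwanted, keeping one DataArray per transistor in a plain dict is a reasonable alternative to a single Dataset.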
    

    Both containers have a lot of nice functionality (see the docs). For example, if you wanted to know the maximum value for each transistor at the first voltage level, it would be something like this:

    In [39]: ds.sel(voltage=0).max(dim='x').max()
    Out[39]: 
    <xray.Dataset>
    Dimensions:  ()
    Coordinates:
        *empty*
    Data variables:
        SPMOS    float64 4.175
        SNMOS    float64 4.302
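
For readers on a current xarray release, the reduction above can be reproduced roughly as follows. This is a sketch under assumptions: the data is random, the integer voltage labels 0, 1, 2 are assigned explicitly so that sel is a label lookup, and the names first, per_y_max, overall_max, and peak are introduced here for illustration.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(42)

# Two transistors on a shared (voltage, x, y) grid with assumed voltage labels.
ds = xr.Dataset(
    {
        "SNMOS": (("voltage", "x", "y"), rng.standard_normal((3, 256, 256))),
        "SPMOS": (("voltage", "x", "y"), rng.standard_normal((3, 256, 256))),
    },
    coords={"voltage": [0, 1, 2]},
)

first = ds.sel(voltage=0)        # X-Y grid at the first voltage level
per_y_max = first.max(dim="x")   # reduces x only; each variable keeps its y dimension
overall_max = first.max()        # reduces every remaining dimension

# Pull the scalars out into a plain dict keyed by transistor name.
peak = {name: float(overall_max[name]) for name in overall_max.data_vars}
print(peak)
```

Chaining .max(dim="x").max(), as in the session above, gives the same numbers as a single .max() over the selected slice; the intermediate step is only useful if the per-y profile is wanted as well.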