python, matlab, cell-array

What is the equivalent to a Matlab cell array?


I am new to Python and trying to create something equivalent to Matlab's "cell array". Let's say I have 100 customers indexed 'C001', 'C002', etc., and I have different data for each customer:

  • Size of premises in square meters [real number]
  • categorical data showing whether they are 'commercial', 'residential' or 'other'
  • hourly time series of their electricity consumption in 2014 i.e. datetime-indexed array of 8760 real values

What is the best way to build such a dataset in Python 2.7 that combines single values, categorical data and time-indexed arrays? I am trying to use pandas for this but have had no success so far.

Thank you very much in advance


Solution

  • The closest equivalent of a MATLAB cell array is a NumPy object array. These are rarely used, however, because they are rarely what you want in practice. In most cases where you would use a cell array in MATLAB, a list or nested list suffices:

    >>> a = [obj1, obj2, obj3, obj4]
    >>> b = [[obj1, obj2], [obj3, obj4]]
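    For comparison, the NumPy object array mentioned above could look roughly like the sketch below. The four values are hypothetical placeholders standing in for obj1 to obj4; the point is only that each cell can hold an arbitrary Python object, much like a cell in MATLAB.

```python
import numpy as np

# Hypothetical placeholder values of mixed type, standing in for obj1..obj4.
items = [42.5, "commercial", [1, 2, 3], {"id": "C001"}]

# A plain nested list already holds arbitrary objects.
nested = [items[:2], items[2:]]

# The NumPy counterpart: allocate an object array, then fill it cell by cell.
cells = np.empty((2, 2), dtype=object)
cells[0, 0] = items[0]
cells[0, 1] = items[1]
cells[1, 0] = items[2]
cells[1, 1] = items[3]

print(cells.dtype)   # dtype of the container, not of the stored values
print(cells.shape)
print(cells[1, 0])   # the list stored in the second row, first column
```

    Filling the array element by element avoids NumPy trying to interpret a nested sequence as extra dimensions.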
    

    However, that is not what you want to do in your case. Your question is a classic example of the XY problem: you are asking how to implement a particular solution to your problem, rather than asking how to solve the problem itself. Python can do a lot of things MATLAB can't, so trying to make Python behave like MATLAB will often result in sub-optimal solutions.

    In this case, what you want is a pandas DataFrame. It is nothing at all like a MATLAB cell array, but fits your data set much better. You can use a MultiIndex to store the parameters, and columns to store the time series data. This allows you to index by name, size, category, date, etc. You can calculate, for example, the mean energy usage for each category of property in the third quarter for properties over 500 square meters in just one line of code.

    Here is an example of how you could structure such data:

    >>> import numpy as np
    >>> import pandas as pd
    >>> 
    >>> # Per-customer parameters (random sizes, so the output below will vary)
    >>> names = ['C001', 'C002', 'C003', 'C004']
    >>> sizes = np.random.random(4)*1000
    >>> category = ['Commercial', 'Residential', 'Residential', 'Other']
    >>> # One column of time series data per customer, one row per timestamp
    >>> ts = np.random.random([100, 4])
    >>> timestamps = pd.date_range('1/1/2011', periods=100, freq='W')
    >>> 
    >>> # The parameters become the levels of the column MultiIndex
    >>> cols = pd.MultiIndex.from_arrays([names, sizes, category])
    >>> 
    >>> df = pd.DataFrame(ts, index=timestamps, columns=cols)
    >>> df.columns.names = ['Name', 'Size', 'Category']
    >>> df.index.name = 'Time'
    >>> 
    >>> print(df)
    Name             C001        C002        C003       C004
    Size       36.719201   732.278278  795.755755 551.383120
    Category   Commercial Residential Residential      Other
    Time                                                    
    2011-01-02   0.108720    0.018492    0.057233   0.694548
    2011-01-09   0.959845    0.968857    0.422210   0.975767
    2011-01-16   0.709676    0.119963    0.004481   0.830328
    2011-01-23   0.084271    0.535408    0.209943   0.668001
    2011-01-30   0.626125    0.052301    0.212636   0.995429
    2011-02-06   0.376399    0.199327    0.482884   0.632472
    2011-02-13   0.302807    0.353679    0.599427   0.993996
    2011-02-20   0.185445    0.005769    0.755981   0.923540
    2011-02-27   0.109611    0.994292    0.873782   0.542741
    2011-03-06   0.561404    0.778414    0.595238   0.082001
    2011-03-13   0.056986    0.869344    0.459753   0.450071
    2011-03-20   0.261320    0.675317    0.603043   0.371950
    2011-03-27   0.890803    0.061619    0.831677   0.801890
    2011-04-03   0.498199    0.846559    0.370336   0.225477
    2011-04-10   0.248914    0.693038    0.145255   0.233058
    2011-04-17   0.621441    0.683213    0.048944   0.650139
    2011-04-24   0.459869    0.055751    0.912097   0.457605
    2011-05-01   0.814447    0.780415    0.184241   0.429139
    2011-05-08   0.586905    0.209121    0.428080   0.246584
    2011-05-15   0.754021    0.909181    0.846984   0.948835
    2011-05-22   0.513610    0.203925    0.338072   0.596325
    2011-05-29   0.497080    0.557908    0.916812   0.680242
    2011-06-05   0.646791    0.641024    0.399427   0.308346
    2011-06-12   0.573922    0.539285    0.098703   0.461480
    2011-06-19   0.062978    0.939339    0.713087   0.380326
    2011-06-26   0.422484    0.109185    0.459734   0.800468
    2011-07-03   0.962368    0.632361    0.388565   0.503425
    2011-07-10   0.802551    0.261161    0.590494   0.526307
    2011-07-17   0.261447    0.686405    0.636970   0.622476
    2011-07-24   0.634331    0.630028    0.069925   0.504036
    ...               ...         ...         ...        ...
    2012-05-06   0.185331    0.375717    0.658463   0.697377
    2012-05-13   0.273510    0.665318    0.756944   0.083542
    2012-05-20   0.895984    0.850881    0.680869   0.987420
    2012-05-27   0.450593    0.262195    0.458893   0.199141
    2012-06-03   0.696102    0.332312    0.419764   0.338074
    2012-06-10   0.113108    0.167605    0.812625   0.329429
    2012-06-17   0.527418    0.087454    0.868973   0.744649
    2012-06-24   0.977674    0.831538    0.410719   0.598423
    2012-07-01   0.577802    0.141307    0.310356   0.276271
    2012-07-08   0.772117    0.288240    0.820701   0.548857
    2012-07-15   0.699628    0.467952    0.429433   0.304482
    2012-07-22   0.782641    0.337854    0.561191   0.572241
    2012-07-29   0.010225    0.962770    0.793041   0.166877
    2012-08-05   0.895516    0.628526    0.782264   0.908301
    2012-08-12   0.787210    0.698185    0.255306   0.741693
    2012-08-19   0.042833    0.556469    0.165885   0.408108
    2012-08-26   0.942076    0.377714    0.927170   0.119004
    2012-09-02   0.567978    0.007891    0.777752   0.869950
    2012-09-09   0.120134    0.417996    0.328654   0.484447
    2012-09-16   0.833769    0.946456    0.594471   0.569707
    2012-09-23   0.515544    0.090017    0.344200   0.498175
    2012-09-30   0.419152    0.315412    0.683195   0.498630
    2012-10-07   0.879582    0.958591    0.531812   0.051948
    2012-10-14   0.488241    0.683242    0.096560   0.197295
    2012-10-21   0.425213    0.279539    0.476436   0.492512
    2012-10-28   0.238334    0.836782    0.901589   0.132700
    2012-11-04   0.030562    0.797666    0.238895   0.550427
    2012-11-11   0.875454    0.973046    0.457116   0.154175
    2012-11-18   0.557967    0.895320    0.478239   0.448102
    2012-11-25   0.075152    0.047344    0.650615   0.293129
    
    [100 rows x 4 columns]
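
    The same layout can be applied to the data described in the question. The following is only a sketch under stated assumptions: the customer sizes, categories and consumption values are made up, only four customers are shown, and it targets a recent Python 3 with NumPy and pandas rather than Python 2.7. It builds an hourly index for 2014 and then computes the kind of per-category, third-quarter, over-500-square-meters mean mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical customers; sizes, categories and readings are made up.
names = ['C001', 'C002', 'C003', 'C004']
sizes = [120.0, 650.0, 800.0, 540.0]   # square meters
category = ['Commercial', 'Residential', 'Residential', 'Other']

# One row per hour of 2014 (not a leap year, so 365 * 24 = 8760 rows).
hours = pd.date_range('2014-01-01', periods=8760, freq=pd.offsets.Hour())

rng = np.random.default_rng(0)
usage = rng.random((len(hours), len(names)))   # placeholder consumption values

cols = pd.MultiIndex.from_arrays([names, sizes, category],
                                 names=['Name', 'Size', 'Category'])
df = pd.DataFrame(usage, index=hours, columns=cols)
df.index.name = 'Time'

# Select columns by a level value: all residential customers.
residential = df.xs('Residential', axis=1, level='Category')

# Boolean mask over the columns: premises larger than 500 square meters.
big = df.columns.get_level_values('Size') > 500

# Mean third-quarter usage per category, restricted to the large premises.
q3_mean = df.loc['2014-07':'2014-09', big].mean().groupby(level='Category').mean()
print(q3_mean)
```

    Averaging each column first and then averaging within each category gives the same result as pooling all values here, because every column covers the same number of hours.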