
Create pandas MultiIndex DataFrame from multi dimensional np arrays


I am trying to insert 72 matrices of shape (24, 12) from a NumPy array into a pre-existing MultiIndex DataFrame, indexed according to an np.array of shape (72, 2). I don't need to index the contents of the (24, 12) matrices; I just need to index the 72 matrices, even as objects, for rearrangement purposes. The index acts like a map for reordering the matrices according to some conditions before unstacking the columns.

What I have tried so far:

cosphi.shape

(72, 2)

MFPAD_RCR.shape

(72, 24, 12)

df = pd.MultiIndex.from_arrays(cosphi.T, names=("costheta","phi"))

This successfully creates a MultiIndex with 2 levels and 72 entries. Then I try to add the 72 matrices:

df1 = pd.DataFrame({'MFPAD':MFPAD_RCR},index=df)

or possibly

df1 = pd.DataFrame({'MFPAD':MFPAD_RCR.astype(object)},index=df)

I get the error

Exception: Data must be 1-dimensional. 

Any idea?


Solution

  • After a bit of careful research, I found that my question has already been answered here (the right answer) and here (a solution using a deprecated function).

    For my specific question, the answer is something like:

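    import numpy as np
    import pandas as pd

    # Flatten each (24, 12) matrix into 288 values, then transpose so that
    # every matrix becomes one column of the DataFrame.
    data = MFPAD_RCR.reshape(72, 288).T
    df = pd.DataFrame(
        data=data,
        # phiM and cosM are not defined in this post; together they must yield 24 * 12 = 288 labels
        index=pd.MultiIndex.from_product([phiM, cosM], names=["phi", "cos(theta)"]),
        columns=["item {}".format(i) for i in range(72)],
    )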
    

    Note that the 3D array has to be reshaped so that its second dimension equals the product of the lengths of the major and minor index levels (here 24 × 12 = 288).
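
The reshape rule in the note above can be checked on a small stand-in array. This is only a sketch: `arr` is a hypothetical placeholder for `MFPAD_RCR`, filled with consecutive integers so that positions are easy to follow, and it assumes NumPy's default C (row-major) order.

```python
import numpy as np

n_items, n_rows, n_cols = 72, 24, 12

# Hypothetical stand-in for MFPAD_RCR: consecutive integers, shape (72, 24, 12).
arr = np.arange(n_items * n_rows * n_cols).reshape(n_items, n_rows, n_cols)

# Flatten each matrix into one row of 24 * 12 = 288 values.
flat = arr.reshape(n_items, n_rows * n_cols)

# In C order, element (i, j) of matrix k lands in column i * n_cols + j.
k, i, j = 5, 3, 7
same_value = flat[k, i * n_cols + j] == arr[k, i, j]

# Reshaping back recovers the original 3D array.
restored = flat.reshape(n_items, n_rows, n_cols)
```

Because the row index of each matrix varies slowest, it corresponds to the major (first) level of the MultiIndex and the column index to the minor (second) level.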

    df1 = df.T
    

    I want to be able to sort my items (i.e. the matrices) according to extra index levels taken from cosphi:

    cosn = cosphi[:, 0]  # first column of cosphi, shape (72,)
    phin = cosphi[:, 1]  # second column of cosphi, shape (72,)
    

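    Note: each new index level must have the same length as the number of items (matrices), i.e. 72.

    # set_index reads the level name from a named Index; a bare positional string is not taken as a name
    df1.set_index(pd.Index(cosn, name="cos_ph"), append=True, inplace=True)
    df1.set_index(pd.Index(phin, name="phi_ph"), append=True, inplace=True)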
    

    And after this one can sort

    df1.sort_index(level=1, inplace=True, kind="mergesort")
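
As a small illustration of appending an index level and then sorting by it, here is a sketch on a toy frame. The names `toy` and `cos_key` and the values used are made up for the example; only `set_index`, `pd.Index` and `sort_index` are real pandas calls.

```python
import numpy as np
import pandas as pd

# Toy frame with three "items" as rows (hypothetical data).
toy = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=["item 0", "item 1", "item 2"],
    columns=["a", "b"],
)

# One extra key per row; the level name comes from the Index itself.
cos_key = np.array([0.5, -0.5, 0.0])
toy = toy.set_index(pd.Index(cos_key, name="cos_ph"), append=True)

# Sort the rows by the appended level, referring to it by name.
toy_sorted = toy.sort_index(level="cos_ph")
item_order = list(toy_sorted.index.get_level_values(0))
print(item_order)  # ['item 1', 'item 2', 'item 0']
```

Referring to the level by name rather than by position keeps the sort readable once several levels have been appended.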
    

    and reshape

    # Back to a 3D array of shape (72, 24, 12), now in sorted order
    outarray = df1.T.values.reshape(24, 12, 72).transpose(2, 0, 1)
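
For reference, the whole round trip can be sketched end to end on synthetic data. Everything below is an assumption made for illustration: the random matrices, the shuffled 6 × 12 grid of (cos, phi) pairs standing in for `cosphi`, and the integer labels used for `phiM` and `cosM`. The frame is also built with items as rows directly, which skips the transpose, and the result is cross-checked against a plain `np.lexsort` ordering.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_items, n_phi, n_cos = 72, 24, 12

# Synthetic stand-ins (assumptions, not the original data).
MFPAD_RCR = rng.normal(size=(n_items, n_phi, n_cos))
grid = np.array([(c, p) for c in np.linspace(-1, 1, 6) for p in np.arange(0, 360, 30)])
cosphi = rng.permutation(grid)  # shuffle the 72 (cos, phi) rows
phiM = np.arange(n_phi)         # placeholder labels for the matrix rows
cosM = np.arange(n_cos)         # placeholder labels for the matrix columns

# One row per item, one column per (phi, cos(theta)) cell of the matrix.
df1 = pd.DataFrame(
    MFPAD_RCR.reshape(n_items, n_phi * n_cos),
    index=[f"item {i}" for i in range(n_items)],
    columns=pd.MultiIndex.from_product([phiM, cosM], names=["phi", "cos(theta)"]),
)

# Append the two sorting keys as named index levels, then sort by them.
df1 = df1.set_index(
    [pd.Index(cosphi[:, 0], name="cos_ph"), pd.Index(cosphi[:, 1], name="phi_ph")],
    append=True,
)
df1 = df1.sort_index(level=["cos_ph", "phi_ph"])

# Rows are already flattened matrices, so a single reshape restores them.
outarray = df1.to_numpy().reshape(n_items, n_phi, n_cos)

# Cross-check: the same ordering with NumPy only (the last key is the primary one).
order = np.lexsort((cosphi[:, 1], cosphi[:, 0]))
matches = np.array_equal(outarray, MFPAD_RCR[order])
```

If only the reordered array is needed, `MFPAD_RCR[order]` gives the same result here without building a DataFrame; the pandas route is mainly useful when the labelled frame itself is wanted.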
    

    Any suggestion to make the code faster / prettier is more than welcome!