Search code examples
pythonpandasmulti-index

python: Create a multiindex pandas DF based on condition of column names


I have a dataframe as follows:

arrays = [np.array(["berlin", "berlin", "paris", "paris", "rome", "rome", "seville", "seville"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays, columns = ['mike','ana','manu','analia'])

it has a multiindex in the rows. I would like to transform that DF into another one that would also have a multindex in the columns.

the function could be generalized as this:

def sortit(colname):
    if colname.startswith('m'):
        return 'm'
    elif colname.startswith('m'): 
        return 'a'

The expected output is the following:

arrays = [np.array(["berlin", "berlin", "paris", "paris", "rome", "rome", "seville", "seville"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"])]
tuples_i = list(zip(*arrays))
index_rows = pd.MultiIndex.from_tuples(tuples_i, names=["city", "number"])
arrays2 = [np.array(["m","m", "a","a"]),
          np.array(['mike','manu','ana','analia'])]
tuples_c = list(zip(*arrays2))
print(tuples)
index_columns = pd.MultiIndex.from_tuples(tuples_c, names=["department", "name"])
df = pd.DataFrame(np.random.randn(8, 4), index=index_rows, columns = index_columns)

df

two important notes.

My starting point is the dataframe with multindex in the rows and no multiindex in the cols. I can not change that.

The method that illustrates here to which multiindex belong every name (column name) is a straight forward one, as example, the reality is that that function is way more complicated and time consuming, that's why I would like to create once the multilevel col index to make queries later much faster.


Solution

  • Code

    you can make multi index easily from making tuple

    (df.set_axis(df.columns.map(lambda x: (x[0], x)), axis=1)
     .rename_axis(['department', 'name'], axis=1))
    

    output:

    department  m       a       m       a
    name        mike    ana     manu    analia
    berlin  one 0.6     -0.0    2.9     1.3
            two 1.3     0.4     0.0     -3.0
    paris   one -0.5    -0.8    0.4     0.0
            two -0.6    -1.0    0.5     0.3
    rome    one -1.5    0.2     -0.0    1.4
            two -1.5    -1.9    0.0     -0.0
    seville one -1.3    1.3     0.7     0.5
            two -0.2    -0.2    -0.7    0.4