Tags: python, pandas, dataframe, random, multi-level

Randomly sampling multi-level columns


I have a DataFrame with multi-level columns that looks like this:

df

Solid             Liquid                Gas
pen paper pipe    water juice milk      oxygen nitrogen helium
5   2     1       4     3     1         7      8        10
5   2     1       4     3     1         7      8        10
5   2     1       4     3     1         7      8        10
4   4     7       3     2     0         6      7        9
3   7     9       4     6     5         3      3        4
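To experiment with answers, the frame can be rebuilt from the printed table. This is a sketch that assumes the five rows shown are the whole frame with a default integer index; the names columns and data are only illustrative.

```python
import pandas as pd

# Rebuild the example frame: three top-level groups, each with three sub-columns.
columns = pd.MultiIndex.from_tuples([
    ("Solid", "pen"), ("Solid", "paper"), ("Solid", "pipe"),
    ("Liquid", "water"), ("Liquid", "juice"), ("Liquid", "milk"),
    ("Gas", "oxygen"), ("Gas", "nitrogen"), ("Gas", "helium"),
])
data = [
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [4, 4, 7, 3, 2, 0, 6, 7, 9],
    [3, 7, 9, 4, 6, 5, 3, 3, 4],
]
df = pd.DataFrame(data, columns=columns)

# The top level holds the groups to sample from, in their original order.
print(df.columns.get_level_values(0).unique().tolist())  # ['Solid', 'Liquid', 'Gas']
```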

What I want is to randomly choose 2 of the top-level columns "Solid", "Liquid", and "Gas", keeping all 3 sub-columns under each one.

For example, if Solid and Gas were randomly selected, the expected result would be:

Solid             Gas
pen paper pipe    oxygen nitrogen helium
5   2     1       7      8        10
5   2     1       7      8        10
5   2     1       7      8        10
4   4     7       6      7        9
3   7     9       3      3        4

I tried the following code, but it did not give the expected result:

result = df.sample(n=5, axis=1)
result

[output]

Solid    Gas
pipe     oxygen
1        7
1        7
1        7
1        7
7        6
9        3
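What the attempt above actually draws can be checked directly: DataFrame.sample(axis=1) samples from the full column index, whose entries are (group, sub-column) tuples, so each draw is a single sub-column. A quick sketch, assuming dummy cell values and an arbitrary random_state (both chosen only for illustration):

```python
import pandas as pd

# Same column layout as the question; the cell values do not matter here.
columns = pd.MultiIndex.from_tuples([
    ("Solid", "pen"), ("Solid", "paper"), ("Solid", "pipe"),
    ("Liquid", "water"), ("Liquid", "juice"), ("Liquid", "milk"),
    ("Gas", "oxygen"), ("Gas", "nitrogen"), ("Gas", "helium"),
])
df = pd.DataFrame([list(range(9))] * 5, columns=columns)

# axis=1 samples individual (group, sub-column) pairs, not whole groups.
picked = df.sample(n=2, axis=1, random_state=0)
print(picked.columns.tolist())  # two (group, sub-column) tuples

# Only 2 of the 9 sub-columns come back, so no group survives intact.
print(picked.shape[1])  # 2
```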

Can anyone please help me figure this one out? Thank you :)


Solution

  • df.sample(axis=1) samples individual sub-columns, which is why the groups get split up. Instead, sample the first-level column labels and then select the sampled groups:

    import pandas as pd
    df[pd.Series(df.columns.levels[0]).sample(2)]
    

    Or use the random.sample function:

    import random
    # Pick 2 top-level labels without replacement, then select those groups
    df[random.sample(df.columns.levels[0].tolist(), 2)]
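A slightly more defensive variant of the same idea, as a sketch; the helper name sample_groups and its seed parameter are hypothetical. It draws labels from get_level_values(0).unique() rather than levels[0], because a MultiIndex keeps all of its defined levels even after the frame is sliced, so levels[0] can contain a group that is no longer present (selecting it would raise a KeyError). It also puts the chosen groups back in their original left-to-right order.

```python
import random
from typing import Optional

import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ("Solid", "pen"), ("Solid", "paper"), ("Solid", "pipe"),
    ("Liquid", "water"), ("Liquid", "juice"), ("Liquid", "milk"),
    ("Gas", "oxygen"), ("Gas", "nitrogen"), ("Gas", "helium"),
])
data = [
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [4, 4, 7, 3, 2, 0, 6, 7, 9],
    [3, 7, 9, 4, 6, 5, 3, 3, 4],
]
df = pd.DataFrame(data, columns=columns)


def sample_groups(frame: pd.DataFrame, k: int, seed: Optional[int] = None) -> pd.DataFrame:
    """Return k randomly chosen top-level groups, each with all of its sub-columns."""
    # Only labels actually used by the columns, in the order they appear.
    groups = frame.columns.get_level_values(0).unique().tolist()
    # Draw k distinct labels; a fixed seed makes the draw reproducible.
    chosen = random.Random(seed).sample(groups, k)
    # Restore the original column order of the chosen groups.
    chosen.sort(key=groups.index)
    # Selecting by top-level labels keeps every sub-column under them.
    return frame[chosen]


result = sample_groups(df, 2, seed=42)
print(result)
```

With seed=None each call draws a fresh pair; passing an integer gives the same pair every time. Drop the sort line if the sampled order is acceptable.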