Search code examples
pandascrosstab

How can I flatten the output dataframe of pandas crosstab from two series x and y into a series?


I have the following series x and y:

x = pd.Series(['a', 'b', 'a', 'c', 'c'], name='x')
y = pd.Series([1, 0, 1, 0, 0], name='y')

I call pd.crosstab to get the following dataframe as output:

pd.crosstab(x, y)

Output:

y   0   1
x       
a   0   2
b   1   0
c   2   0

I want to transform this into a single series as follows:

x_a_y_0 0
x_a_y_1 2
x_b_y_0 1
x_b_y_1 0
x_c_y_0 2
x_c_y_1 0

For a specific dataframe like this one, I can construct this by visual inspection:

pd.Series(
    dict(
        x_a_y_0=0, 
        x_a_y_1=2,
        x_b_y_0=1,
        x_b_y_1=0,
        x_c_y_0=2,
        x_c_y_1=0
    )
)

But given arbitrary series x and y, how do I generate the corresponding final output?


Solution

  • Use DataFrame.stack with change MultiIndex by map:

    s = pd.crosstab(x, y).stack()
    s.index = s.index.map(lambda x: f'x_{x[0]}_y_{x[1]}')
    print (s)
    x_a_y_0    0
    x_a_y_1    2
    x_b_y_0    1
    x_b_y_1    0
    x_c_y_0    2
    x_c_y_1    0
    dtype: int64
    

    Also is possible pass s.index.names, thank you @SeaBean:

    s.index = s.index.map(lambda x: f'{s.index.names[0]}_{x[0]}_{s.index.names[1]}_{x[1]}')