Mosaic plot with percentage and count values as labels from a pandas DataFrame


I have a pandas DataFrame like this:

     LEVEL_1      LEVEL_2    Freq  Percentage
0       HIGH          HIGH   8842      17.684
1    AVERAGE           LOW   2802       5.604
2        LOW           LOW  22198      44.396
3    AVERAGE       AVERAGE   6804      13.608
4        LOW       AVERAGE   2030       4.060
5       HIGH       AVERAGE   3666       7.332
6    AVERAGE          HIGH   2887       5.774
7        LOW          HIGH    771       1.542

I can get tiles of LEVEL_1 and LEVEL_2:

 from statsmodels.graphics.mosaicplot import mosaic
 mosaic(df, ['LEVEL_1','LEVEL_2'])

I just want to put Freq and Percentage at the center of each tile of the mosaic plot. How can I do this?


Solution

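  • Here's a start. Note that I had to add a row of zeros to the DataFrame for the one combination missing from the data (HIGH, LOW), because the labelizer looks up every tile in the DataFrame. You can make the labels nicer with string formatting in the lambda function. You'll also want to reorder the headers.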

    import io

    import pandas as pd
    from statsmodels.graphics.mosaicplot import mosaic

    # Whitespace-separated copy of the table from the question
    d = io.StringIO("""     LEVEL_1      LEVEL_2    Freq  Percentage
           HIGH          HIGH   8842      17.684
        AVERAGE           LOW   2802       5.604
            LOW           LOW  22198      44.396
        AVERAGE       AVERAGE   6804      13.608
            LOW       AVERAGE   2030       4.060
           HIGH       AVERAGE   3666       7.332
        AVERAGE          HIGH   2887       5.774
            LOW          HIGH    771       1.542""")
    df = pd.read_csv(d, sep=r'\s+')

    # HIGH/LOW never occurs in the data, but the labelizer is still asked about it,
    # so add a zero row (pd.concat replaces DataFrame.append, removed in pandas 2.0)
    missing = pd.DataFrame([{'LEVEL_1': 'HIGH', 'LEVEL_2': 'LOW', 'Freq': 0, 'Percentage': 0}])
    df = pd.concat([df, missing], ignore_index=True)
    df = df.sort_values(['LEVEL_1', 'LEVEL_2'])
    df = df.set_index(['LEVEL_1', 'LEVEL_2'])
    print(df)

    # The labelizer receives each tile's key, e.g. ('HIGH', 'LOW'), and returns its label
    mosaic(df['Freq'], labelizer=lambda k: df.loc[k].values);
    

    plot from a Jupyter notebook
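
As a rough sketch of the "nicer labels" idea: instead of returning the raw row values, you could pre-build one formatted string per tile and have the labelizer look it up. The names `labels`, `tile_label` and `draw_mosaic` are made up for this example, and it assumes `mosaic` hands the labelizer each tile's key as a tuple of category strings, which is what the lambda above relies on too. Because the lookup falls back to an empty string, this variant should not need the extra row of zeros.

```python
import pandas as pd

# Data copied from the question; no zero-filled HIGH/LOW row is added here.
df = pd.DataFrame({
    "LEVEL_1": ["HIGH", "AVERAGE", "LOW", "AVERAGE", "LOW", "HIGH", "AVERAGE", "LOW"],
    "LEVEL_2": ["HIGH", "LOW", "LOW", "AVERAGE", "AVERAGE", "AVERAGE", "HIGH", "HIGH"],
    "Freq": [8842, 2802, 22198, 6804, 2030, 3666, 2887, 771],
    "Percentage": [17.684, 5.604, 44.396, 13.608, 4.060, 7.332, 5.774, 1.542],
})

# One label per (LEVEL_1, LEVEL_2) pair: count on the first line,
# percentage rounded to one decimal on the second.
labels = {
    (row.LEVEL_1, row.LEVEL_2): f"{int(row.Freq):,}\n{float(row.Percentage):.1f}%"
    for row in df.itertuples(index=False)
}


def tile_label(key):
    """Return the label for one tile; combinations absent from df get ''."""
    return labels.get(tuple(key), "")


def draw_mosaic():
    """Draw the plot (hypothetical helper; needs statsmodels and matplotlib)."""
    from statsmodels.graphics.mosaicplot import mosaic

    freq = df.set_index(["LEVEL_1", "LEVEL_2"])["Freq"]
    return mosaic(freq, labelizer=tile_label)
```

Calling `draw_mosaic()` in a notebook should then show the count and percentage stacked in the middle of each tile. Adjust the format strings in `labels` to taste.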