Aggregation calculation method for treemap in plotly.express - Python


Thanks in advance to anyone who tries to help. This is the first time I have asked a question, and I have been struggling with this one for days! Eternal glory to whoever helps me out!

Let me explain my problem with a few lines of code and some screenshots.

I want to create a treemap showing the growth of values between two dates. More precisely, I want each square's size to be proportional to a value x at date 2, and its colour to follow a scale showing the growth of x from date 1 to date 2.

Let us consider the following example:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly



data = {'variable': ['a', 'b', 'c'],
        'parent': ['I', 'I', 'II'],
        'value_1': [1, 4, 5],
        'value_2': [4, 2, 5]}

df = pd.DataFrame(data)
# Percentage growth of each row from date 1 (value_1) to date 2 (value_2)
df['growth'] = 100 * (df['value_2'] / df['value_1'] - 1)

# Square size follows value_2; colour follows growth
fig = px.treemap(df, path=['parent', 'variable'], values='value_2', color='growth',
                 color_continuous_scale='plasma')

fig.show()
   

It gives me this beautiful treemap (screenshot: Growth treemap).

But here is the problem. As you can see in the screenshot, the growth shown for I is 183%, which is wrong!

However, when calculating manually, with a going from 1 to 4 and b going from 4 to 2, the growth should be: 1/5 * 300% + 4/5 * (-50%) = 20% (I goes from 5 to 6).

The 183% appears because the calculation actually performed is 4/6 * 300% + 2/6 * (-50%) = 183%. The method computes the weighted average with respect to the new values (value_2) rather than the former ones (value_1), as it should in theory.
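The two weightings can be checked with a few lines of plain Python. This is only a sketch of the arithmetic for parent I, not of plotly's internals; the helper name `weighted_mean` is made up for illustration.

```python
# Growth (in %) of the two children of parent I, with their values at each date.
growth = [300.0, -50.0]   # a: 1 -> 4, b: 4 -> 2
value_1 = [1, 4]          # values at date 1
value_2 = [4, 2]          # values at date 2


def weighted_mean(values, weights):
    """Average of `values`, each weighted by the matching entry of `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Weighting by the date-1 values gives the actual growth of I (5 -> 6).
expected = weighted_mean(growth, value_1)

# Weighting by the date-2 values reproduces the figure shown in the treemap.
displayed = weighted_mean(growth, value_2)

print(round(expected, 2))   # 20.0
print(round(displayed, 2))  # 183.33
```

Weighting by value_1 is the same as computing 100 * (sum of value_2 / sum of value_1 - 1) for the group, which is why it matches the 5 to 6 change of I.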

Is there a way to get the correct growth when aggregating to a parent class?

Thank you very much for your help, and let me know if I can help further


Solution

  • I couldn't find a way to make the treemap aggregate the data the way you describe. However, I did come up with a workaround.

    This requires the use of plotly.io.

    I want to point out that the nice contrast you had in the colors is lost when you change the parent from 183.333333 to 20. That parent ends up nearly the same color as II, because their values are 20 and 0, whereas 'a' is 300 and the lowest value is only -50.

    Additionally, I added px.Constant so that you don't get a really useless hover label for the root (the black-ish background parent of the parents).

    import pandas as pd
    import plotly.express as px
    import plotly.io as pio

    # df is the DataFrame built in the question (including the 'growth' column)
    fig = px.treemap(df, path=[px.Constant('Total'), 'parent', 'variable'],
                     values='value_2', color='growth',
                     color_continuous_scale='plasma')
    

    Now when you use pio, you will create an external file, but this is the only way, short of using Jupyter, to add JavaScript to your plot. The file will open automatically in your browser, like fig.show(), except that the hover data will show a growth of 20% for parent I.

    pio.write_html(fig, 'index.html', auto_open=True, div_id='thisPlot',
                   include_mathjax='cdn', include_plotlyjs='cdn', full_html=True,
                   # post_script: JavaScript appended to the page; here it runs after a 200 ms delay
                   post_script="setTimeout(function() {" +
                   "var el = document.getElementById('thisPlot');" +
                   "el.data[0].marker.colors[3] = 20;       /* change the calc value */" +
                   "Plotly.newPlot(el, el.data, el.layout); /* re-plot it */" +
                   "}, 200)")
    

    You may notice that el.data[0].marker.colors[3] is the entry being changed. That's the parent I.

    Here's all of the data that is captured in el.data[0].marker.colors before this change is made: [300, -50, 0, 183.33333333333334, 0, 100].

    By the way, whenever I go the route of pio.write_html, I always give the file the same name, so it keeps overwriting itself. Personally, I'm not interested in the saved file, just the outcome of post_script.
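Rather than hard-coding index 3 and the value 20, the post_script string could be generated from the data. The following is a minimal sketch, not part of the original workaround: the helper `build_post_script` and the `corrections` dict are hypothetical names, and the index mapping assumes marker.colors is ordered as in the list quoted above (leaves a, b, c, then parents I and II, then the root). Check that order against your own figure before relying on it.

```python
import pandas as pd

df = pd.DataFrame({
    'variable': ['a', 'b', 'c'],
    'parent': ['I', 'I', 'II'],
    'value_1': [1, 4, 5],
    'value_2': [4, 2, 5],
})

# Growth of each parent computed from its summed values (i.e. weighted by value_1).
sums = df.groupby('parent')[['value_1', 'value_2']].sum()
parent_growth = 100 * (sums['value_2'] / sums['value_1'] - 1)

# Growth of the root computed the same way.
total_growth = 100 * (df['value_2'].sum() / df['value_1'].sum() - 1)

# ASSUMPTION: positions 3, 4 and 5 of marker.colors hold I, II and the root.
corrections = {3: parent_growth['I'], 4: parent_growth['II'], 5: total_growth}


def build_post_script(div_id, corrections, delay_ms=200):
    """Return JavaScript that overwrites selected marker.colors entries and re-plots."""
    assignments = "".join(
        f"el.data[0].marker.colors[{i}] = {round(float(v), 6)};"
        for i, v in corrections.items()
    )
    return (
        "setTimeout(function() {"
        + f"var el = document.getElementById('{div_id}');"
        + assignments
        + "Plotly.newPlot(el, el.data, el.layout);"
        + f"}}, {delay_ms})"
    )


post_script = build_post_script('thisPlot', corrections)
print(post_script)
```

The resulting string can then be passed as post_script=post_script to pio.write_html, with div_id='thisPlot' as before. Note that this sketch also overwrites the entries for II and the root, which the snippet above leaves at their original values.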