
Generating a plotly.express.bar chart for two population categories + extending to animation frame


I am new to Python, pandas and plotly. I am doing a personal project on the population of individual countries and would like to build a plotly.express + dash graph environment, but I have run into trouble right at the beginning. :D

Initially, I was able to get a nice table through some pivoting and resetting of indices, like here: Data table

The table has country, sex (male, female, both) and year (1961-2019), followed by the population for each age range (5-9, 10-14, ..., Total). I know I can extract an individual country and year with a line of code as simple as:

country, year = "Andorra", 2019
df_country = df[ (df["geo"] == country) &  (df['TIME_PERIOD'] == year) ]

to get something like this: Data filtered per country and year

Then I tried to get a px.bar chart based on the table above, so that I get two horizontal bars (male, female) for each age range in the same graph. I wrote the following code:

px.bar(data_frame=df_country, x=df_country.loc[:, '<5':'75<'].columns, color='sex', height=300)

and got this: Weird stats for Andorra?

So, the bars are divided by sex category (the third one is 'male+female'), but rather than having age ranges on the y-axis, each bar has been sliced into portions that correspond to the numerical values from the table.

I have a feeling it has something to do with those "wide" and "long" data formats, but I could be wrong.

What I would like to get from the table above is to select one country (say Andorra) and show two bars (male, female) for each age range on the y-axis. Something like this: One-bar version.

I'd also like to include the Andorra dataframe in such a way that I can use the TIME_PERIOD column (which is basically years) for that nice slider that is usually defined by the animation_frame argument.

I suppose the code would look something like:

px.bar(data_frame=df_country, x=df_country.loc[:, '<5':'75<'], color='sex', animation_frame='TIME_PERIOD')

The idea is to eventually select the country through dash, while px.bar generates these 'male, female per age range' bars with a slider for years. I am avoiding the classical population pyramid for now.

Thanks in advance.


Solution

  • Since the data was presented as an image, the code below uses only the values that could be read from the image; the female and male values in this sample are identical. As you suspected, the wide format is the problem, so we convert the data to long format before creating the graph. Color encodes sex, the y-axis is the age group, and the x-axis is population; the bar mode is then changed to 'group' so that the male and female bars sit side by side.

    import pandas as pd
    import io
    
    data = '''
    id geo sex TIME_PER100 "<5" "5-9" "10-14" "15-19" "20-24" "25-29" "30-34" "35-39" "40-44" "45-49" "50-54" "55-59"
    0 Andorra Female 1986 1099.0 1496.0 1704.0 1530.0 1993.0 2298.0 2016.0 1764.0 1319.0 1081.0 1024.0 1003.0
    1 Andorra Female 1987 1153.0 1588.0 1759.0 1619.0 2153.0 2387.0 2190.0 1828.0 1472.0 1117.0 1079.0 1025.0
    2 Andorra Female 1988 1129.0 1574.0 1772.0 1685.0 2057.0 2527.0 2257.0 1883.0 1565.0 1163.0 1117.0 1058.0
    3 Andorra Female 1989 1099.0 1606.0 1795.0 1747.0 2070.0 2586.0 2406.0 1984.0 1690.0 1329.0 1120.0 1149.0
    4 Andorra Female 1990 1182.0 1562.0 1712.0 1798.0 1997.0 2558.0 2417.0 2019.0 1759.0 1450.0 1135.0 1134.0
    5 Andorra Male 1986 1099.0 1496.0 1704.0 1530.0 1993.0 2298.0 2016.0 1764.0 1319.0 1081.0 1024.0 1003.0
    6 Andorra Male 1987 1153.0 1588.0 1759.0 1619.0 2153.0 2387.0 2190.0 1828.0 1472.0 1117.0 1079.0 1025.0
    7 Andorra Male 1988 1129.0 1574.0 1772.0 1685.0 2057.0 2527.0 2257.0 1883.0 1565.0 1163.0 1117.0 1058.0
    8 Andorra Male 1989 1099.0 1606.0 1795.0 1747.0 2070.0 2586.0 2406.0 1984.0 1690.0 1329.0 1120.0 1149.0
    9 Andorra Male 1990 1182.0 1562.0 1712.0 1798.0 1997.0 2558.0 2417.0 2019.0 1759.0 1450.0 1135.0 1134.0
    '''
    
    # sep=r'\s+' splits on whitespace (delim_whitespace is deprecated in recent pandas)
    df = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col=0)
    
    # wide to long
    df_long = df.melt(id_vars=['geo','sex','TIME_PER100'], value_vars=df.columns[3:], var_name='age', value_name='population')
    
    print(df_long.head())
    #        geo     sex  TIME_PER100 age  population
    # 0  Andorra  Female         1986  <5      1099.0
    # 1  Andorra  Female         1987  <5      1153.0
    # 2  Andorra  Female         1988  <5      1129.0
    # 3  Andorra  Female         1989  <5      1099.0
    # 4  Andorra  Female         1990  <5      1182.0
    
    import plotly.express as px
    
    # one horizontal bar per sex and age group, with a slider over the years
    fig = px.bar(df_long, x="population", y="age", orientation='h', color='sex', animation_frame='TIME_PER100')
    fig.update_layout(barmode='group')
    
    fig.show()
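
For the full table described in the question, the same wide-to-long step can be wrapped in a small helper that first filters one country and drops the combined male+female rows. This is only a sketch: the function names `to_long` and `make_figure`, the sex labels "Male"/"Female", and the toy numbers in `demo` are assumptions for illustration, not values from the real dataset. Passing `category_orders` keeps the age groups in table order, and `range_x` keeps the x-axis fixed while the slider moves, since animated frames do not rescale the axis on their own.

```python
import pandas as pd


def to_long(df, country, age_cols, sexes=("Male", "Female")):
    """Filter one country, keep the chosen sex labels, and melt age columns to long format."""
    subset = df[(df["geo"] == country) & (df["sex"].isin(sexes))]
    long_df = subset.melt(
        id_vars=["geo", "sex", "TIME_PERIOD"],
        value_vars=list(age_cols),
        var_name="age",
        value_name="population",
    )
    # sort by year so the animation slider runs chronologically
    return long_df.sort_values("TIME_PERIOD", kind="stable").reset_index(drop=True)


def make_figure(long_df, age_cols):
    """Grouped horizontal bars per age range with a year slider."""
    import plotly.express as px  # imported here so to_long works without plotly

    return px.bar(
        long_df,
        x="population",
        y="age",
        color="sex",
        orientation="h",
        barmode="group",
        animation_frame="TIME_PERIOD",
        category_orders={"age": list(age_cols)},           # keep table order of age groups
        range_x=[0, long_df["population"].max() * 1.05],   # fixed axis across frames
    )


# Toy frame with made-up numbers, only to show the shapes involved
demo = pd.DataFrame({
    "geo": ["Andorra", "Andorra", "Andorra", "Elsewhere"],
    "sex": ["Male", "Female", "Total", "Male"],
    "TIME_PERIOD": [2019, 2019, 2019, 2019],
    "<5": [10, 11, 21, 99],
    "5-9": [12, 13, 25, 99],
})
long_demo = to_long(demo, "Andorra", ["<5", "5-9"])
```

With the real table, something like `make_figure(to_long(df, "Andorra", df.loc[:, '<5':'75<'].columns), df.loc[:, '<5':'75<'].columns).show()` should give the grouped bars with a year slider, assuming the column names match those in the question. In a dash callback, the country argument would come from the dropdown value.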
    
