Tags: python, ggplot2, plotnine

How to use facet_wrap/facet_grid in plotnine so each facet shows only its relevant subcategories in Python?


I have data with two categorical columns, a parent category and a child category, and I am trying to create facets by parent category, each containing only its own subcategories. Below is what I have tried:

import pandas as pd
import plotnine as p9
from plotnine import *

# Create a sample dataset
new_data = {
    'date': pd.date_range('2022-01-01', periods=8, freq="ME"),  # "ME" = month end (pandas >= 2.2; older versions use "M")
    'parent_category': ['Electronics', 'Electronics', 'Fashion', 'Fashion', 'Home Goods', 'Electronics', 'Fashion','Electronics'],
    'child_category': ['Smartphones', 'Laptops', 'Shirts', 'Pants', 'Kitchenware','Laptops', 'Shirts', 'Smartphones']
}

# Create the DataFrame
new_data = pd.DataFrame(new_data)
(ggplot(new_data, aes(x="date", y="child_category")) +
    geom_line(size=8, color="pink") +  # #edece3
    geom_point(size=6, color="grey") +
    facet_wrap("parent_category", ncol=1) +
    theme_538() +
    theme(axis_text_x=element_text(angle=45, hjust=1),
          panel_grid_major=element_blank(),
          figure_size=(8, 6))
)

[Image: plot produced by the code above]

Expected output:

The Electronics facet should show only Smartphones and Laptops, not the Fashion items. In the plot above, every child category is repeated in every facet, but I would like each parent_category facet to show only the child categories that belong to it.
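For reference, a small pandas sketch (using only the two category columns from the sample data; the dates do not affect which categories belong to which facet) lists the child categories that actually occur under each parent, i.e. what each facet's y-axis should contain:

```python
import pandas as pd

# Same category columns as the sample data above (dates omitted)
df = pd.DataFrame({
    "parent_category": ["Electronics", "Electronics", "Fashion", "Fashion",
                        "Home Goods", "Electronics", "Fashion", "Electronics"],
    "child_category": ["Smartphones", "Laptops", "Shirts", "Pants",
                       "Kitchenware", "Laptops", "Shirts", "Smartphones"],
})

# Child categories present under each parent category,
# which is what each facet should list on its y-axis
relevant = {
    parent: sorted(group["child_category"].unique())
    for parent, group in df.groupby("parent_category")
}
print(relevant)
# {'Electronics': ['Laptops', 'Smartphones'], 'Fashion': ['Pants', 'Shirts'], 'Home Goods': ['Kitchenware']}
```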

If this is not possible with facets, how can it be done: with subplots or some other method?

I'd appreciate any suggestions!


Solution

  • The main issue is that facet_wrap, by default, uses a fixed y scale shared by all panels, so every facet lists every child_category. Adding scales="free_y" to facet_wrap gives each facet its own y scale that contains only the child categories present in that parent category. Although it is not strictly required, you can also set the group aesthetic so that lines and points are drawn separately for each child_category within each facet. I have modified your code accordingly, and it should now produce the plot you want:

    import pandas as pd
    import plotnine as p9
    from plotnine import *
    
    new_data = {
        'date': pd.date_range('2022-01-01', periods=8, freq="ME"),
        'parent_category': ['Electronics', 'Electronics', 'Fashion', 'Fashion', 'Home Goods', 'Electronics', 'Fashion','Electronics'],
        'child_category': ['Smartphones', 'Laptops', 'Shirts', 'Pants', 'Kitchenware','Laptops', 'Shirts', 'Smartphones']
    }
    
    new_data = pd.DataFrame(new_data)
    
    (ggplot(new_data, aes(x="date", y="child_category", group="child_category")) +
        geom_line(size=1, color="pink") +
        geom_point(size=3, color="grey") +
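        # scales="free_y" gives each facet its own y scale, so a panel only
        # lists the child categories present in that parent category
        facet_wrap("parent_category", ncol=1, scales="free_y") +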
        theme_538() +
        theme(axis_text_x=element_text(angle=45, hjust=1),
              panel_grid_major=element_blank(),
              figure_size=(8, 6))
    )
    

    With this code you should get a plot like the following, where each facet shows only its own child categories: [Image: faceted plot produced by the corrected code]