python r matplotlib plotly plotnine

How to replicate a table from R in Python using Matplotlib or Plotly


I am trying to replicate a table, currently produced in R, in Python using the plotnine library. I am using facet_grid with two variables (CBRegion and CBIndustry). I found a similar question, but its solution is also in R. I adapted the code from that question and produced the following table:

Table using R code

I tried to use exactly the same code in Python with the plotnine library, but the final output is very ugly. This is my Python code so far:

myplot = ggplot(data = df_data_bar) + aes(x = "CCR100PDMid %" ,y = "CBSector")+ \
    geom_segment(aes(yend="CBSector", xend=0), colour="black", size = 2) +\
    geom_text(aes(label = "label")) + \
    theme(panel_grid_major_y = element_blank()) + \
    facet_grid('CBIndustry ~ CBRegion',scales="free_y",space="free") + \
    labs(x="", y = "", title=title) + \
    theme_bw() + \
    theme(plot_title = element_text(linespacing=0.8, face="bold", size=20, va="center"), 
        axis_text_x = element_text(colour="#333333",size=12,rotation=0,ha="center",va="top",face="bold"), 
        axis_text_y = element_text(colour="#333333",size=12,rotation=0,ha="right",va="center",face="bold"), 
        axis_title_x = element_blank(), 
        axis_title_y = element_blank(),
        legend_position="none", 
        strip_text_x = element_text(size = 12, face="bold", colour = "black", angle = 0), 
        strip_text_y = element_text(size = 8, face="bold", colour = "black", angle = 0, ha = "left"),
        strip_background_y = element_text(width = 0.2),
        figure_size=(30,20))

The image from plotnine is as follows:

Table using Python code

Comparing the Python and R output, the y-axis labels clearly overlap in the plotnine version. In addition, the Europe / Community Groups panel is the same size as panels that contain several sectors, which is unnecessary. I also tried different aspect ratios, but that did not solve the problem. In short, I would like to get the same plot that R produces. It does not need to be produced with plotnine; alternatives are also welcome. The first ten rows of the data are:

{'CBRegion': {0: 'Europe', 1: 'Europe', 2: 'Europe', 3: 'Europe', 4: 'Europe', 5: 'Europe', 6: 'Europe', 7: 'Europe', 8: 'Europe', 9: 'Europe'}, 'CBSector': {0: 'Aerospace & Defense', 1: 'Alternative Energy', 2: 'Automobiles & Parts', 3: 'Banks', 4: 'Beverages', 5: 'Chemicals', 6: 'Colleges & Universities', 7: 'Community Groups', 8: 'Construction & Materials', 9: 'Electricity'}, 'CBIndustry': {0: 'Industrials', 1: 'Oil & Gas', 2: 'Consumer Goods', 3: 'Financials', 4: 'Consumer Goods', 5: 'Basic Materials', 6: 'NPO', 7: 'Community Groups', 8: 'Industrials', 9: 'Utilities'}, 'CCR100PDMid': {0: 0.015545818181818181, 1: 0.003296, 2: 0.012897471223021583, 3: 0.008079544600938968, 4: 0.008716597402597401, 5: 0.0094617476340694, 6: 0.008897475862068967, 7: 0.000821, 8: 0.012205547455295736, 9: 0.0050264210526315784}, 'CCR100PDMid %': {0: 1.554581818181818, 1: 0.3296, 2: 1.2897471223021584, 3: 0.8079544600938968, 4: 0.8716597402597401, 5: 0.9461747634069401, 6: 0.8897475862068966, 7: 0.0821, 8: 1.2205547455295735, 9: 0.5026421052631579}, 'label': {0: '1.6%', 1: '0.3%', 2: '1.3%', 3: '0.8%', 4: '0.9%', 5: '0.9%', 6: '0.9%', 7: '0.1%', 8: '1.2%', 9: '0.5%'}}

If necessary, I can upload the entire dataset, but the minimal reproducible example guidelines say that I should include only a subset of the data. I am new to SO and hope I have included all the vital information. I will be grateful for any help. Thank you in advance!


Solution

  • The other issues (colours, overlapping labels, text wrapping and so on) can be fixed, but space = 'free' is not currently supported in plotnine; see the plotnine documentation for facet_grid. That is a deal-breaker for this table, so you will need to produce it with ggplot2 in R.
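Since the question also welcomes alternatives outside plotnine, here is a rough Matplotlib sketch of how the effect of space = 'free' might be approximated: each industry gets its own panel, and the panel heights are set in proportion to the number of sectors they contain via gridspec height_ratios. It is not a replica of the R output. The names ROWS, group_by_industry, plot_free_space and wrap_width are made up for illustration, and only a single region column is drawn because the sample data contains only Europe.

```python
import textwrap

# First ten rows of the question's data (all in the Europe region),
# as (CBIndustry, CBSector, "CCR100PDMid %") tuples, rounded to 4 decimals.
ROWS = [
    ("Industrials", "Aerospace & Defense", 1.5546),
    ("Oil & Gas", "Alternative Energy", 0.3296),
    ("Consumer Goods", "Automobiles & Parts", 1.2897),
    ("Financials", "Banks", 0.8080),
    ("Consumer Goods", "Beverages", 0.8717),
    ("Basic Materials", "Chemicals", 0.9462),
    ("NPO", "Colleges & Universities", 0.8897),
    ("Community Groups", "Community Groups", 0.0821),
    ("Industrials", "Construction & Materials", 1.2206),
    ("Utilities", "Electricity", 0.5026),
]


def group_by_industry(rows):
    """Group rows by industry, keeping industries in first-seen order."""
    groups = {}
    for industry, sector, value in rows:
        groups.setdefault(industry, []).append((sector, value))
    return groups


def plot_free_space(rows, wrap_width=18):
    """Draw one panel per industry, each as tall as its number of sectors."""
    # Imported here so the grouping helper works without matplotlib installed.
    import matplotlib.pyplot as plt

    groups = group_by_industry(rows)
    heights = [len(items) for items in groups.values()]
    fig, axes = plt.subplots(
        nrows=len(groups), ncols=1, sharex=True, squeeze=False,
        figsize=(7, 0.6 * sum(heights) + 1),
        gridspec_kw={"height_ratios": heights},  # stands in for space="free"
    )
    for ax, (industry, items) in zip(axes[:, 0], groups.items()):
        # Wrap long sector names so they do not run into each other.
        sectors = [textwrap.fill(sector, wrap_width) for sector, _ in items]
        values = [value for _, value in items]
        ax.barh(sectors, values, height=0.15, color="black")
        for y, value in enumerate(values):
            ax.text(value, y, f" {value:.1f}%", va="center")
        # One unit of height per sector, first sector at the top.
        ax.set_ylim(len(items) - 0.5, -0.5)
        # A right-hand axis label serves as a stand-in for ggplot's strip text.
        ax.yaxis.set_label_position("right")
        ax.set_ylabel(industry, rotation=0, ha="left", va="center")
    return fig
```

Calling plot_free_space(ROWS).savefig("table.png") should write the figure to disk. Adding further regions would presumably mean one column of panels per region, with the same height ratios shared across columns.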