Tags: python, pandas, plot, seaborn, facet-grid

Python Pandas Seaborn FacetGrid: use dataframe series' names to set up columns


I am using pandas dataframes to hold some volume calculation results, and trying to configure a seaborn FacetGrid setup to visualize results of 4 different types of volume calculations for a reservoir zone.

I believe I can handle the dataframe part; my problem is with the visualization. Each type of volume calculation is loaded into the dataframe as a series, and the series name corresponds to the type of volume calculation. I want to create a set of plots, aligned so that each column of plots corresponds to one series in my dataframe.

Theory (documentation) says this should do it (example from tutorial at https://seaborn.pydata.org/tutorial/axis_grids.html):

import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time")

I cannot find the referenced dataset "tips" for download, but I think that is a minor problem. From the code snippet above and after some testing on my own data, I infer that "time" in that dataset refers to the name of one series in the dataframe and that different times would then be different categories or other types of values in that series.

This is not how my dataset is organized. The different types of volume calculations that I would like to see as individual plots (in columns) are represented as separate series in my dataframe. How do I provide the series names as input to the seaborn FacetGrid col= argument?

g = seaborn.FacetGrid(data=volumes_table, col=?????)

I cannot figure out how I can get col=dataframe.series and I cannot find any documented example of that.

Here's a setup with some exciting dummy names and dummy values:

import os
import pandas
import numpy
import seaborn
import matplotlib.pyplot as plt

# provide some input data, using a small dictionary
volumes_categories = {'zone_numbers': [1, 2, 3, 4],
 'zone_names': ['corona', 'hiv', 'h5n1', 'measles'],
 'grv': [30, 90, 80, 100],
 'nv': [20, 60, 20, 50],
 'pv': [5, 12, 4, 25],
 'hcpv': [4, 6, 1, 20]}

# create the dataframe
volumes_table = pandas.DataFrame(volumes_categories)

# set up for plotting
seaborn.set(style='ticks')
g = seaborn.FacetGrid(data=volumes_table, col='zone_names')

The setup above generates plot columns fine, but I cannot get those columns to represent the series in my dataframe (that is, the columns you see when viewing the dataframe as a table).

What do I need to do?


Solution

  • Once we have imported all requirements:

    import seaborn as sns
    import matplotlib.pyplot as plt
    tips = sns.load_dataset('tips')
    

    The FacetGrid essentially just provides a canvas to draw on. You can then use its map method to "project" a plotting function onto that canvas:

    # Blueprint (placeholders only, not runnable as-is)
    # g = sns.FacetGrid(dataframe, col="column_name", row="column_name")
    # g = g.map(plotting_function, "column_name")
    
    # Example with the tips dataset
    g = sns.FacetGrid(tips, col="time", row="smoker")
    g = g.map(plt.hist, "total_bill")
    plt.show()
    

    In your case, I would first melt the value columns to get the data into a tidy (long) format and then plot as usual, changing what is plotted as necessary:

    volumes_table = volumes_table.melt(id_vars=['zone_numbers', 'zone_names'])
    g = sns.FacetGrid(data=volumes_table, col='variable')
    g = g.map(plt.scatter, 'zone_numbers', 'value')
    plt.show()
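
To make the melt step above easier to follow, here is a minimal sketch of what the reshaped table looks like, using the dummy data from the question. The names `long_table`, `volume_type` and `volume` are illustrative choices and not part of the original answer, which keeps pandas' default `variable` and `value` column names.

```python
import pandas as pd

# Same dummy data as in the question (wide format: one column per volume type)
volumes_table = pd.DataFrame({
    'zone_numbers': [1, 2, 3, 4],
    'zone_names': ['corona', 'hiv', 'h5n1', 'measles'],
    'grv': [30, 90, 80, 100],
    'nv': [20, 60, 20, 50],
    'pv': [5, 12, 4, 25],
    'hcpv': [4, 6, 1, 20],
})

# Wide -> long: one row per (zone, volume type) pair.
# var_name / value_name are optional; the names used here are assumptions
# made for readability.
long_table = volumes_table.melt(
    id_vars=['zone_numbers', 'zone_names'],
    var_name='volume_type',
    value_name='volume',
)

print(long_table.shape)                      # 4 zones x 4 volume types = 16 rows
print(long_table['volume_type'].unique())    # the former column names
print(long_table.head())
```

After this reshaping, the former column names (grv, nv, pv, hcpv) are ordinary values in a single column, which is the form `col=` expects. With the names used in this sketch, the grid would be built with `col='volume_type'` and the values mapped from `'volume'`.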