Tags: seaborn, alpha-transparency

Seaborn PairGrid: pairplot two datasets with different transparency


I'd like to make a PairGrid plot with the seaborn library.

My data has two classes: a training set and a single target point.

I'd like the target point to be drawn opaque, while the samples in the training set should be transparent.

I'd also like the target point to appear in the lower cells.

Here is my code and the resulting image:

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.read_csv("data.csv")

g = sns.PairGrid(data, hue='type')
g.map_upper(sns.scatterplot, alpha=0.2, palette="husl")
g.map_lower(sns.kdeplot, lw=3, palette="husl")
g.map_diag(sns.kdeplot, lw=3, palette="husl")
g.add_legend()

plt.show()

data.csv looks like this:

         logP    tPSA       QED  HBA  HBD          type
0    -2.50000  200.00  0.300000    8    1      Target 1
1     1.68070   87.31  0.896898    3    2  Training set
2     3.72930   44.12  0.862259    4    0  Training set
3     2.29702   91.68  0.701022    6    3  Training set
4    -2.21310  102.28  0.646083    8    2  Training set
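The answer's filter `data[data['type'] == 'Target 1']` amounts to splitting this frame into two subsets with a boolean mask on the `type` column. A minimal sketch, assuming the five sample rows above are representative (the full data.csv is not shown, so they are rebuilt inline) and that `'Target 1'` is the only target label:

```python
import pandas as pd

# The five sample rows shown above, rebuilt inline as a stand-in for data.csv
data = pd.DataFrame(
    {
        "logP": [-2.50000, 1.68070, 3.72930, 2.29702, -2.21310],
        "tPSA": [200.00, 87.31, 44.12, 91.68, 102.28],
        "QED": [0.300000, 0.896898, 0.862259, 0.701022, 0.646083],
        "HBA": [8, 3, 4, 6, 8],
        "HBD": [1, 2, 0, 3, 2],
        "type": ["Target 1", "Training set", "Training set",
                 "Training set", "Training set"],
    }
)

# A boolean mask on 'type' selects the target; its negation selects the training set
is_target = data["type"] == "Target 1"
target = data[is_target]
training = data[~is_target]

print(len(training), len(target))  # 4 1
```

Each subset can then be plotted with its own `alpha` and color instead of relying on `hue='type'`.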



Solution

  • You can reassign the dataframe the grid plots from after a first round of plotting, e.g. g.data = data[data['type'] == 'Target 1']. That way you can first plot the training set, then change g.data and plot the target with different parameters (full opacity, another color).

    The following example uses the iris dataset: all rows are plotted as the training data, and the first row plays the role of the target point. A custom legend is added (this may trigger a Matplotlib warning about mixed positional and keyword arguments, which can be ignored here).

    import matplotlib.pyplot as plt
    from matplotlib.lines import Line2D
    import seaborn as sns
    
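    iris = sns.load_dataset('iris')  # downloads the sample dataset on first use
    
    g = sns.PairGrid(iris)
    color_for_trainingset = 'paleturquoise'
    # color_for_trainingset = sns.color_palette('husl', 2)[-1]  # this is the color from the question
    # First pass: the whole dataset, drawn semi-transparent, acts as the training set
    g.map_upper(sns.scatterplot, alpha=0.2, color=color_for_trainingset)
    g.map_lower(sns.kdeplot, color=color_for_trainingset)
    g.map_diag(sns.kdeplot, lw=3, color=color_for_trainingset)
    
    # Second pass: point the grid at the target row(s) and overlay them opaquely
    g.data = iris.iloc[:1]
    # g.data = data[data['type'] == 'Target 1']  # equivalent line for the question's data
    g.map_upper(sns.scatterplot, alpha=1, color='red')
    g.map_lower(sns.scatterplot, alpha=1, color='red', zorder=3)  # zorder=3 keeps the point above the KDE contours
    
    # Proxy artists for a custom legend, since the grid has no hue to build one from
    handles = [Line2D([], [], color='red', ls='', marker='o', label='target'),
               Line2D([], [], color=color_for_trainingset, lw=3, label='training set')]
    g.add_legend(handles=handles)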
    
    plt.show()
    

    [Image: sns.pairplot with changed dataset]