Customizing p-value thresholds for "star" text format in statannotations


The statannotations package annotates plots (seaborn box plots or strip plots, for example) with the level of statistical significance for pairs of data. These annotations can be in "star" text format, where one or more stars appear on top of the bar between a pair of groups, as in the example figure from statannotations' example.ipynb.

Is there any way to customize the thresholds for the stars? I want 0.0001 to be the first significance threshold (one star, *) instead of 0.05, with 0.00001 for two stars (**) and 0.000001 for three stars (***).

The example figure was generated with the example code from the statannotations GitHub page:

from statannotations.Annotator import Annotator
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = sns.load_dataset("tips")
x = "day"
y = "total_bill"
order = ['Sun', 'Thur', 'Fri', 'Sat']
ax = sns.boxplot(data=df, x=x, y=y, order=order)
annot = Annotator(ax, [("Thur", "Fri"), ("Thur", "Sat"), ("Fri", "Sun")], data=df, x=x, y=y, order=order)
annot.configure(test='Mann-Whitney', text_format='star', loc='outside', verbose=2)
annot.apply_test()
ax, test_results = annot.annotate()
plt.savefig('example_non-hue_outside.png', dpi=300, bbox_inches='tight')

With verbose set to 2, this also prints the thresholds used to determine how many stars appear above the bars:

p-value annotation legend:
      ns: p <= 1.00e+00
       *: 1.00e-02 < p <= 5.00e-02
      **: 1.00e-03 < p <= 1.00e-02
     ***: 1.00e-04 < p <= 1.00e-03
    ****: p <= 1.00e-04

I want to pass something like a dictionary mapping p-value thresholds to numbers of stars to Annotator, but I don't know which parameter to use.


Solution

  • In their repository, inside the file Annotator.py, we find self._pvalue_format = PValueFormat(). That implies the p-value formatting can be changed. The PValueFormat class has the following configurable parameters:

    CONFIGURABLE_PARAMETERS = [
        'correction_format',
        'fontsize',
        'pvalue_format_string',
        'simple_format_string',
        'text_format',
        'pvalue_thresholds',
        'show_test_name'
    ]
    

    For completeness, here is a modified version of your code. It prints the p-value format configuration before and after the thresholds are changed, and the stars in the saved image change accordingly.

    # ! pip install statannotations
    from smartprint import smartprint as sprint  # third-party pretty-printer, installed separately
    from statannotations.Annotator import Annotator
    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    
    df = sns.load_dataset("tips")
    x = "day"
    y = "total_bill"
    order = ['Sun', 'Thur', 'Fri', 'Sat']
    ax = sns.boxplot(data=df, x=x, y=y, order=order)
    annot = Annotator(ax, [("Thur", "Fri"), ("Thur", "Sat"), ("Fri", "Sun")], data=df, x=x, y=y, order=order)
    
    print("Before hardcoding pvalue thresholds ")
    sprint(annot.get_configuration()["pvalue_format"])

    annot.configure(test='Mann-Whitney', text_format='star', loc='outside', verbose=2)
    # Override the thresholds directly on the (private) PValueFormat instance
    annot._pvalue_format.pvalue_thresholds = [[0.01, '****'], [0.03, '***'], [0.2, '**'], [0.6, '*'], [1, 'ns']]
    annot.apply_test()
    ax, test_results = annot.annotate()
    plt.savefig('example_non-hue_outside.png', dpi=300, bbox_inches='tight')
    
    print("After hardcoding pvalue thresholds ")
    sprint(annot.get_configuration()["pvalue_format"])
    

    Output:

    Before hardcoding pvalue thresholds 
    Dict: annot.get_configuration()["pvalue_format"]
    Key: Value
    
    {'correction_format': '{star} ({suffix})',
     'fontsize': 'medium',
     'pvalue_format_string': '{:.3e}',
     'pvalue_thresholds': [[0.0001, '****'],
                           [0.001, '***'],
                           [0.01, '**'],
                           [0.05, '*'],
                           [1, 'ns']],
     'show_test_name': True,
     'simple_format_string': '{:.2f}',
     'text_format': 'star'}
    
    p-value annotation legend:
          ns: p <= 1.00e+00
           *: 2.00e-01 < p <= 6.00e-01
          **: 3.00e-02 < p <= 2.00e-01
         ***: 1.00e-02 < p <= 3.00e-02
        ****: p <= 1.00e-02
    
    Thur vs. Fri: Mann-Whitney-Wilcoxon test two-sided, P_val:6.477e-01 U_stat=6.305e+02
    Thur vs. Sat: Mann-Whitney-Wilcoxon test two-sided, P_val:4.690e-02 U_stat=2.180e+03
    Sun vs. Fri: Mann-Whitney-Wilcoxon test two-sided, P_val:2.680e-02 U_stat=9.605e+02
    After hardcoding pvalue thresholds 
    Dict: annot.get_configuration()["pvalue_format"]
    Key: Value
    
    {'correction_format': '{star} ({suffix})',
     'fontsize': 'medium',
     'pvalue_format_string': '{:.3e}',
     'pvalue_thresholds': [[0.01, '****'],
                           [0.03, '***'],
                           [0.2, '**'],
                           [0.6, '*'],
                           [1, 'ns']],
     'show_test_name': True,
     'simple_format_string': '{:.2f}',
     'text_format': 'star'}
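
To see how the legend and the three test results above fit together, here is a rough plain-Python sketch. The helper `label_for_pvalue` is hypothetical and not part of statannotations; it only mirrors the rule printed in the legend (each p-value gets the label of the smallest threshold it does not exceed). The p-values are the rounded ones from the output above.

```python
def label_for_pvalue(p, thresholds):
    """Return the label of the smallest threshold that p does not exceed.

    Hypothetical helper for illustration only, not the library's own code.
    """
    for limit, label in sorted(thresholds, key=lambda pair: pair[0]):
        if p <= limit:
            return label
    return ""  # p is larger than every threshold in the list


# Default thresholds, as printed in the "before" configuration
default_thresholds = [[1e-4, "****"], [1e-3, "***"], [1e-2, "**"], [0.05, "*"], [1, "ns"]]
# Custom thresholds, as printed in the "after" configuration
custom_thresholds = [[0.01, "****"], [0.03, "***"], [0.2, "**"], [0.6, "*"], [1, "ns"]]

# P-values reported by the Mann-Whitney tests above
p_values = {"Thur vs. Fri": 0.6477, "Thur vs. Sat": 0.0469, "Sun vs. Fri": 0.0268}

for pair, p in p_values.items():
    before = label_for_pvalue(p, default_thresholds)
    after = label_for_pvalue(p, custom_thresholds)
    print(f"{pair}: {before} -> {after}")
```

With the custom list, Thur vs. Sat moves from one star to two and Sun vs. Fri from one star to three, while Thur vs. Fri stays "ns".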
    


    Edit: As Bonlenfum pointed out in a comment, the thresholds can also be changed by passing pvalue_thresholds as a keyword argument when calling .configure, as shown below:

    annot.configure(
        test='Mann-Whitney', text_format='star', loc='outside', verbose=2,
        pvalue_thresholds=[[0.01, '****'], [0.03, '***'], [0.2, '**'], [0.6, '*'], [1, 'ns']],
    )
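
Since the question asked for something like a dictionary, here is a minimal sketch that builds the list for the thresholds you mentioned (0.0001, 0.00001, 0.000001). The helper `thresholds_from_dict` is hypothetical and not part of statannotations; it only produces the same [[threshold, label], ...] shape used above, and it assumes you still want a final "ns" entry at 1.

```python
def thresholds_from_dict(labels_by_cutoff, ns_label="ns"):
    """Convert {cutoff: label} into a [[cutoff, label], ...] list.

    Hypothetical helper for illustration; the output follows the same
    smallest-first order as the default list printed by get_configuration().
    """
    pairs = [[cutoff, label] for cutoff, label in labels_by_cutoff.items()]
    pairs.sort(key=lambda pair: pair[0])  # smallest cutoff first
    pairs.append([1, ns_label])           # everything above the largest cutoff
    return pairs


thresholds = thresholds_from_dict({1e-4: "*", 1e-5: "**", 1e-6: "***"})
print(thresholds)
# [[1e-06, '***'], [1e-05, '**'], [0.0001, '*'], [1, 'ns']]
```

The resulting list can then be passed as pvalue_thresholds=thresholds in the .configure call above. With the p-values from the tips example, all three comparisons would be labelled "ns", because none of them is below 0.0001.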