
Box plot of columns with different lengths from two data frames


I have two dataframes. Their lengths differ.

df1:
 Samples   Number
 A9GS        73
 A9GY        142
 ASNO        327
 A5UE        131

df2:
 Samples   Number
 AUFS        107
 A9JY        42
 AKNO        32
 A9FE        111
 A9GY        12
 ADNO        37
 A2KE        451

I ran a Wilcoxon test on them:

wilcox.test(df1$Number,df2$Number, correct=FALSE)

This gave me a p-value. To visualise the result I tried the boxplot function, which gave the following error:

boxplot(df1$Number ~ df2$Number, xlim=c(0.5,3))
Error in model.frame.default(formula = df1$Number ~ df2$Number) : 
  variable lengths differ (found for 'df2$Number')

Can anyone correct my mistake and also tell me how to show the p-value on the plot? Thank you.


Solution

  • You can only use the formula interface when the two sides of the formula pair up one-to-one, with the RHS usually a grouping variable rather than a numeric one. Your two data frames have different numbers of rows, so that is clearly not the case here. Pass the values to boxplot as a list instead of as a formula.

    The plot is achieved with:

    png(); boxplot( list(df1_N=df1$Number, df2_N = df2$Number) ); dev.off()
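
    If you would rather keep the formula interface, one option is to stack both columns into a single long data frame with a grouping column, so that both sides of the formula have the same length. This is only a sketch of that alternative; the names long and source are made up for illustration and do not come from the question.

    ```r
    # Data copied from the question
    df1 <- data.frame(Samples = c("A9GS", "A9GY", "ASNO", "A5UE"),
                      Number  = c(73, 142, 327, 131))
    df2 <- data.frame(Samples = c("AUFS", "A9JY", "AKNO", "A9FE", "A9GY", "ADNO", "A2KE"),
                      Number  = c(107, 42, 32, 111, 12, 37, 451))

    # Stack the values and record which data frame each value came from
    # (`long` and `source` are illustrative names)
    long <- data.frame(
      Number = c(df1$Number, df2$Number),
      source = factor(rep(c("df1", "df2"), times = c(nrow(df1), nrow(df2))))
    )

    # Response and grouping variable now have one row each per observation,
    # so the formula method no longer complains about differing lengths
    boxplot(Number ~ source, data = long)
    ```

    The list form above and this long form should draw the same two boxes; the long form is mainly convenient if you later want to add more groups.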
    


    The plot can be annotated with the text function, which accepts a plotmath expression (see ?plotmath), typically constructed with bquote. Call it while the graphics device is still open, i.e. before dev.off().

    text( 1.5, 400, 
       label=bquote( 
           p~value == .(wilcox.test(df1$Number,df2$Number, correct=FALSE)$p.value)
        ) )
    

    To round the p-value, wrap the expression inside .( ) in round( ... ).
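
Putting the pieces together, a minimal sketch with a rounded p-value might look like the following. The data are copied from the question; the variable p, the choice of 3 decimal places, and the label position (1.5, 400) are just choices for this example.

```r
# Data copied from the question
df1 <- data.frame(Samples = c("A9GS", "A9GY", "ASNO", "A5UE"),
                  Number  = c(73, 142, 327, 131))
df2 <- data.frame(Samples = c("AUFS", "A9JY", "AKNO", "A9FE", "A9GY", "ADNO", "A2KE"),
                  Number  = c(107, 42, 32, 111, 12, 37, 451))

# Compute the p-value once so it can be reused in the label
p <- wilcox.test(df1$Number, df2$Number, correct = FALSE)$p.value

# Draw the two boxes from a named list; the names become the axis labels
boxplot(list(df1_N = df1$Number, df2_N = df2$Number))

# Annotate between the two boxes; round() is applied inside .( ) so the
# rounded number is substituted into the plotmath expression
text(1.5, 400, labels = bquote(p~value == .(round(p, 3))))
```

If you write to a file with png(), keep the boxplot and text calls between png() and dev.off(), since text can only draw on a device that is still open.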