r statistics p-value t-test hypothesis-test

Get p-value with two variables and multiple row names


I wondered if you could help me compute p-values from this simple data frame. My data frame is called my_data. Viewing it shows the values I am comparing:

my_data <- read.csv("densityleftOK.csv", stringsAsFactors = FALSE)[c(1,2,3),]

     P1   P2  P3   P4  P5  T1  T2  T3  T4  T5  T6
A  1008 1425 869 1205 954 797 722 471 435 628 925
B   550  443 317  477 337 383  54 111  27 239 379
C   483  574 597  375 593 553 249 325 238 354 411

I would like to get a single p-value for each row by comparing the placebo (P) samples against the treated (T) samples. If you don't mind, I would also like the standard deviation for both the placebo and the treated group.

I appreciate any help. Thanks


Solution

  • You can try something like the code below: pivot the data into long format, group by the ids, introduce a grouping variable ("P" or "T"), and use tidy on the t.test result to get the output in table format:

    library(broom)
    library(tidyr)
    library(dplyr)
    library(tibble)
    
    data = read.table(text="P1    P2   P3  P4  P5   T1  T2  T3  T4  T5  T6
    A     1008 1425 869 1205 954  797 722 471 435 628 925
    B      550  443 317  477 337  383  54 111  27 239 379
    C      483  574 597  375 593  553 249 325 238 354 411",header=TRUE,row.names=1)
    
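    res = data %>% 
      rownames_to_column("id") %>%                 # keep the row names as an id column
      pivot_longer(-id) %>%                        # long format: id, name, value
      mutate(grp = sub("[0-9]+$", "", name)) %>%   # "P1" -> "P", "T6" -> "T"
      group_by(id) %>% 
      do(tidy(t.test(value ~ grp, data = .))) %>%  # Welch two-sample t-test per id
      select(id, estimate, estimate1, estimate2, statistic, p.value) %>%
      mutate(stderr = estimate / statistic)        # since statistic = estimate / stderr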
    
    # A tibble: 3 x 7
    # Groups:   id [3]
      id    estimate estimate1 estimate2 statistic p.value stderr
      <chr>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>  <dbl>
    1 A         429.     1092.      663       3.40 0.00950  126. 
    2 B         226.      425.      199.      2.89 0.0192    78.2
    3 C         169.      524.      355       2.65 0.0266    64.0
    

    If you would rather not use packages, it is a matter of using apply, and it is easier to declare the groups up front:

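    # Group label per column: "P" (placebo) or "T" (treated)
    grp = gsub("[0-9]", "", colnames(data))
    
    # Welch t-test for each row; keep the statistic, p-value and standard error
    res = apply(data, 1, function(i){
      data.frame(t.test(i ~ grp)[c("statistic", "p.value", "stderr")])
    })
    
    # Combine the per-row results into one data frame and print it
    res = do.call(rbind, res)
    res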
      statistic     p.value    stderr
    A  3.395303 0.009498631 126.40994
    B  2.890838 0.019173060  78.16650
    C  2.646953 0.026608838  63.99812
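
The question also asks for the standard deviation of each group, which the code above does not report: stderr is the standard error of the difference in means, not a standard deviation. A minimal base R sketch, assuming the same example data and P/T grouping as above; the names `sd_table`, `sd_P`, `sd_T` and `se_welch` are illustrative and not part of the original answer:

```r
# Same example data as in the answer above
data <- read.table(text = "P1 P2 P3 P4 P5 T1 T2 T3 T4 T5 T6
A 1008 1425 869 1205 954 797 722 471 435 628 925
B 550 443 317 477 337 383 54 111 27 239 379
C 483 574 597 375 593 553 249 325 238 354 411",
                   header = TRUE, row.names = 1)

# Group label per column: "P" (placebo) or "T" (treated)
grp <- gsub("[0-9]+", "", colnames(data))

# For each row: SD of the placebo values, SD of the treated values,
# and the Welch standard error rebuilt from the two group variances
sd_table <- t(apply(data, 1, function(x) {
  x <- as.numeric(x)
  placebo <- x[grp == "P"]
  treated <- x[grp == "T"]
  c(sd_P = sd(placebo),
    sd_T = sd(treated),
    se_welch = sqrt(var(placebo) / length(placebo) +
                    var(treated) / length(treated)))
}))

print(round(sd_table, 2))
```

The `se_welch` column should reproduce the stderr values shown above, which is a quick way to see how the per-group standard deviations feed into the t statistic.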