
lapply a function over a list of data frames and then cbind the results to the corresponding data frame


Problem: two data frames, each with three columns but a different number of rows:

    > view(archae_pro)
    motif   obs      pred
    AAB    1189  760.1757
    CDD    1058  249.7147
    DDE     771  415.1314
    FBB     544  226.3529

    > view(archae_end)
    motif   obs      pred
    ABG    1044  749.4967
    GBC     634  564.5753
    AGG     616  568.7375
    CGG     504  192.5312
    BTT     404  200.4589

I want to perform a chi-square goodness-of-fit test on each data frame, then calculate the standardised residuals and column-bind them to the corresponding data frame. Here is what I tried:

    df.list <- list(
      df_archae_pro,
      df_archae_end
    )

    prop <- lapply(df.list, function(x) cbind(x$pred/sum(x$pred)))
    chisquare <- lapply(df.list, function(x) chisq.test(x$obs, p=prop))

RStudio throws an error:

    Error in chisq.test(x$obs, p = prop) : 
      'x' and 'p' must have the same number of elements

My two pence on the error: chisq.test somehow does not pick up the prop that corresponds to the data frame currently being tested?
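To see why, it helps to look at what prop actually is. The minimal sketch below rebuilds the two data frames from the printed values above (the list names archae_pro and archae_end are chosen here for illustration, and the cbind is dropped because it only wraps the vector in a one-column matrix). prop is a list with one element per data frame, so p = prop hands chisq.test() a two-element list while x$obs has four or five values. Pairing each data frame with its own element via Map() is one way round it; the solution below takes another route by storing the proportions as a column.

```r
# Data copied from the printed output in the question
archae_pro <- data.frame(
  motif = c("AAB", "CDD", "DDE", "FBB"),
  obs   = c(1189, 1058, 771, 544),
  pred  = c(760.1757, 249.7147, 415.1314, 226.3529)
)
archae_end <- data.frame(
  motif = c("ABG", "GBC", "AGG", "CGG", "BTT"),
  obs   = c(1044, 634, 616, 504, 404),
  pred  = c(749.4967, 564.5753, 568.7375, 192.5312, 200.4589)
)
df.list <- list(archae_pro = archae_pro, archae_end = archae_end)

# The question's first step: one proportion vector per data frame
prop <- lapply(df.list, function(x) x$pred / sum(x$pred))

length(prop)            # 2: one element per data frame
sapply(df.list, nrow)   # 4 and 5 observations respectively

# Inside lapply, p = prop is therefore the whole two-element list,
# which triggers "'x' and 'p' must have the same number of elements".
# Map() walks df.list and prop in parallel, so each test gets its own p.
chisquare <- Map(function(x, p) chisq.test(x$obs, p = p), df.list, prop)
chisquare$archae_pro$statistic
```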

I started learning R in RStudio only a few days ago, so please excuse any obvious mistakes.

I would also appreciate any help with calculating the standardised residuals and column-binding them to the data frames.

This is a follow-up question: how do I plot a bar chart of motif versus stdres with facet_wrap, laid out as one column with one row per data frame? I tried the following:

    myplots <- lapply(df.list, function(x)
      p <- ggplot(x, aes(x = motif, y = stdres)) +
        geom_bar(stat = "identity") +
        facet_wrap(x)
    )
    myplots

But it throws this error:

    Error in `[.data.frame`(.subset2(df, col)[1], NA) : 
      undefined columns selected

What am I doing wrong?
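facet_wrap() expects a faceting specification such as the formula ~ dataset, not a data frame, and it facets the rows of a single data frame, so lapply-ing over separate data frames cannot give one faceted plot. A minimal sketch, assuming the stdres column is added as in the solution below: stack the data frames with rbind, tag each row with a dataset column (a hypothetical name introduced here), and facet on that column with ncol = 1. The data are copied from the question; scales = "free_x" is a choice made here because the two data frames have different motifs.

```r
# Data copied from the printed output in the question
archae_pro <- data.frame(
  motif = c("AAB", "CDD", "DDE", "FBB"),
  obs   = c(1189, 1058, 771, 544),
  pred  = c(760.1757, 249.7147, 415.1314, 226.3529)
)
archae_end <- data.frame(
  motif = c("ABG", "GBC", "AGG", "CGG", "BTT"),
  obs   = c(1044, 634, 616, 504, 404),
  pred  = c(749.4967, 564.5753, 568.7375, 192.5312, 200.4589)
)
df.list <- list(archae_pro = archae_pro, archae_end = archae_end)

# Add the standardised residuals to each data frame
df.list <- lapply(df.list, function(df) {
  res <- chisq.test(df$obs, p = df$pred / sum(df$pred))
  cbind(df, stdres = res$stdres)
})

# Stack into one data frame; the hypothetical 'dataset' column records
# which list element each row came from, so ggplot2 can facet on it
all_df <- do.call(rbind, Map(function(df, nm) cbind(df, dataset = nm),
                             df.list, names(df.list)))

# Draw the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(all_df, aes(x = motif, y = stdres)) +
    geom_col() +                                         # same as geom_bar(stat = "identity")
    facet_wrap(~ dataset, ncol = 1, scales = "free_x")  # one column, one row per data frame
  print(p)
}
```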


Solution

  • Simply pass the data frame itself as the first argument to cbind and the new column as a named second argument, prop, and assign the result back to df.list, since you are adding a new column to each data frame.

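    Then, in the next call, prefix prop with the object qualifier x$ so the test uses the prop column of the data frame currently being processed:

    # Add a prop column (expected proportions) to every data frame
    df.list <- lapply(df.list, function(x) 
        cbind(x, prop=x$pred/sum(x$pred))
    ) 
    
    # x$prop is the proportion column of the data frame being tested
    chisquare <- lapply(df.list, function(x)
        chisq.test(x$obs, p=x$prop)
    ) 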
    

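    To attach the test results to each data frame, run the test inside the function and cbind the extracted values:

    df.list <- lapply(df.list, function(df) {
        results <- chisq.test(df$obs, p=df$prop)
    
        cbind(
            df, 
            stat = unname(results$statistic),  # drop the "X-squared" name so it is not picked up as a row name
            pval = results$p.value,            # single value, repeated on every row
            stdres = results$stdres            # one standardised residual per motif
        )
    })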
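A self-contained sketch of the final step above, run on the data printed in the question, that also checks what stdres contains. For a goodness-of-fit test, the chisq.test documentation describes the standardised residuals as (obs - E) / sqrt(V), with E = n * p and V = n * p * (1 - p), where n = sum(obs). As an aside that is not part of the answer, chisq.test() also has a rescale.p argument: with rescale.p = TRUE the raw pred column is rescaled to sum to 1, which should give the same statistic without a separate prop column.

```r
# Data copied from the printed output in the question
archae_pro <- data.frame(
  motif = c("AAB", "CDD", "DDE", "FBB"),
  obs   = c(1189, 1058, 771, 544),
  pred  = c(760.1757, 249.7147, 415.1314, 226.3529)
)
archae_end <- data.frame(
  motif = c("ABG", "GBC", "AGG", "CGG", "BTT"),
  obs   = c(1044, 634, 616, 504, 404),
  pred  = c(749.4967, 564.5753, 568.7375, 192.5312, 200.4589)
)
df.list <- list(archae_pro = archae_pro, archae_end = archae_end)

# The answer's two steps in one pass: add prop, then bind the test results
df.list <- lapply(df.list, function(df) {
  df$prop <- df$pred / sum(df$pred)
  results <- chisq.test(df$obs, p = df$prop)
  cbind(
    df,
    stat   = unname(results$statistic),  # single value, repeated on every row
    pval   = results$p.value,            # single value, repeated on every row
    stdres = results$stdres              # one value per motif
  )
})

# Recompute the standardised residuals by hand for one data frame
d <- df.list$archae_end
n <- sum(d$obs)
E <- n * d$prop                          # expected counts
manual_stdres <- (d$obs - E) / sqrt(n * d$prop * (1 - d$prop))
all.equal(d$stdres, manual_stdres)

# Alternative: let chisq.test() rescale the raw predictions itself
alt <- chisq.test(archae_end$obs, p = archae_end$pred, rescale.p = TRUE)
all.equal(unname(alt$statistic), d$stat[1])
```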