Tags: r, loops, if-statement, normal-distribution

condition in loop in R


I have a relatively simple question, but I was not able to apply the solutions I found on the internet. Let's say we have:

set.seed(20)

data <- data.frame(month = rep(month.name, 25), 
a = rnorm(300, 0, 1), b = runif(300, 0, 7.2))

I want to use a loop to calculate the F-test for equality of variances between columns a and b for each month in month. I did this with:

# create some empty vectors to fill in later
pval <- as.double()
ftest <- as.double()
month <- as.character()

# looping through the months

for (i in unique(data$month)){
  print(i)
  # sh.1 <- shapiro.test(data$a[data$month==i])
  # sh.1[2] > 0.05 # apply log if it's smaller than 0.05
  # sh.2 <- shapiro.test(data$b[data$month==i])
  # sh.2[2] > 0.05 # apply log if it's smaller than 0.05
  var.t <- var.test(data$a[data$month==i], data$b[data$month==i])
  f <- round(var.t[[1]],2)
  p <- round(var.t$p.value,2)
  ftest <- append(ftest, f)
  pval <- append(pval, p)
  month <- append(month, i)
}

However, as far as I know, the F-test is very sensitive to departures from normality. I am therefore planning to add a condition to the loop: if the p-value of the Shapiro test is smaller than 0.05, the data are log-transformed before being passed to the F-test.

Normally, I would do this with an ifelse condition, but I am not sure how to use it here. Any help, please?


Solution

  • I believe the code below does what you want. It uses *apply functions rather than for loops, which I think makes the code more readable.

    First I will recreate the data and make sure column a is all positive, since the logarithm is only defined for positive values.

    set.seed(20)
    
    data <- data.frame(month = rep(month.name, 25), 
                       a = rnorm(300, 0, 1), b = runif(300, 0, 7.2))
    
    data$a <- abs(data$a)
    

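    Now, instead of looping through the unique values of month, I split the data.frame by that variable. Each element of the resulting list sp is then a data.frame holding all the rows for one month. The second line puts the list back into calendar order, because split orders it alphabetically by month name.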

    sp <- split(data, data$month)
    sp <- sp[month.name]  # reorder from alphabetical to calendar order
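
    To see what split returns, the short sketch below rebuilds the example data and inspects the result. The names sp_alpha and sp_cal are illustrative only, and indexing by month.name is just one way to get calendar order.

    ```r
    set.seed(20)
    data <- data.frame(month = rep(month.name, 25),
                       a = abs(rnorm(300, 0, 1)), b = runif(300, 0, 7.2))

    # split() returns a named list with one data.frame per month,
    # ordered alphabetically by month name
    sp_alpha <- split(data, data$month)
    names(sp_alpha)[1:3]          # "April" "August" "December"
    sapply(sp_alpha, nrow)[1:3]   # 25 rows in each element

    # indexing the list by month.name puts it in calendar order
    sp_cal <- sp_alpha[month.name]
    names(sp_cal)[1:3]            # "January" "February" "March"
    ```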
    

    This is where the data are log-transformed if necessary: each column is transformed only when its Shapiro-Wilk p-value is below 0.05.

    sp <- lapply(sp, function(DF){
      if(shapiro.test(DF[["a"]])$p.value < 0.05) DF[["a"]] <- log(DF[["a"]])
      if(shapiro.test(DF[["b"]])$p.value < 0.05) DF[["b"]] <- log(DF[["b"]])
      DF
    })
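
    The question asked about ifelse, but the check here gives a single TRUE or FALSE per column, so a plain if is the natural fit; ifelse is meant for element-wise choices over a vector. A minimal sketch of the same step as a reusable helper follows. The function name log_if_nonnormal and its alpha argument are hypothetical and not part of the code above.

    ```r
    # Hypothetical helper: log-transform x only when shapiro.test rejects normality.
    # Assumes x is strictly positive, otherwise log() yields NaN or -Inf.
    log_if_nonnormal <- function(x, alpha = 0.05) {
      if (shapiro.test(x)$p.value < alpha) log(x) else x
    }

    set.seed(20)
    skewed <- rexp(25)   # positive, right-skewed sample used only for illustration
    out <- log_if_nonnormal(skewed)

    # out is either skewed itself or log(skewed), depending on the p-value
    identical(out, skewed) || isTRUE(all.equal(out, log(skewed)))
    ```

    With such a helper, the body of the lapply call could be reduced to DF[["a"]] <- log_if_nonnormal(DF[["a"]]) and the same for column b.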
    

    Then use lapply to run the test you want, var.test, on each of these data.frames.

    vartest_list <- lapply(sp, function(DF){
      var.t <- var.test(DF[["a"]], DF[["b"]])
      list(f = var.t[[1]], 
           p.value = var.t$p.value, 
           month = as.character(DF[["month"]][1]))
    })
    

    Finally, it is a simple matter of applying the extraction function [[ to the test results. This works because hypothesis test functions in R return objects of class "htest", which are nothing but lists. The last of the extraction calls is commented out because the month names are already available as the names of the list.

    ftest <- sapply(vartest_list, '[[', 'f')
    pval <- sapply(vartest_list, '[[', 'p.value')
    #month <- sapply(vartest_list, '[[', 'month')
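
    As a quick check that "htest" objects really are lists, and as one possible way to gather the results into a single table, here is a small self-contained sketch. It skips the log-transformation step for brevity, and the names vt, tests and res are illustrative.

    ```r
    set.seed(20)
    data <- data.frame(month = rep(month.name, 25),
                       a = abs(rnorm(300, 0, 1)), b = runif(300, 0, 7.2))

    # an "htest" object is a list, so both $ and [[ work on it
    vt <- var.test(data$a, data$b)
    class(vt)                          # "htest"
    is.list(vt)                        # TRUE
    identical(vt[[1]], vt$statistic)   # TRUE: the first element is the F statistic

    # run the test per month, then collect one row per month
    sp <- split(data, data$month)[month.name]
    tests <- lapply(sp, function(DF) var.test(DF$a, DF$b))
    res <- data.frame(
      month = names(tests),
      f = round(unname(sapply(tests, function(t) t$statistic)), 2),
      p = round(unname(sapply(tests, function(t) t$p.value)), 2)
    )
    res
    ```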