Search code examples
rstatisticspermutationstatistical-test

How to fix my permutation part of my R code?


I have attached my code with the problem that I am having issues on below:

library(dplyr)
library(perm)
data(iris)

# Create subset to only see setosa and virginica species
iris_subset <- iris %>%
  filter(Species %in% c("setosa", "virginica"))

# Define test statistic
difference <- diff(mean(iris_subset$Sepal.Length[iris_subset$Species == "setosa"]),
                   mean(iris_subset$Sepal.Length[iris_subset$Species == "virginica"]))

# Define function
permutation_test <- function(data) {
  permutated_species <- sample(data$Species)
  permutated_diff <- diff(mean(data$Sepal.Length[permutated_species == "setosa"]),
                          mean(data$Sepal.Length[permutated_species == "virginica"]))
  return(permutated_diff)
}

# Permutation test
set.seed(13)
permutation_results <- permTS(iris_subset$Sepal.Length, group = iris_subset$Species, testStatistic = perm_test, nrepl = 10000)

# Calculate p-value
p_value <- permutation_results$p.value

# Print p-value
cat("P-value:", p_value, "\n")

I keep getting a variety of errors in my code all of which surround the permutation test portion and I am unsure of how to fix it so I can get my p-value.

I put my code into ChatGPT to see if it could fix my error, the primary one being:

"Error in permTS.default(iris_subset$Sepal.Length, group = iris_subset$Species, : argument "y" is missing, with no default"

I added in a y because that was what was missing as iris_subset$Species and I got an error that both x and y needed to be numeric. I have also tried using library(coin) and library(exactRankTest) and recoding my original code to fit the packages and there were still errors.


Solution

  • You are complicating where the package already makes simple. The two sample test is a permutation test for the difference in the means, there is no need to define a function to do what permTS already does. From help("permTS"):

    For permTS the test statistic is equivalent to the mean of one group minus the mean of the other group.

    library(perm)
    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    
    data(iris)
    
    # Create subset to only see setosa and virginica species
    iris_subset <- iris %>%
      filter(Species %in% c("setosa", "virginica"))
    
    # set conf level, number of Monte Carlo replications, etc
    permCntl <- permControl(cm = NULL, nmc = 10^4, seed = 1234321, digits = 12,
                            p.conf.level = 0.95, setSEED = TRUE, tsmethod = "central")
    
    # test default is "two.sided"
    permutation_results <- permTS(Sepal.Length ~ Species, data = iris_subset, control = permCntl)
    
    # Calculate p-value
    p_value <- permutation_results$p.value
    # Print p-value
    cat("P-value:", p_value, "\n")
    #> P-value: 5.882895e-17
    

    Created on 2024-04-09 with reprex v2.1.0