r statistics permutation statistical-test

How to fix my permutation part of my R code?

I have attached my code with the problem that I am having issues on below:

library(dplyr)
library(perm)
data(iris)

# Create subset to only see setosa and virginica species
iris_subset <- iris %>%
  filter(Species %in% c("setosa", "virginica"))

# Define test statistic
difference <- diff(mean(iris_subset$Sepal.Length[iris_subset$Species == "setosa"]),
                   mean(iris_subset$Sepal.Length[iris_subset$Species == "virginica"]))

# Define function
permutation_test <- function(data) {
  permutated_species <- sample(data$Species)
  permutated_diff <- diff(mean(data$Sepal.Length[permutated_species == "setosa"]),
                          mean(data$Sepal.Length[permutated_species == "virginica"]))
  return(permutated_diff)
}

# Permutation test
set.seed(13)
permutation_results <- permTS(iris_subset$Sepal.Length, group = iris_subset$Species, testStatistic = perm_test, nrepl = 10000)

# Calculate p-value
p_value <- permutation_results$p.value

# Print p-value
cat("P-value:", p_value, "\n")

I keep getting a variety of errors in my code all of which surround the permutation test portion and I am unsure of how to fix it so I can get my p-value.

I put my code into ChatGPT to see if it could fix my error, the primary one being:

"Error in permTS.default(iris_subset$Sepal.Length, group = iris_subset$Species, : argument "y" is missing, with no default"

I added in a y because that was what was missing as iris_subset$Species and I got an error that both x and y needed to be numeric. I have also tried using library(coin) and library(exactRankTest) and recoding my original code to fit the packages and there were still errors.

Solution

You are complicating where the package already makes simple. The two sample test is a permutation test for the difference in the means, there is no need to define a function to do what permTS already does. From help("permTS"):

For permTS the test statistic is equivalent to the mean of one group minus the mean of the other group.

library(perm)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data(iris)

# Create subset to only see setosa and virginica species
iris_subset <- iris %>%
  filter(Species %in% c("setosa", "virginica"))

# set conf level, number of Monte Carlo replications, etc
permCntl <- permControl(cm = NULL, nmc = 10^4, seed = 1234321, digits = 12,
                        p.conf.level = 0.95, setSEED = TRUE, tsmethod = "central")

# test default is "two.sided"
permutation_results <- permTS(Sepal.Length ~ Species, data = iris_subset, control = permCntl)

# Calculate p-value
p_value <- permutation_results$p.value
# Print p-value
cat("P-value:", p_value, "\n")
#> P-value: 5.882895e-17

^{Created on 2024-04-09 with reprex v2.1.0}