Tags: r, variables, testing, data-cleaning

Testing the length of two variables, getting the length of both and assigning the shorter one to a variable for downstream work


I need a little help polishing up my code. I am trying to run a Wilcoxon rank-sum test and am using the following code to do it:

Program_A <- c(95,78,88,84,89,83,79,85,74,81,77,82)         
Program_B <- c(91,93,83,98,86,95,99,100,94,107,92,102,105,103,87)

n1 <- length(Program_A)
n2 <- length(Program_B)

#make dataframe
Program_data <- data.frame( 
  sections = c(rep("Program_A", n1),
               rep("Program_B", n2)),
  scores = c(Program_A, Program_B)
)  

Program_data

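library(dplyr)  # provides %>%, mutate(), group_by() and summarise()

#rank all scores together, then sum the ranks within each program
Program_data1 <- Program_data %>%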
  mutate(
    score_rank = rank(scores)
  ) %>%
  group_by(sections) %>%
  summarise(test_stat = sum(score_rank))

Program_data1
# sections  test_stat
# <chr>         <dbl>
# 1 Program_A        94
# 2 Program_B       284

Tx <- 94 #using the smaller rank sum (Program_A)
n1 
n2 

z <- (Tx - (n1*(n1+n2+1))/2)/sqrt((n1*n2*(n1+n2+1))/12)
z

This works as long as Program_A is the shorter vector.

However, what I'd like to do now is find a way to compare the lengths of Program_A and Program_B, so that the code still works if the number of values in either one changes.

Ex:

Program_A <- c(95,78,88,84)
Program_B <- c(91,93,83,98,86,95)

I would like a way to test which variable is longer, get each length, and assign them so that n1 always holds the length of the shorter variable and n2 always holds the length of the longer one.

Thanks, DM


Solution

  • We can use lengths() on a list of both vectors to get the two lengths at once, then take the minimum and maximum:

    l1 <- lengths(list(Program_A, Program_B))
    n1 <- min(l1)
    n2 <- max(l1)
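
One thing the snippet above does not cover is Tx. The z formula in the question subtracts n1*(n1+n2+1)/2, so it seems Tx should be the rank sum of whichever sample has length n1, not just whichever rank sum happens to be smaller. Here is a rough sketch in base R that picks both together, using the shorter example vectors from the question. The names samples, len, shorter, group and all_ranks are only illustrative, and treating Tx as the shorter sample's rank sum is my assumption about what is intended.

```r
Program_A <- c(95, 78, 88, 84)
Program_B <- c(91, 93, 83, 98, 86, 95)

# Keep both samples in a named list so lengths() can compare them.
samples <- list(Program_A = Program_A, Program_B = Program_B)
len <- lengths(samples)

n1 <- min(len)  # size of the shorter sample
n2 <- max(len)  # size of the longer sample

# Name of the shorter sample; which.min() returns the first one on a tie.
shorter <- names(which.min(len))

# Rank all scores together, then sum the ranks of the shorter sample only.
# (Assumption: Tx is meant to be the rank sum of the sample of size n1.)
all_ranks <- rank(unlist(samples, use.names = FALSE))
group <- rep(names(samples), times = len)
Tx <- sum(all_ranks[group == shorter])

# Same normal-approximation formula as in the question.
z <- (Tx - n1 * (n1 + n2 + 1) / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z
```

If the two vectors have the same length, n1 and n2 are equal and which.min() simply picks the first sample, so the formula still applies.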