
'Random' Sorting with a condition in R for Psychology Research


I have a valence category for each word stimulus in my psychology experiment:

1 = Negative, 2 = Neutral, 3 = Positive

I need to order thousands of stimuli pseudo-randomly, under one condition:

Val_Category must not contain more than 2 stimuli of the same valence in a row, i.e. no more than 2 negative (or neutral, or positive) stimuli in a row.

For example: 2, 2, 2 = not acceptable

2, 2, 1 = ok
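The rule can be restated in terms of run lengths: a sequence is acceptable when no run of identical values is longer than 2. A minimal sketch in base R using `rle()`; the helper name `hasLongRun` is hypothetical and only illustrates the condition:

```r
# Hypothetical helper: TRUE if x contains a run of more than n
# identical values in a row (rle() returns the length of each run)
hasLongRun <- function(x, n = 2) {
  any(rle(x)$lengths > n)
}

hasLongRun(c(2, 2, 2))  # TRUE: three neutral stimuli in a row
hasLongRun(c(2, 2, 1))  # FALSE
```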

I can't simply fix the sequence in advance (i.e. decide that the whole experiment will be 1,3,2,3,1,3,2,3,2,2,1), because the order is not allowed to follow a pattern.

I tried various packages and functions such as dplyr, sample(), order() and sort(), but nothing so far solves the problem.


Solution

    I think there are a thousand ways to do this, probably none of them very pretty. I wrote a small function that takes care of the ordering. It's a bit hacky, but it worked for the cases I tried.

    To explain what I did, the function works as follows:

    1. Take the vector of valences and draw a random permutation of it.
    2. If sequences (runs of the same value) are found that are longer than the allowed length, then (for each such sequence) take the last value of that sequence and place it "somewhere else", at a random position.
    3. Check whether the problem is solved. If so, return the reordered vector. If not, go back to step 2.
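Steps 1 and 2 rely on base R's `rle()`: its run lengths, passed through `cumsum()`, give the position where each run ends, i.e. "the last value of that sequence". A small worked illustration on a made-up vector `x` (not from the answer):

```r
# made-up valence sequence with two runs longer than 2
x <- c(1, 2, 2, 2, 3, 1, 1, 1, 1, 2)
r <- rle(x)
r$lengths                  # 1 3 1 4 1
ends <- cumsum(r$lengths)  # 1 4 5 9 10: position where each run ends
ends[r$lengths > 2]        # 4 9: last element of each too-long run
x[ends[r$lengths > 2]]     # 2 1: the values that would be moved
```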

    # some vector of valences
    val <- rep(1:3,each=50)
    
    pseudoRandomize <- function(x, n){
    
      # take an initial sample (a random permutation of the argument x)
      out <- sample(x)
      # check if the sample is "bad" (containing sequences longer than n)
      bad.seq <- any(rle(out)$lengths > n)
      # length of the whole sample
      l0 <- length(out)
    
      # note: if no valid order exists for x and n, this loop never ends
      while(bad.seq){
        # get lengths of all subsequences
        l1 <- rle(out)$lengths
        # find the bad ones
        ind <- l1 > n
        # take the last value of each bad sequence, and...
        # (positions are computed once per pass and may shift as values move;
        # the check at the end of the pass catches anything left over)
        for(i in cumsum(l1)[ind]){
          # take it out of the current sample
          tmp <- out[-i]
          # pick a new insertion point at random (0 = front, l0-1 = end);
          # sample.int() avoids sample()'s special case for a single number
          pos <- sample.int(l0, 1) - 1
          # put the value back into the sample at the new position
          # (append() keeps the names, if x has any)
          out <- append(tmp, out[i], after = pos)
        }
        # check if bad sequences (still) exist
        # if TRUE, then 'while' continues; if FALSE, then it doesn't
        bad.seq <- any(rle(out)$lengths > n)
      }
      # return the reordered sequence
      out
    
    }
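The `while` loop only stops once a valid order is found, so it never ends if none exists (for example, 10 negative words and only 2 others with n = 2). A necessary condition can be checked beforehand: the most frequent category, with count m, can be split into at most r + 1 runs of length at most n, where r is the number of other stimuli, so m <= n * (r + 1) must hold. A sketch; `feasibleOrder` is a hypothetical name, and it checks this necessary condition only:

```r
# Hypothetical helper: necessary condition for any order of x to have
# no run longer than n. The most frequent category (count m) must fit
# into at most r + 1 runs of length <= n, r = number of other stimuli.
feasibleOrder <- function(x, n) {
  m <- max(table(x))
  r <- length(x) - m
  m <= n * (r + 1)
}

feasibleOrder(rep(1:3, each = 50), 2)  # TRUE
feasibleOrder(c(rep(1, 10), 2, 3), 2)  # FALSE: 10 > 2 * (2 + 1)
```

Calling such a check before `pseudoRandomize()` turns a hang into an immediate answer for impossible inputs.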
    

    Example:

    The function may be used on a vector with or without names. If the vector was named, then these names will still be present on the pseudo-randomized vector.

    # simple unnamed vector
    val <- rep(1:3,each=5)
    pseudoRandomize(val, 2)
    
    # may give (the result is random, so yours will differ):
    # [1] 1 3 2 1 2 3 3 2 1 2 1 3 3 1 2
    
    # when names assigned to the vector
    names(val) <- 1:length(val)
    pseudoRandomize(val, 2)
    
    # may give, e.g. (first row shows the names):
    #  1 13  9  7  3 11 15  8 10  5 12 14  6  4  2 
    #  1  3  2  2  1  3  3  2  2  1  3  3  2  1  1 
    
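Because the output is random, it can be worth checking each result rather than trusting it. A minimal sketch of such a check, applied to the named result printed above; `checkShuffle` is a hypothetical helper, not part of the answer. It tests that the values are the same as before, that every name still points at its original value, and that no run is longer than `n`:

```r
# Hypothetical checker for a pseudo-randomized named vector
checkShuffle <- function(original, shuffled, n) {
  # same values, just reordered
  sameValues <- length(original) == length(shuffled) &&
    all(sort(unname(original)) == sort(unname(shuffled)))
  # every name still maps to the value it had in the original vector
  namesIntact <- setequal(names(shuffled), names(original)) &&
    all(original[names(shuffled)] == shuffled)
  # the run-length condition itself
  noLongRuns <- all(rle(unname(shuffled))$lengths <= n)
  sameValues && namesIntact && noLongRuns
}

original <- setNames(rep(1:3, each = 5), 1:15)
# the named result printed above
shuffled <- setNames(c(1, 3, 2, 2, 1, 3, 3, 2, 2, 1, 3, 3, 2, 1, 1),
                     c(1, 13, 9, 7, 3, 11, 15, 8, 10, 5, 12, 14, 6, 4, 2))
checkShuffle(original, shuffled, 2)       # TRUE
checkShuffle(original, rev(original), 2)  # FALSE: runs of length 5
```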

    This property can be used for randomizing a whole data frame. To achieve that, the "valence" vector is taken out of the data frame, and names are assigned to it either by row index (1:nrow(dat)) or by row names (rownames(dat)). The names of the pseudo-randomized vector are then used to pick the rows in their new order.

    # reorder a data.frame using a named vector
    dat <- data.frame(val=rep(1:3,each=5), stim=rep(letters[1:5],3))
    val <- dat$val
    names(val) <- 1:nrow(dat)
    
    new.val <- pseudoRandomize(val, 2)
    new.dat <- dat[as.integer(names(new.val)),]
    
    # may give, e.g.:
    #    val stim
    # 5    1    e
    # 2    1    b
    # 9    2    d
    # 6    2    a
    # 3    1    c
    # 15   3    e
    # ...
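For thousands of stimuli, a different technique (sequential sampling with restarts, not the method above) is to build the order one position at a time: draw a random remaining stimulus, but skip the category that would extend a run beyond n, and start over if only that category is left. The sketch below assumes a hypothetical function `constrainedOrder` and a made-up data frame of 3000 words; it returns row indices, so it can reorder a data frame directly. Like the approach above, it is not guaranteed to pick uniformly among all valid orders.

```r
# Hypothetical alternative: build the order position by position.
# valence: category of each row; n: maximum allowed run length.
constrainedOrder <- function(valence, n = 2, max.tries = 100) {
  N <- length(valence)
  for (attempt in seq_len(max.tries)) {
    remaining <- seq_len(N)  # row indices not placed yet
    out <- integer(N)        # chosen row indices, in order
    for (k in seq_len(N)) {
      # if the last n picks share one category, that category is blocked
      blocked <- NULL
      if (k > n) {
        lastCats <- valence[out[(k - n):(k - 1)]]
        if (all(lastCats == lastCats[1])) blocked <- lastCats[1]
      }
      candidates <- remaining[!(valence[remaining] %in% blocked)]
      if (length(candidates) == 0) break  # dead end: start a new attempt
      # uniform over remaining rows, so larger categories are drawn more often
      pick <- candidates[sample.int(length(candidates), 1)]
      out[k] <- pick
      remaining <- remaining[remaining != pick]
    }
    if (length(remaining) == 0) return(out)
  }
  stop("no valid order found after ", max.tries, " attempts")
}

# usage on a made-up data frame of 3000 stimuli
set.seed(42)
dat <- data.frame(val = rep(1:3, each = 1000),
                  stim = paste0("word", 1:3000))
ord <- constrainedOrder(dat$val, n = 2)
new.dat <- dat[ord, ]
max(rle(new.dat$val)$lengths)  # never more than 2
```

Unlike the `while` loop above, this version gives up with an error after `max.tries` failed attempts instead of running forever on impossible inputs.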