Search code examples
rdataframecriteriadata-wranglingcolumnsorting

Create column based on multiple column and row criteria in R


I am after some help with a bit of complex dataframe management/column creation.

I have a data frame that looks something like this:

dat <- data.frame(Target = sort(rep(c("1", "2", "3", "4"), times = 5)),
                  targ1frames_x = c(0.42, 0.46, 0.50, 0.60, 0.86, 0.32, 0.56, 
                                    0.89, 0.98, 0.86, 0.57, 0.79, 0.52, 0.62, 
                                    0.55, 0.12, 0.35, 0.50, 0.48, 0.45),
                  targ1frames_y = c(0.69, 0.63, 0.74, 0.81, 0.12, 0.67, 0.54, 
                                    0.30, 0.25, 0.18, 0.63, 0.63, 0.79, 0.81, 
                                    0.96, 0.90, 0.75, 0.74, 0.81, 0.76),
                  targ1frames_time = rep(c(0.00006, 0.18, 0.27, 0.35, 0.43), times = 4))

     Target targ1frames_x targ1frames_y targ1frames_time
1       1          0.42          0.69          0.00006
2       1          0.46          0.63          0.18000
3       1          0.50          0.74          0.27000
4       1          0.60          0.81          0.35000
5       1          0.86          0.12          0.43000
6       2          0.32          0.67          0.00006
7       2          0.56          0.54          0.18000
8       2          0.89          0.30          0.27000
9       2          0.98          0.25          0.35000
10      2          0.86          0.18          0.43000
11      3          0.57          0.63          0.00006
12      3          0.79          0.63          0.18000
13      3          0.52          0.79          0.27000
14      3          0.62          0.81          0.35000
15      3          0.55          0.96          0.43000
16      4          0.12          0.90          0.00006
17      4          0.35          0.75          0.18000
18      4          0.50          0.74          0.27000
19      4          0.48          0.81          0.35000
20      4          0.45          0.76          0.43000

This is from an experiment in which participants had to click on-screen targets with a computer mouse. The Target column indicates which target was clicked, targ1frames_x shows the x coordinates of the cursor on each frame; targ1frames_y shows the y coordinates on each frame; and targ1frames_time indicates the time of each frame change.

What I want to work out is the time of the frame change when the mouse coordinates from either x or y columns changed between the first value for each target (i.e. the starting cursor position) and the subsequent rows by >= 0.2. I want to get the single time for each target when this occurs into a new column/data frame. I.e., I want to end up with:

    Target targ1_initTime
1      1           0.43000
2      2           0.18000
3      3           0.18000
4      4           0.18000

Note that for the simplicity of providing a reproducible example, each target here has the same time values, which is not the case in my actual dataset. I am trying to write code that can be flexible to produce the time at which the x or y coordinate changes from the starting value (within the x or y coordinate columns) by +-0.2 for each target.

I hope this makes sense, I would be very grateful for any advice - if anything needs further clarification, please let me know in the comments.


Solution

  • I realised I had not done an amazing job of explaining what I was trying to do, so these solutions did not entirely work, though they gave me the building blocks to get there! Thanks so much for all of your help. Here is what I did in the end:

    #Change columns to correct class
    dat2 <-dat %>%
      mutate(targ1frames_x = as.numeric(targ1frames_x), targ1frames_y = as.numeric(targ1frames_y),
             targ1frames_time = as.numeric(targ1frames_time))
    
    #Extract the first x and y coordinates for each target
    
    sPosDat <- dat2 %>%
      group_by(Target) %>%
      select(Target, targ1frames_x, targ1frames_y) %>%
      mutate(targ1Start_x = targ1frames_x, targ1Start_y = targ1frames_y) %>% #rename columns
      slice(1)
    
    #Remove extra columns to keep things tidy
    sPosDat <- sPosDat[-c(2, 3)]
    
    #Join the sPosDat onto the main data
    dat3 <- left_join(dat2, sPosDat, by = "Target")
    
    #Now work out the difference between the start position coords and coords on each frame  
    dat3 <- dat3 %>% 
      mutate(posDiff_x = abs(targ1frames_x - targ1Start_x)) %>% 
      mutate(posDiff_y = abs(targ1frames_y - targ1Start_y))
    
    #Now we need to select the rows in the time column which correspond to the x OR y
    #coordinate difference of > 0.2
    
    dat4 <- dat3 %>%
      group_by(Target) %>%
      filter(posDiff_x > 0.2 | posDiff_y > 0.2) %>%
      slice(1) %>%
      select(Target, targ1frames_time)
    

    I'm sure there is a more elegant way to achieve this, but the code seems to work all the same.