I have a big dataset (More than 200 thousand lines and a thousand of trials) with four columns with follow information:
I need to know which Lines (Identified with "L" + Number") stay between the biggest and smaller value of checks, in each trial. For example, in trial 10001, I have 5 checks, the biggest is Check 3 (Reaction = 56) and the smaller Check 1 (Reaction = 50). I need to know which lines stay between these two values i.e. (bigger or equal 50 and small or equal 56). A small dataset was attached for an example. In this, in trial 10001 all lines should be select, except "L3". On the other hand, in trial 10002 the smaller Check is Check-1 (50) and the biggest is Check-7 (60), thus, only L3 and L9 should be selected. The Checks and the Reaction Values change in each trial, thus, I need to extract the lines with the values between the biggest and smaller checks in each Trial.
Trials Is.Check ID Reaction
10001 1 Check-1 50
10001 0 L1 50
10001 0 L2 50
10001 0 L10 50
10001 0 L9 50
10001 0 L6 50
10001 0 L3 48
10001 0 L4 50
10001 0 L8 50
10001 1 Check-5 52
10001 0 L7 50
10001 1 Check-2 52
10001 1 Check-4 54
10001 0 L5 52
10001 1 Check-3 56
10002 1 Check-1 50
10002 0 L1 48
10002 0 L2 48
10002 0 L3 54
10002 0 L4 64
10002 0 L5 64
10002 0 L6 62
10002 0 L7 62
10002 0 L8 70
10002 0 L9 52
10002 1 Check-7 60
10002 1 Check-2 54
10002 1 Check-6 56
10002 1 Check-3 54
As a first step, you want to split the data.frame, per Trial:
splitted <- split(dataset, dataset$Trials)
Then, for each of those elements, we select all non-check elements, that obey 2 conditions: >=
the smallest check, and <=
the largest one. And we need to do this for each of the sub-data.frames in splitted
, so we use lapply
, to apply our function to each of the sub-data.frames:
selection <- lapply(splitted, function(subdf) {
subdf[subdf$Is.Check==0 &
subdf$Reaction >= min(subdf$Reaction[subdf$Is.Check==1]) &
subdf$Reaction <= max(subdf$Reaction[subdf$Is.Check==1])
,]
})
Selection is now a list with for each trial a selection of the right non-checks. If you want to recombine those selections into one large data.frame you can use bind_rows
from the dplyr
-package
install.packages('dplyr') # If you don't have it yet
Fullselection <- dplyr::bind_rows(selection)
If for some reason you can't install dplyr
, do.call(rbind, selection)
also works (but a bit slower, and it's uglier code.)