I have a long-format data frame data.set, in which each subject has different numeric values (data.set$target_resp.rt) per condition. I have already winsorized my data with respect to an overall criterion by using the DescTools function Winsorize (see here for info):
overall.criterion.2sd <- data.set$overall.mean+(2*data.set$overall.sd)
winsors.2 <- DescTools::Winsorize(data.set$target_resp.rt, maxval=overall.criterion.2sd[1])
Above, it was possible to define maxval as the first value of the variable overall.criterion.2sd, as it's the same value for all subjects. Now I would like to winsorize my data by subject, i.e. I need to run a within-subject, row-by-row winsorization. Here's my attempt, where criterion.2sd is just a vector of N values (N = no. of subjects):
criterion.2sd <- data.set$rt.mean+(2*data.set$rt.sd)
within.winsors.2 <- data.set %>% group_by(Nome, Cognome) %>%
Winsorize(data.set$target_resp.rt, maxval=unique(criterion.2sd))
The following error pops up:
Error in `[<-.data.frame`(`*tmp*`, x < minval, value = c(1.35768795013, : 'value' is the wrong length
I understand that something is wrong with the length of the maxval argument, but I can't figure out how to fix it. Can anybody help?
Here's a sample of the dataset data.set (hopefully it's enough; let me know if it's the right format):
subject target_resp.rt rt.mean rt.sd
1 1 1.0398901 0.9016781 0.3109358
2 1 0.6887729 0.9016781 0.3109358
3 1 0.7691720 0.9016781 0.3109358
4 1 1.0064900 0.9016781 0.3109358
5 1 0.8195999 0.9016781 0.3109358
6 2 0.8410320 1.0500845 0.4210796
7 2 0.8229311 1.0500845 0.4210796
8 2 0.9250839 1.0500845 0.4210796
9 2 1.0085750 1.0500845 0.4210796
10 2 1.1406291 1.0500845 0.4210796
11 3 0.5561039 0.749789 0.2350127
12 3 0.6022139 0.749789 0.2350127
13 3 0.8560688 0.749789 0.2350127
14 3 0.5886030 0.749789 0.2350127
15 3 0.5520449 0.749789 0.2350127
It's a problem with mixed-up dplyr syntax. In the original question, you mean to pass a vector to Winsorize, but data.set %>% group_by(Nome, Cognome) is a data frame, and the pipe (%>%) passes that whole data frame as the first argument of Winsorize. The vector you wrote first is pushed into the next argument slot, meaning you're really calling
Winsorize(x = data.set, minval = ..., maxval = ...)
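To make the pipe behaviour concrete, here is a minimal sketch. The helper describe_first_arg and the small data frame are hypothetical and exist only for illustration; the point is that the left-hand side of %>% always becomes the first argument of the call on the right.

```r
library(dplyr)  # re-exports the %>% pipe

# Hypothetical helper: reports the class of whatever arrives as its first argument.
describe_first_arg <- function(x, ...) class(x)[1]

# Made-up illustration data, not the data from the question.
df <- data.frame(subject = c(1, 1, 2), rt = c(1.04, 0.69, 0.84))

# The pipe inserts `df` as the first argument, so these two calls are equivalent.
piped  <- df %>% describe_first_arg(df$rt)
direct <- describe_first_arg(df, df$rt)

print(piped)
print(direct)
```

Both calls see a data frame as the first argument, and the vector df$rt ends up in the second position, which mirrors what happens to data.set$target_resp.rt in the failing call.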
What you really want is to use mutate after the group_by to change target_resp.rt; the syntax looks like:
data.set %>% group_by(subject) %>%
  mutate(target_winsorized = Winsorize(target_resp.rt, maxval=unique(overall.criterion.2sd)))
That creates a new variable in the dataset, target_winsorized, with the properties you want. In the future you might also want to save overall.criterion.2sd inside the dataset too.
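The snippet above still uses the overall criterion. For a cut-off that differs per subject, one possible sketch is to compute the criterion inside the grouped mutate from the rt.mean and rt.sd columns. The toy values below are hypothetical (chosen so that one value actually gets capped), and pmin() stands in for Winsorize() so the example runs without DescTools. That substitution is an assumption of this sketch: pmin() only caps the upper tail, whereas Winsorize() called with maxval alone may also adjust the lower tail through its default minval.

```r
library(dplyr)

# Hypothetical toy data (not the question's values), same column layout.
toy <- data.frame(
  subject        = c(1, 1, 1, 2, 2, 2),
  target_resp.rt = c(0.8, 1.2, 3.0, 1.5, 2.5, 3.0),
  rt.mean        = c(1.0, 1.0, 1.0, 2.0, 2.0, 2.0),
  rt.sd          = c(0.5, 0.5, 0.5, 1.0, 1.0, 1.0)
)

within.winsors.2 <- toy %>%
  group_by(subject) %>%
  mutate(
    # One cut-off per subject: that subject's mean + 2 SD.
    # rt.mean and rt.sd are constant within a subject, so first() is enough.
    criterion.2sd = first(rt.mean) + 2 * first(rt.sd),
    # Cap values above the subject's cut-off (upper tail only).
    target_winsorized = pmin(target_resp.rt, criterion.2sd)
  ) %>%
  ungroup()

print(within.winsors.2)
```

Here the value 3.0 is capped to 2.0 for subject 1 (cut-off 2.0) but left alone for subject 2 (cut-off 4.0). To use DescTools instead, the pmin() line could be swapped for a Winsorize() call on target_resp.rt with the per-subject cut-off; check the argument names against your installed version.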
Check out the dplyr docs if you want to learn more about dplyr syntax and style.