I have several conditions and several types of measurements in my data.
I want R to give me the value of the outlier for each pair of condition and type of measurement separately.
So, for example, say I have 3 conditions (1-3) and 3 types of measure (A-C) for several participants, with a value x in every row. I want the outlier of x for condition1 & measureA, condition2 & measureB, and so on.
(measure and condition are both non-numerical)
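A small simulated dataset of the shape described above, handy for making the question reproducible, might look like this. The column names participant, condition, measure and value, the labels cond1-cond3, and the normally distributed values are all assumptions for illustration, not the real data.

```r
# Toy data of the shape described in the question (simulated, illustration only).
# Column names and labels are assumptions, not the asker's real data.
set.seed(1)
n_participants <- 5

# One row per participant x condition x measure combination.
data <- expand.grid(
  participant = seq_len(n_participants),
  condition   = c("cond1", "cond2", "cond3"),
  measure     = c("A", "B", "C"),
  stringsAsFactors = FALSE   # keep condition/measure as character, not factor
)
data$value <- rnorm(nrow(data), mean = 10, sd = 2)

head(data)
```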
I've tried creating a loop
for (d in unique(data$measure)) {
  for (c in unique(data$condition)) {
    data %>%
      filter(measure == d, condition == c) %>%
      o <- outlier(data$value) %>%
      print(o)
  }
}
The idea is that R will run through each condition and measure in a loop and, each time, pick out the matching values and calculate the outliers. When I run the whole code I get this error message:
Error in print.default(., o) : invalid printing digits -2147483648
In addition: Warning message:
In print.default(., o) : NAs introduced by coercion to integer range
(If I run it without the loop, e.g. searching for the outliers of one specific condition, R also cannot find the pipe function after the first line.)
Any idea on how to code this correctly?
You're already using dplyr, so I suggest you use group_by, as it (to me) is a more natural way of dealing with the data.
Also, this part is incorrect syntax:
data %>%
  filter(measure == d, condition == c) %>%
  o <- outlier(data$value) %>%
  print(o)
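Why?
The filter(...) %>% should be piping into something that accepts a data frame, but instead you're sending the output from filter into an assignment, o <- outlier(...). Because <- binds less tightly than %>%, R reads the whole thing as a single assignment: the left-hand side is data %>% filter(...) %>% o and the right-hand side is outlier(data$value) %>% print(o). Note that data$value there is the whole, unfiltered column, not the rows you just filtered.
That right-hand side pipes into print(o), which really means print(., o), where . is the output from the previous command, so o lands in print's second argument, digits. Since o is not yet defined the first time this runs, you should get an error about object 'o' not found; if an o is left over from an earlier attempt, it gets used as the number of digits, which is what the "invalid printing digits" error is complaining about. And if the right-hand side does succeed, R then looks for a replacement function called "%>%<-" to carry out the assignment, doesn't find one, and stops, which is probably the "cannot find the pipe function" problem you saw. Either way, this is certainly not what you should be using.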
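To see the parse for yourself, quote() captures an expression without evaluating it, so the sketch below runs even without dplyr loaded or an outlier() function defined; it simply takes apart the line from the question.

```r
# quote() only parses the code; nothing is evaluated, so neither %>% nor
# outlier() needs to exist for this to run.
e <- quote(
  data %>%
    filter(measure == d, condition == c) %>%
    o <- outlier(data$value) %>%
    print(o)
)

e[[1]]  # the top-level function: the assignment operator `<-`
e[[2]]  # left-hand side: the pipe chain data %>% filter(...) %>% o
e[[3]]  # right-hand side: outlier(data$value) %>% print(o)
```

Indexing the call this way shows that <- sits at the top, with the whole pipe chain up to o on its left.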
A direct correction of that code might be:
for (...) {
  for (...) {
    o <- data %>%
      filter(measure == d, condition == c) %>%
      do({ data.frame(outliers = outlier(.$value)) })
    print(o)
  }
}
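where o will be a data frame with a single column, outliers. Because the rows were filtered rather than grouped, measure and condition are not carried along; the loop variables tell you which pair each printout belongs to. Strictly speaking, do isn't needed in the loop either, since outlier() only ever sees one subset at a time. It earns its keep once you use group_by, because most non-tidyverse functions ignore group_by groupings, and do side-steps that problem.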
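If you'd rather keep the loops and skip dplyr altogether, a base-R sketch along these lines subsets each pair and stores the result instead of only printing it. outlier_value() is a hypothetical stand-in, written to mimic what outliers::outlier() is documented to return (the value farthest from the mean) so the example runs without that package; the toy data are made up.

```r
# Hypothetical stand-in for outliers::outlier(): returns whichever of
# min(x) or max(x) lies farther from mean(x).
outlier_value <- function(x) {
  x <- x[!is.na(x)]
  if (max(x) - mean(x) < mean(x) - min(x)) min(x) else max(x)
}

# Made-up data: 2 measures x 2 conditions, 3 values per pair.
data <- data.frame(
  measure   = rep(c("A", "B"), each = 6),
  condition = rep(rep(c("c1", "c2"), each = 3), times = 2),
  value     = c(1, 2, 10, -4, 6, 7, 3, 3, 4, 0, 9, 10),
  stringsAsFactors = FALSE
)

# Loop over every measure/condition pair and collect one result per pair.
results <- list()
for (m in unique(data$measure)) {
  for (cond in unique(data$condition)) {  # `cond` rather than `c`: legal, but easy to confuse with c()
    sub <- data[data$measure == m & data$condition == cond, ]
    results[[paste(m, cond, sep = ".")]] <- outlier_value(sub$value)
  }
}
str(results)
```

Here results$A.c2 is -4 rather than 7, because -4 lies farther below that group's mean (3) than 7 lies above it.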
Perhaps better, though, is a single command that replaces both loops:
data %>%
  group_by(measure, condition) %>%
  summarize(outliers = outlier(value)) %>%
  ungroup()
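For comparison, a base-R version of the same one-call idea: aggregate() applies a function to each measure/condition pair and returns one row per pair. This is a swapped-in alternative rather than the dplyr approach, it only fits when the outlier function returns exactly one value per group, and outlier_value() and the data are again hypothetical.

```r
# Hypothetical stand-in mimicking outliers::outlier(): the value farthest from the mean.
outlier_value <- function(x) {
  x <- x[!is.na(x)]
  if (max(x) - mean(x) < mean(x) - min(x)) min(x) else max(x)
}

# Made-up data: 2 measures x 2 conditions, 3 values per pair.
data <- data.frame(
  measure   = rep(c("A", "B"), each = 6),
  condition = rep(rep(c("c1", "c2"), each = 3), times = 2),
  value     = c(1, 2, 10, -4, 6, 7, 3, 3, 4, 0, 9, 10),
  stringsAsFactors = FALSE
)

# One row per measure/condition pair; FUN must return a single value here.
res <- aggregate(value ~ measure + condition, data = data, FUN = outlier_value)
names(res)[names(res) == "value"] <- "outliers"
res
```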
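I'm assuming that what you want is all outlier values for each unique combination of measure and condition, and that the outlier(.) function returns a vector (of some length >= 1). (If it returns more than one value per group, note that dplyr 1.1.0 and later deprecate that use of summarize() in favour of reframe().) If no outliers are found for a pair, that measure/condition pair will not be included in the result; if that matters to you, use something like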
data %>%
  group_by(measure, condition) %>%
  summarize(outliers = list(outlier(value))) %>%
  tidyr::unnest(outliers, keep_empty = TRUE) %>%
  ungroup()
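The keep-empty idea can also be sketched in base R. To have groups that genuinely produce zero, one or several outliers, this example swaps in Tukey's 1.5 * IQR rule via boxplot.stats(), which is a different definition from outliers::outlier(); tukey_outliers() and the data are illustrative assumptions.

```r
# Swapped-in outlier definition (Tukey's 1.5 * IQR rule), not what
# outliers::outlier() computes; it can return zero, one or several values.
tukey_outliers <- function(x) grDevices::boxplot.stats(x)$out

# Made-up data: A/c1 has a high outlier, A/c2 has none, B/c1 has a low one.
data <- data.frame(
  measure   = c(rep("A", 10), rep("B", 5)),
  condition = c(rep("c1", 5), rep("c2", 5), rep("c1", 5)),
  value     = c(1, 2, 3, 4, 100, 10, 11, 12, 13, 14, -50, 5, 5, 6, 6),
  stringsAsFactors = FALSE
)

# Every measure/condition pair that occurs in the data, in order of appearance.
pairs <- unique(data[c("measure", "condition")])

rows <- lapply(seq_len(nrow(pairs)), function(i) {
  in_pair <- data$measure == pairs$measure[i] & data$condition == pairs$condition[i]
  out <- tukey_outliers(data$value[in_pair])
  # Keep pairs without outliers as a single NA row instead of dropping them.
  if (length(out) == 0) out <- NA_real_
  data.frame(measure = pairs$measure[i], condition = pairs$condition[i],
             outliers = out, stringsAsFactors = FALSE)
})
res <- do.call(rbind, rows)
res
```

Pairs without outliers get a single NA row, which mirrors what keep_empty = TRUE does in the tidyr version.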