I have a data frame with several columns, and I want to get all rows where the column of interest takes one of several values. Initially, I was using == as in
which(df$column==c(value1, value2))
It worked for some vectors but not for others, and after some research I found that using %in% gives the expected result.
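A minimal sketch of the difference, using a short hypothetical vector x (and a hypothetical set of values, wanted) rather than the real data frame: == compares position by position against a recycled copy of the values, while %in% asks, for each element, whether it belongs to the set of values.

```r
# Hypothetical toy vector standing in for df$column
x <- c(1, 2, 3, 4, 2, 3)
wanted <- c(2, 3)

# == recycles `wanted` to c(2, 3, 2, 3, 2, 3) and compares position by position.
# No warning here, because 6 is a multiple of 2, yet positions 2 and 3 are missed.
which(x == wanted)    # [1] 5 6

# %in% tests each element of x against the whole set of wanted values
which(x %in% wanted)  # [1] 2 3 5 6
```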
However, I want to understand why == works in some cases but not in others. In particular, I want to understand why I get the following results.
test <- data.frame("true_date"=as.Date(1:365, origin="2024-01-01"))
Why does
which(test$true_date==c("2024-07-02","2024-07-03"))
return
[1] 183 184
Warning message:
In `==.default`(test$true_date, c("2024-07-02", "2024-07-03")) :
  longer object length is not a multiple of shorter object length
while
which(test$true_date==c("2024-07-03","2024-07-04"))
returns
integer(0)
Warning message:
In `==.default`(test$true_date, c("2024-07-03", "2024-07-04")) :
  longer object length is not a multiple of shorter object length
After a night's sleep, I think I understand why I got those results. I would be grateful if someone could confirm my belated understanding.
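As the warning message hints (longer object length is not a multiple of shorter object length), the shorter vector is recycled to the length of the longer one, and == then compares the two element by element. Each date is therefore compared only with the value at the same position in the recycled vector: odd rows with the first value, even rows with the second. In this example, we have: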
First case, rows 181 to 187 (rows 183 and 184 match):
dates:    "2024-06-30" "2024-07-01" "2024-07-02" "2024-07-03" "2024-07-04" "2024-07-05" "2024-07-06"
recycled: "2024-07-02" "2024-07-03" "2024-07-02" "2024-07-03" "2024-07-02" "2024-07-03" "2024-07-02"
                                       match        match

Second case, same rows (no match, because each target date meets the other value):
dates:    "2024-06-30" "2024-07-01" "2024-07-02" "2024-07-03" "2024-07-04" "2024-07-05" "2024-07-06"
recycled: "2024-07-03" "2024-07-04" "2024-07-03" "2024-07-04" "2024-07-03" "2024-07-04" "2024-07-03"
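One way to check this reading is to reproduce the recycling explicitly. The sketch below rebuilds the question's data frame, repeats the target dates to 365 elements with rep(..., length.out = ), which mimics what == does implicitly, and then shows that %in% finds the dates whatever their order. The names targets1, targets2, recycled1 and recycled2 are illustrative only, and converting the targets with as.Date() is a choice made here so that both sides of each comparison are dates.

```r
# Same data as in the question: 2024-01-02 through 2024-12-31
test <- data.frame(true_date = as.Date(1:365, origin = "2024-01-01"))

# Illustrative target sets, converted to Date up front
targets1 <- as.Date(c("2024-07-02", "2024-07-03"))
targets2 <- as.Date(c("2024-07-03", "2024-07-04"))

# Explicit recycling to the full length (no warning, since lengths now agree)
recycled1 <- rep(targets1, length.out = nrow(test))
recycled2 <- rep(targets2, length.out = nrow(test))

# Odd rows are compared with the first target, even rows with the second
which(test$true_date == recycled1)  # [1] 183 184
which(test$true_date == recycled2)  # integer(0)

# %in% checks every row against the whole set, regardless of order
which(test$true_date %in% targets1)  # [1] 183 184
which(test$true_date %in% targets2)  # [1] 184 185
```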