In the example bellow, why A is not equal to B ? The problem is related to the Date column since AA=BB, but I can't understand why...
library(lubridate)
MM=data.frame(ID=1:3,Date=ymd(c("2019-11-07","2019-11-07","2019-11-13")),X=c(-1,1,1),Y=c(1,-1,-1))
A=apply(MM,2,function(x) sum(x>0))
B=colSums(MM>0)
A
# ID Date X Y
# 3 3 0 0
B
# ID Date X Y
# 3 3 2 1
AA=apply(MM[,-2],2,function(x) sum(x>0))
BB=colSums(MM[,-2]>0)
AA
# ID X Y
# 3 2 1
BB
# ID X Y
# 3 2 1
R version 4.0.3 (2020-10-10)
It's because apply
, before applying the function to each column, first of all converts the data.frame into a matrix.
Therefore:
> as.matrix(MM)
ID Date X Y
[1,] "1" "2019-11-07" "-1" " 1"
[2,] "2" "2019-11-07" " 1" "-1"
[3,] "3" "2019-11-13" " 1" "-1"
Everything is converted to character.
Specifically, you can see that:
> " 1" > 0
[1] FALSE
> "1" > 0
[1] TRUE
This is because 0 is converted to char.
"0" comes before " 1" alphabetically because the character 0 gets compared to the blank instead of the character "1".
The order of strings are decided by comparing character by character 1 by 1.
(it's the same reason why "10" > "9"
is FALSE
, because 1
alphabetically is before 9
).
> sort(c("0", " 1"))
[1] " 1" "0"
> sort(c("0", "1"))
[1] "0" "1"
As a consequence:
> as.matrix(MM)>0
ID Date X Y
[1,] TRUE TRUE FALSE FALSE
[2,] TRUE TRUE FALSE FALSE
[3,] TRUE TRUE FALSE FALSE
colSums
doesn't do that.
You can get the same result you would get with colSums
if you do:
> apply(MM>0, 2, sum)
ID Date X Y
3 3 2 1