I can't understand the following behaviour.
> ddd <- data.frame(a=c(2,3,4), b=c(10,20,30)) ## create a simple data frame with 2 columns
> ddd
  a  b
1 2 10
2 3 20
3 4 30
Applying lapply() gives the expected results, as shown below:
> lapply(ddd, function(x) x*100 )
$a
[1] 200 300 400
$b
[1] 1000 2000 3000
However, when is.numeric() is used inside FUN, the function is applied only to the first row. How come?
> lapply(ddd, function(x) ifelse( is.numeric(x), x*100, x ) )
$a
[1] 200
$b
[1] 1000
Yet when is.numeric() is used in conjunction with is.na(), it works as expected again.
> lapply(ddd, function(x) ifelse( is.numeric(x) & !is.na(x), x*100, x ) )
$a
[1] 200 300 400
$b
[1] 1000 2000 3000
Why is this happening?
The problem here is that is.numeric(x) returns a single value. The reason it works with is.na() is that is.na() returns an object of the same length as the input. When you use them together, the single TRUE from is.numeric() gets recycled to the correct length.
> is.na(ddd$a)
[1] FALSE FALSE FALSE
> is.numeric(ddd$a)
[1] TRUE
> is.numeric(ddd$a) & !is.na(ddd$a)
[1] TRUE TRUE TRUE
As @jay.sf mentions in the comments, ifelse() returns a result of the same length as its test argument. Since is.numeric(x) has length 1, the result has length 1 too, so your code only returns the first value of each column.
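A minimal sketch of that length rule, using a plain vector x that mirrors column a (the rep() call is only there to illustrate a test of full length, it is not a recommended fix):

```r
x <- c(2, 3, 4)

# The test has length 1, so ifelse() returns only one element.
short <- ifelse(is.numeric(x), x * 100, x)
short
# [1] 200

# A test as long as x yields a result as long as x.
full <- ifelse(rep(is.numeric(x), length(x)), x * 100, x)
full
# [1] 200 300 400
```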
One way to get around this is to replace ifelse() with if( ) { } else { }:
lapply(ddd, function(x) if(is.numeric(x)) {x*100} else {x} )
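To see that the if/else version also leaves non-numeric columns untouched, here is a small sketch. The data frame mixed and its label column are made up for illustration and are not part of the original question.

```r
# Hypothetical data frame with one numeric and one character column.
mixed <- data.frame(a = c(2, 3, 4), label = c("x", "y", "z"),
                    stringsAsFactors = FALSE)

# if/else evaluates the length-1 condition once per column,
# then returns the whole (possibly transformed) column.
res <- lapply(mixed, function(x) if (is.numeric(x)) x * 100 else x)
res
# $a
# [1] 200 300 400
#
# $label
# [1] "x" "y" "z"

# Assigning into mixed[] keeps the data.frame structure instead of a list.
mixed[] <- res
```

The if/else form fits here because the condition describes the column as a whole, whereas ifelse() is meant for element-wise tests.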