R lapply() behaviour on data frames with is.numeric()


Can't really understand the following behaviour.

> ddd <- data.frame(a=c(2,3,4), b=c(10,20,30)) ## creating a simple data frame with 2 columns
> ddd
  a  b
1 2 10
2 3 20
3 4 30

Applying lapply() gives the expected results:

> lapply(ddd, function(x) x*100 )
$a
[1] 200 300 400

$b
[1] 1000 2000 3000

However, when is.numeric() is used inside FUN, only the first element of each column is returned. How come?

> lapply(ddd, function(x) ifelse( is.numeric(x), x*100, x ) )
$a
[1] 200

$b
[1] 1000

Yet when is.numeric() is combined with !is.na(), it works as expected again.

> lapply(ddd, function(x) ifelse( is.numeric(x) & !is.na(x), x*100, x ) )
$a
[1] 200 300 400

$b
[1] 1000 2000 3000

Why is this happening?


Solution

  • The problem here is that is.numeric(x) returns a single TRUE or FALSE for the whole vector. It works with is.na() because is.na() returns a logical vector of the same length as its input. When you combine the two with &, the single TRUE from is.numeric() is recycled to that length.

    > is.na(ddd$a)
    [1] FALSE FALSE FALSE
    > is.numeric(ddd$a)
    [1] TRUE
    > is.numeric(ddd$a) & !is.na(ddd$a)
    [1] TRUE TRUE TRUE
    

    As @jay.sf mentions in the comments, ifelse() returns a result of the same length as its test argument. With is.numeric(x) alone, test has length one, so only the first value of each column is processed and returned.

    One way around this is to replace ifelse() with if( ) { } else { }, which checks a single condition and returns the whole chosen branch:

    lapply(ddd, function(x) if(is.numeric(x)) {x*100} else {x} )
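
To make the length issue concrete, here is a small sketch that reuses the data frame from the question. The character column `c` and the variable names `short` and `scaled` are not in the original question; they are added only to illustrate how the else branch behaves on a non-numeric column.

```r
# Toy data frame from the question, plus a hypothetical character column `c`
ddd <- data.frame(a = c(2, 3, 4), b = c(10, 20, 30), c = c("x", "y", "z"),
                  stringsAsFactors = FALSE)

# ifelse() sizes its result from `test`; is.numeric() gives a length-one test,
# so only the first element of the "yes" vector comes back
short <- ifelse(is.numeric(ddd$a), ddd$a * 100, ddd$a)
length(short)

# if/else evaluates one condition and returns the whole chosen branch
scaled <- lapply(ddd, function(x) if (is.numeric(x)) x * 100 else x)

# Assigning into ddd[] keeps the data frame structure instead of a plain list
ddd[] <- scaled
```

Whether to keep the list returned by lapply() or assign back with `ddd[] <-` depends on what the result is used for; the assignment form is just one common idiom.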