Search code examples
rdataframerow

How many NAs do we have rowwise before the first numerical value in a dataframe?


I have this hypothetical dataframe

  df <- data.frame(K=c(NA, NA, 3, 4,0,2,NA, NA), A=c(NA, NA, NA, 4,0,3,NA, NA), B=c(NA, 2, NA, NA,0,NA,NA,0), C=c(0, 3, 5, NA,0,5,NA,5), D=c(NA, 3, 1, 2,0,10,NA,3))
     df  
       K  A  B  C  D
    1 NA NA NA  0 NA
    2 NA NA  2  3  3
    3  3 NA NA  5  1
    4  4  4 NA NA  2
    5  0  0  0  0  0
    6  2  3 NA  5 10
    7 NA NA NA NA NA
    8 NA NA  0  5  3

I want to find rowwise how many NAs we have before the first numerical value (in that row) and I want to add this information as a last column in the above data frame

So I want to have

df2 <- data.frame(K=c(NA, NA, 3, 4,0,2,NA, NA), A=c(NA, NA, NA, 4,0,3,NA, NA), B=c(NA, 2, NA, NA,0,NA,NA,0), C=c(0, 3, 5, NA,0,5,NA,5), D=c(NA, 3, 1, 2,0,10,NA,3),
        nn=c(3,2,0,0,0,0,5,2))
    df2
     
   K  A  B  C  D nn
1 NA NA NA  0 NA  3
2 NA NA  2  3  3  2
3  3 NA NA  5  1  0
4  4  4 NA NA  2  0
5  0  0  0  0  0  0
6  2  3 NA  5 10  0
7 NA NA NA NA NA  1000
8 NA NA  0  5  3  2

For example, the first row contains 3 NAs before the first value of 0. The second row contains 2 NAs before the first value of 2. The third row contains zero NAs before the first value which is 3 and the same holds for the rows 4-6. For row seven we have only NAs,so in this case I just use the arbitrary value of 1000. For row eight we have two NAs before the first value of 0.

I tried this

DD<-apply(df,1,function(x) which(!is.na(x)))
unlist(map(DD, 1) , use.names=FALSE)-1

but it does not work as I get

3 2 0 0 0 0 2

I miss the row that contains only NAs. Any ideas? Would it be possible to do that using dplyr?

Note that my real dataset contains 500 columns and 200000 rows. So a more general code would help a lot.


Solution

  • You can use max.col

    > df$nn <- (max.col(!is.na(df), "first") - 1) - (rowMeans(is.na(df)) == 1)
    
    > df
       K  A  B  C  D nn
    1 NA NA NA  0 NA  3
    2 NA NA  2  3  3  2
    3  3 NA NA  5  1  0
    4  4  4 NA NA  2  0
    5  0  0  0  0  0  0
    6  2  3 NA  5 10  0
    7 NA NA NA NA NA -1
    8 NA NA  0  5  3  2
    

    where 7-th row is set to -1 since the entire row consists of NAs only.