
Fill a column's blank spaces contingent on a second column in R


I'd appreciate some help with this one. I have something similar to the data below.

df$A df$B
1    .
1    .
1    .
1    6
2    .
2    .
2    7

What I need to do is fill in df$B with the value found at the end of each run of values in df$A. Example below.

df$A df$B
1    6
1    6
1    6
1    6
2    7
2    7
2    7

Any help would be welcome.


Solution

  • It seems that the missing values are denoted by ".". It is better to read the dataset with na.strings="." so that the missing values become NA. For the current dataset, the 'B' column would be of class character or factor, depending on whether stringsAsFactors=FALSE or TRUE was used in read.table/read.csv (TRUE was the default before R 4.0.0; FALSE since then).
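A minimal sketch of that suggestion, assuming the data sit in a whitespace-separated text file. The string `txt` is a hypothetical stand-in for the file so that the example is self-contained:

```r
# Hypothetical inline copy of the question's data; with a real file,
# pass its path as the first argument instead of using text =
txt <- "A B
1 .
1 .
1 .
1 6
2 .
2 .
2 7"

# na.strings = "." makes read.table treat the dots as missing values,
# so B is read as a number with NA where the input had "."
df1 <- read.table(text = txt, header = TRUE, na.strings = ".")

# Fill every row of a group with the last B value of that group
df1$B <- ave(df1$B, df1$A, FUN = function(x) tail(x, 1))
```

With 'B' already numeric after reading, no as.numeric() conversion is needed and no coercion warning appears.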

    Using data.table, we convert the data.frame to a data.table (setDT(df1)) and change the 'character' column to 'numeric' (B := as.numeric(B)). This also coerces "." to NA, with a warning. Then, grouped by 'A', we set every 'B' value to the last element of its group (B := B[.N]).

    library(data.table)
    # the trailing [] prints the result, since := updates by reference and returns invisibly
    setDT(df1)[, B := as.numeric(B)][, B := B[.N], by = A][]
    #   A B
    #1: 1 6
    #2: 1 6
    #3: 1 6
    #4: 1 6
    #5: 2 7
    #6: 2 7
    #7: 2 7
    

    Or with dplyr

    library(dplyr)
    df1 %>%
        group_by(A) %>%
        mutate(B = as.numeric(tail(B, 1)))
    

    Or using ave from base R

    df1$B <- with(df1, as.numeric(ave(B, A, FUN = function(x) tail(x, 1))))
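One caveat for all three snippets: they assume 'B' is character. If it was read as a factor, as.numeric(B) returns the internal level codes rather than the values, so converting through as.character() first is safer. A small sketch with a hypothetical factor `f` holding the same values:

```r
# Hypothetical factor version of column B
f <- factor(c(".", ".", ".", "6", ".", ".", "7"))

# Calling as.numeric() directly on a factor gives the level codes, not 6 and 7
codes <- as.numeric(f)

# Going through character first recovers the values; "." becomes NA
# (the coercion warning is suppressed here)
vals <- suppressWarnings(as.numeric(as.character(f)))
```

In the snippets above, that would mean writing as.numeric(as.character(B)) wherever as.numeric(B) appears.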
    

    data

    df1 <- structure(list(A = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), B = c(".", 
    ".", ".", "6", ".", ".", "7")), .Names = c("A", "B"), 
    class = "data.frame", row.names = c(NA, -7L))