I'd appreciate some help with this one. I have something similar to the data below.
df$A df$B
1 .
1 .
1 .
1 6
2 .
2 .
2 7
What I need to do is fill in df$B with the value at the end of each run of values in df$A. Example below.
df$A df$B
1 6
1 6
1 6
1 6
2 7
2 7
2 7
Any help would be welcome.
It seems that the missing values are denoted by a period (.). It is better to read the dataset with na.strings="." so that the missing values become NA. For the current dataset, the 'B' column is of character or factor class, depending on whether stringsAsFactors=FALSE or TRUE (the default before R 4.0.0) was used in read.table/read.csv.
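As a rough sketch of the na.strings suggestion, the snippet below reads an inline copy of the question's data with read.table. The inline text and the name txt are assumptions for illustration; with a real file, the file path would replace the text argument.

```r
# Hypothetical inline copy of the question's data (stands in for a file)
txt <- "A B
1 .
1 .
1 .
1 6
2 .
2 .
2 7"

# na.strings = "." turns every "." into NA while reading,
# so B is parsed as a numeric column rather than character/factor
df1 <- read.table(text = txt, header = TRUE, na.strings = ".")
str(df1)
```

With B already numeric, the as.numeric() conversion in the approaches that follow is no longer needed.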
Using data.table, we convert the data.frame to a data.table (setDT(df1)) and change the 'character' class to 'numeric' (B := as.numeric(B)). This also coerces the . to NA (a warning will appear). Grouped by "A", we then change the "B" values to the last element of each group (B := B[.N]).
library(data.table)
setDT(df1)[, B := as.numeric(B)][, B := B[.N], by = A][]
# A B
#1: 1 6
#2: 1 6
#3: 1 6
#4: 1 6
#5: 2 7
#6: 2 7
#7: 2 7
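If the coercion warning is unwanted, one possible variation is to set the "." entries to NA explicitly before converting. This is only a sketch, assuming B is a character column; the object name df2 is made up for illustration.

```r
library(data.table)

# Hypothetical copy of the example data, with B stored as character
df2 <- data.frame(A = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
                  B = c(".", ".", ".", "6", ".", ".", "7"),
                  stringsAsFactors = FALSE)

# 1. mark "." as missing, 2. convert to numeric (nothing left to warn about),
# 3. fill each group of A with its last value of B
setDT(df2)[B == ".", B := NA_character_][
  , B := as.numeric(B)][
  , B := B[.N], by = A]
df2$B
```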
Or with dplyr
library(dplyr)
df1 %>%
    group_by(A) %>%
    mutate(B = as.numeric(tail(B, 1)))
Or using ave from base R
df1$B <- with(df1, as.numeric(ave(B, A, FUN=function(x) tail(x,1))))
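One caveat worth sketching: all of the approaches above call as.numeric() on B, which behaves as described only when B is character. If B was read in as a factor, as.numeric() returns the internal level codes instead of the values, so a conversion through as.character() is needed first. The example below assumes a factor column and uses the made-up names df_f and B_num.

```r
# Assumption: B ended up as a factor (e.g. stringsAsFactors = TRUE)
df_f <- data.frame(A = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
                   B = factor(c(".", ".", ".", "6", ".", ".", "7")))

# Pitfall: this gives integer level codes, not the original numbers
as.numeric(df_f$B)

# Convert via character first; "." becomes NA (the coercion warning
# is suppressed here because it is expected)
B_num <- suppressWarnings(as.numeric(as.character(df_f$B)))

# Fill each group of A with the last value of B in that group
df_f$B <- ave(B_num, df_f$A, FUN = function(x) tail(x, 1))
df_f
```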
The example data used above:
df1 <- structure(list(A = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), B = c(".",
".", ".", "6", ".", ".", "7")), .Names = c("A", "B"),
class = "data.frame", row.names = c(NA, -7L))