Hi I come from a background in SAS and I am relatively new to R. I am attempting to convert an existing SAS program into equivalent R code
I am unsure how to achieve the equivalent of SAS's "retain" and "by" Behavior in R
I have a dataframe with two columns first column is a date column and the second column is a numeric value.
The numeric column represents a result from lab test. The test is conducted semi-regularly so on some days there will be Null values in the data. The data is ordered by date and the dates are sequential.
i.e example data looks like this
Date Result
2017/01/01 15
2017/01/02 NA
2017/01/03 NA
2017/01/04 12
2017/01/05 NA
2017/01/06 13
2017/01/07 11
2017/01/08 NA
I would like to create a third column which would contain the most recent result. If Result column is Null it should be set to most recent previously non Null Result otherwise it should contain the Result value
My desired output would look like this:
Date Result My_var
2017/01/01 15 15
2017/01/02 NA 15
2017/01/03 NA 15
2017/01/04 12 12
2017/01/05 NA 12
2017/01/06 13 13
2017/01/07 11 11
2017/01/08 NA 11
In SAS I can achieve this with something like following code snippet:
data my_data;
retain My_var;
set input_data;
by date;
if Result not = . then
my_var = result;
run;
I am stumped as to how to do this in R I do not think R supports By group processing as in SAS - or at least I don't know how to set that as option.
I have naively tried:
my_data <- mutate(input_data, my_var = if(is.na(Result)) {lag(Result)} else {Result})
But I do not think that syntax is correct.
We can use na.locf
function from the zoo package to fill in the missing values.
library(zoo)
dt$My_var <- na.locf(dt$Result)
dt
# Date Result My_var
# 1 2017/01/01 15 15
# 2 2017/01/02 NA 15
# 3 2017/01/03 NA 15
# 4 2017/01/04 12 12
# 5 2017/01/05 NA 12
# 6 2017/01/06 13 13
# 7 2017/01/07 11 11
# 8 2017/01/08 NA 11
Or the fill
function from the tidyr package.
library(dplyr)
library(tidyr)
dt <- dt %>%
mutate(My_var = Result) %>%
fill(My_var)
dt
# Date Result My_var
# 1 2017/01/01 15 15
# 2 2017/01/02 NA 15
# 3 2017/01/03 NA 15
# 4 2017/01/04 12 12
# 5 2017/01/05 NA 12
# 6 2017/01/06 13 13
# 7 2017/01/07 11 11
# 8 2017/01/08 NA 11
DATA
dt <- read.table(text = "Date Result
2017/01/01 15
2017/01/02 NA
2017/01/03 NA
2017/01/04 12
2017/01/05 NA
2017/01/06 13
2017/01/07 11
2017/01/08 NA",
header = TRUE, stringsAsFactors = FALSE)