I would like to use startsWith and str_length to identify the entries in yrMonth_ds$DX1 that start with one of the strings in dx9 and are at least 3 characters long. This is what I've tried, but it returns a data frame with zero rows. I would like it to return a data frame with the 1st, 4th and 5th rows of the original data frame:
dx9 = c(as.character(8:10))
DX1 <- c("8001","7","80","992","1010","93","400")
ind <- c(0,1,1,1,0,0,1)
yrMonth_ds = as.data.frame(cbind(DX1,ind))
yrMonth_ds$DX1 <- as.character(yrMonth_ds$DX1)
yrMonth_ds_endpt <- yrMonth_ds[which(startsWith(yrMonth_ds$DX1,paste0(dx9,collapse="|")) & str_length(yrMonth_ds$DX1 > 3)),]
yrMonth_ds_endpt
I would really appreciate any help. Thanks!
One option is to check the number of characters with nchar and build a logical expression from that. In addition, use paste on 'dx9' to collapse it into a single alternation pattern, anchored with ^ to specify the start of the string, and test 'DX1' against that pattern with grepl. Combining the two conditions with & returns only the rows that satisfy both:
subset(yrMonth_ds, nchar(DX1) >= 3 &
         grepl(paste0("^(", paste(dx9, collapse = "|"), ")"), DX1))
#   DX1 ind
#1 8001   0
#4  992   1
#5 1010   0
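As for why the original attempt returns zero rows: startsWith compares literal prefixes rather than regular expressions, so the collapsed string "8|9|10" is looked for verbatim and never matches. The closing parenthesis is also misplaced, so > 3 ends up inside str_length(). If you would rather stay with startsWith, here is a minimal sketch that tests each prefix separately and ORs the results together. The helper name has_prefix is only illustrative, and the data frame is rebuilt with data.frame() so that DX1 is a character column (an assumption about how you want the data stored).

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps DX1 as character
dx9 <- as.character(8:10)
yrMonth_ds <- data.frame(
  DX1 = c("8001", "7", "80", "992", "1010", "93", "400"),
  ind = c(0, 1, 1, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# startsWith() matches literal prefixes, so check each prefix in dx9
# on its own and combine the logical vectors with element-wise OR.
# has_prefix is an illustrative name, not part of the original code.
has_prefix <- Reduce(`|`, lapply(dx9, function(p) startsWith(yrMonth_ds$DX1, p)))

# Keep rows that start with one of the prefixes AND have at least 3 characters
yrMonth_ds_endpt <- yrMonth_ds[has_prefix & nchar(yrMonth_ds$DX1) >= 3, ]
yrMonth_ds_endpt
```

This should select the same three rows (1, 4 and 5) as the grepl version. The regex approach is more compact, while the startsWith loop avoids having to escape prefixes that contain regex metacharacters.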