This question is related to this post give each id the same column value R butthe solution there is not working for me.
I have a datatable where index is the first date that an individual got a drug C10.*? in the period between 2010-04-01 and 2010-09-30:
names drugs dates index
1: mary C10AA07 2009-10-01 NA
2: mary C09AA03 2010-06-01 NA
3: mary C10AA07 2010-07-01 2010-07-01
4: mary A02BC01 2010-07-01 NA
5: mary C10AA07 2010-07-24 2010-07-01
6: tom C10AA05 2009-12-01 NA
7: tom C10AA05 2010-04-06 2010-04-06
8: tom C07AB03 2010-05-12 NA
9: tom C10AA05 2010-08-01 2010-04-06
I am trying to give mary her index date for all rows in the column 'index'. and similarly for tom. so that the ouput is like this:
names drugs dates index
1: mary C10AA07 2009-10-01 2010-07-01
2: mary C09AA03 2010-06-01 2010-07-01
3: mary C10AA07 2010-07-01 2010-07-01
4: mary A02BC01 2010-07-01 2010-07-01
5: mary C10AA07 2010-07-24 2010-07-01
6: tom C10AA05 2009-12-01 2010-04-06
7: tom C10AA05 2010-04-06 2010-04-06
8: tom C07AB03 2010-05-12 2010-04-06
9: tom C10AA05 2010-08-01 2010-04-06
this is exactly what the problem is in the link posted above. These are the lines of code I have tried, but each code gives me pack just all NA values for index or else doesnt change my dt2
Attempt 1:
dt2[, index := index[grepl('^C10.*?', as.character(dt2$drugs))& dt2$dates>="2010-04-01" & dt2$dates<"2010-10-01"][1], by = names]
dt2
attempt 2:
dt2[, index := index[grepl('^C10.*?', as.character(dt2$drugs))[1], by = names])
dt2
I can't understand whats happening and why the code won't work. If anyone can shed any light on this that would be great. Thank you.
attempt 3:
dt2[, index := index[drugs == 'C10AA05' & drugs=='C10AA07'][1], by = names]
dt2
By what you are describing, if your index is not yet filled, use this:
dt2[, index := min(dates[grepl("^C10", drugs)], na.rm=TRUE), by=names]
If your index already has the correct value, and you are simply trying to fill the NA's, use the following instead, as it will be faster
> dt2[, index := index[!is.na(index)][[1]], by=names]
> dt2
names drugs dates index
1: mary C10AA07 2009-10-01 2010-07-01
2: mary C09AA03 2010-06-01 2010-07-01
3: mary C10AA07 2010-07-01 2010-07-01
4: mary A02BC01 2010-07-01 2010-07-01
5: mary C10AA07 2010-07-24 2010-07-01
6: tom C10AA05 2009-12-01 2010-04-06
7: tom C10AA05 2010-04-06 2010-04-06
8: tom C07AB03 2010-05-12 2010-04-06
9: tom C10AA05 2010-08-01 2010-04-06
>
If you are going to be doing this often, I would recommend setting key to drugs
or even creating a new column with the drugid. Note that you can use the key in .SD
, so the following would work for you:
dt2[, drugid := substr(drugs, 1, 3)]
setkey(dt2, drugid)
## HAVE A LOOK AT THE OUTPUT
dt2[, .SD[.("C10"), min(dates)]]
dt2[, .SD[.("C10"), min(dates)], by=names]
dt2[, .SD[.("C10"), min(dates)]$V1, by=names]
dt2[, index := .SD[.("C10"), min(dates)]$V1, by=names]