this question is quite specific I guess.
I am looking for the R equivalent of Stata egen function, in particular
egen max BY varlist.
In my data table I have a column with ID
ID <- c(1,1,2,2,3,4,5,6,6)
a column with year of observation
year <- c(2000,2000,2001,2002,2002,2003,2004,2004,2004)
year and ID are not unique, because there can be more observations in one year.
Finally I have a column with names
names <- c("Mark",NA,"John","John",NA,"Sarah","Julia",NA,NA).
First of all I want to generate a dummy that is equal to 1 if name is not NA and I figured
dummy <- ifelse(!is.na(names),1,0)
Then I want R to return the maximum of the dummy based on ID and year, which in Stata would be
egen MAX = max(dummy), by(ID year)
In practice, I want R to return 0 if names is NA for all rows with the same ID and year. So I would get (1,1,1,1,0,1,1,0,0) and I can proceed to drop rows 5,7,8. Thank you!
Here's an attempt using ave
, but this logic will work with any grouping function you might know. Look for NA
s by group, see if they are not (!
/ Negate
) all
TRUE
:
ave(is.na(dat[["names"]]), dat[c("ID","year")], FUN=Negate(all))
#[1] TRUE TRUE TRUE TRUE FALSE TRUE TRUE FALSE FALSE
Where dat
was:
dat <- data.frame(ID,year,names, stringsAsFactors=FALSE)