Tags: r, dataframe, vectorization, data-cleaning

Changing variable value based on condition


I have a data frame:

a<-c(1,2,3,4)
b<-c(1988,1970,1999,2000)
years_practicing<-rep(NA,4)
df<-data.frame("ID"=a, "grad_year"=b, "years_practicing"=years_practicing)

that looks like:

ID   grad_year    years_practicing
1     1988           NA
2     1970           NA
3     1999           NA
4     2000           NA

Now I want to do the following (pseudocode):

if (ID == 1 || ID == 2)
{
   years_practicing[corresponding cell] <- 2017 - grad_year
}

if (ID == 3 || ID == 4)
{
   years_practicing[corresponding cell] <- 2018 - grad_year
}

to achieve this:

ID   grad_year    years_practicing
1     1988           29
2     1970           47
3     1999           19
4     2000           18

I know how to do it procedurally (with a while loop and if statements), but I want to do it in a vectorized way.
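For comparison, here is a minimal sketch of the procedural approach mentioned above. It uses a `for` loop instead of the `while` loop described, and it assumes IDs 1 and 2 use 2017 while IDs 3 and 4 use 2018, exactly as in the pseudocode.

```r
# Example data from the question
df <- data.frame(ID = c(1, 2, 3, 4),
                 grad_year = c(1988, 1970, 1999, 2000),
                 years_practicing = rep(NA, 4))

# Procedural version: visit each row and choose the reference year from its ID
for (i in seq_len(nrow(df))) {
  if (df$ID[i] %in% c(1, 2)) {
    df$years_practicing[i] <- 2017 - df$grad_year[i]
  } else if (df$ID[i] %in% c(3, 4)) {
    df$years_practicing[i] <- 2018 - df$grad_year[i]
  }
}

print(df)
```

Each iteration touches a single cell, which is precisely the per-row work that a vectorized version avoids.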

I tried this (and similar variations):

year_2017_start<-c(1, 2)
year_2018_start<-c(3,4)
df$years_practicing[any(df$ID == year_2017_start)]<- 2017-df$grad_yr
df$years_practicing[any(df$ID == year_2018_start)]<- 2018-df$grad_yr

But I receive this error:

Error in df$years_practicing[any(df$ID == year_2017_start)] <- 2017 -  : 
  replacement has length zero
> df$years_practicing[any(df$ID == year_2018_start)]<- 2018-df$grad_yr
Error in df$years_practicing[any(df$ID == year_2018_start)] <- 2018 -  : 
  replacement has length zero
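A short diagnostic sketch that rebuilds the question's data frame and checks each piece of the failing assignment separately. The `tryCatch()` wrapper exists only to capture the error message instead of halting.

```r
df <- data.frame(ID = c(1, 2, 3, 4),
                 grad_year = c(1988, 1970, 1999, 2000),
                 years_practicing = rep(NA, 4))
year_2017_start <- c(1, 2)

# The column is named grad_year, so df$grad_yr does not exist and returns NULL
print(is.null(df$grad_yr))
# Arithmetic with NULL produces a zero-length vector
print(length(2017 - df$grad_yr))

# any() reduces the comparison to a single TRUE, which would select every row
print(any(df$ID == year_2017_start))

# Reproduce the original assignment and capture the error message
msg <- tryCatch({
  df$years_practicing[any(df$ID == year_2017_start)] <- 2017 - df$grad_yr
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```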

Questions:

  1. How can I fix my code so that it works? (answer required)

  2. Is there a faster way to achieve a similar result? (optional)


Solution

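  • I'm not sure what motivates this particular way of updating the values, but a vectorized function such as ifelse() may help you more here. As for why your attempt fails: the column is named grad_year, so df$grad_yr returns NULL, and 2017 - NULL has length zero, which triggers "replacement has length zero". Also, any() collapses the comparison into a single TRUE, which would select every row. Finally, df$ID == year_2017_start compares element-wise with recycling (1 with 1, 2 with 2, 3 with 1, 4 with 2), so it only happens to pick the right rows for this data; df$ID %in% year_2017_start is the actual membership test. Anyway, here is a vectorized solution: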

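    df$years_practicing[which(df$ID %in% year_2017_start)] <- 2017 - df$grad_year[which(df$ID %in% year_2017_start)]
    df$years_practicing[which(df$ID %in% year_2018_start)] <- 2018 - df$grad_year[which(df$ID %in% year_2018_start)]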
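Here is a minimal sketch of the ifelse() route suggested above, which also covers the optional second question. It assumes that every ID outside year_2017_start should use 2018, which holds for the IDs 1 to 4 in the question.

```r
df <- data.frame(ID = c(1, 2, 3, 4),
                 grad_year = c(1988, 1970, 1999, 2000),
                 years_practicing = rep(NA, 4))
year_2017_start <- c(1, 2)

# Choose a reference year per row: 2017 for IDs in year_2017_start,
# 2018 for all other IDs (assumption: no third group exists)
ref_year <- ifelse(df$ID %in% year_2017_start, 2017, 2018)

# A single vectorized subtraction fills the entire column
df$years_practicing <- ref_year - df$grad_year
print(df)
```

Because the membership test and the subtraction each run once over the whole column, no per-row indexing or repeated subsetting of grad_year is needed.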