
Error in `mutate()` while creating a new variable using R


So I have a data frame and I want to create a new variable randomly, based on other variables. My data contains these key variables:

iq  Age  Educ_y
 5   23      15
 4   54      17
 2   43       6
 3   13       7
 5   14       8
 1   51      16
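To make the example reproducible, the sample rows above can be entered as a data frame. The name `Df` matches the code below, and the column is spelled `iq` as in that code:

```r
# Sample data from the question; column names follow the code used below
Df <- data.frame(
  iq     = c(5L, 4L, 2L, 3L, 5L, 1L),
  Age    = c(23L, 54L, 43L, 13L, 14L, 51L),
  Educ_y = c(15L, 17L, 6L, 7L, 8L, 16L)
)
```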

I want to generate a new variable (years of experience) randomly using these criteria:

If Age >= 15 & iq <= 2, "Exp_y" takes a random number between (Age-15)/2 and Age-15.

If Age >= 15 & (iq == 3 | iq == 4), "Exp_y" takes a random number between (Age-Educ_y-6)/2 and Age-Educ_y-6.

And 0 otherwise.

I tried this code:

Df <- Df %>% 
  rowwise() %>% 
  mutate(Exep_y = case_when(
    Age > 14 & iq <= 2 ~ sample(seq((Age-15)/2, Age-15, 1), 1),
    Age > 14 & between(iq, 3, 4)  ~ sample(seq((Age-Educ_y-6)/2, Age-Educ_y-6, 1), 1),
    TRUE               ~ 0
  ))

But I get this error message:

Error in `mutate()`:
! Problem while computing `Exep_y = case_when(...)`.
i The error occurred in row 3.
Caused by error in `seq.default()`:
! wrong sign in 'by' argument

Any ideas? Best regards


Solution

  • This error occurs because case_when() evaluates every right-hand-side (RHS) expression first and only then picks a result based on the left-hand-side conditions. So even though, for example, row 4 of your sample dataset falls through to TRUE ~ 0, the RHS of the first two conditions is still evaluated. For that row the first condition's RHS is seq((13-15)/2, 13-15, 1): from = -1 and to = -2, so the range decreases and by = 1 has the wrong sign, which makes seq() throw an error.

    seq((13-15)/2, 13-15, 1)
    Error in seq.default((13 - 15)/2, 13 - 15, 1) : 
      wrong sign in 'by' argument
    

    You could do something like this:

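    library(dplyr)

    # Plain if/return means only the branch that applies is evaluated
    f <- function(i, a, e) {
      if (i > 4 || a < 15) return(0)                      # otherwise: 0
      if (i <= 2) return(sample(seq((a-15)/2, a-15), 1))  # iq <= 2
      return(sample(seq((a-e-6)/2, a-e-6), 1))            # iq is 3 or 4
    }
    
    Df %>% rowwise() %>% mutate(Exep_y = f(iq, Age, Educ_y))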
    

    Output:

         iq   Age Educ_y Exep_y
      <int> <int>  <int>  <dbl>
    1     5    23     15    0  
    2     4    54     17   16.5
    3     2    43      6   21  
    4     3    13      7    0  
    5     5    14      8    0  
    6     1    51     16   27
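If you'd rather avoid rowwise(), here is a minimal vectorised sketch in base R. `draw_exp_y()` is a hypothetical helper name, not part of dplyr. It computes each row's bounds up front and then draws an offset with runif(), picking uniformly among the values seq(lo, hi, 1) would return, so no seq() call can fail on rows that end up as 0. The question leaves one case open: rule 2 can give a negative upper bound when Educ_y is large relative to Age (e.g. Age 20, Educ_y 16). In that case f() above would count down, because seq() without a by argument accepts decreasing ranges, and it would return a negative number. The sketch instead assumes such rows should get 0.

```r
# Hypothetical vectorised alternative to f(): no rowwise() required
draw_exp_y <- function(iq, age, educ) {
  # Upper bound per row; NA marks rows that get 0 ("otherwise" rule)
  hi <- ifelse(age >= 15 & iq <= 2, age - 15,
        ifelse(age >= 15 & iq %in% 3:4, age - educ - 6, NA))
  # Assumption: a negative upper bound is clamped to 0 (not covered by the question)
  hi <- pmax(hi, 0)
  lo <- hi / 2
  # seq(lo, hi, 1) has floor(hi - lo) + 1 elements: lo, lo + 1, ...
  n_vals <- floor(hi - lo) + 1
  # runif() lies strictly in (0, 1), so the offset is one of 0, ..., n_vals - 1
  draw <- lo + floor(runif(length(hi)) * n_vals)
  ifelse(is.na(hi), 0, draw)
}
```

It can be used directly in the pipeline, e.g. `Df %>% mutate(Exep_y = draw_exp_y(iq, Age, Educ_y))`. The trade-off is that the sampling logic is written by hand instead of reusing sample(seq(...)), in exchange for one vectorised call per column.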