Creating a loop over variables' names


I'm new to R (started a few days ago) and coming from STATA. I am trying to create a loop to create dummy variables when a variable has value -9. I want to use a loop as I have got plenty of variables like this.

In the following, reflex_working is my dataframe and "A7LECTUR" etc are my variables. I am trying to create a dummy called "miss_varname" for each variable using the ifelse function.

varlist<-c("A7LECTUR", "A7GROASG", "A7RESPRJ", "A7WORPLC", "A7PRACTI", 
"A7THEORI", "A7TEACHR", "A7PROBAL", "A7WRIASG", "A7ORALPR")

for (i in varlist){
    reflex_working$miss_[i]<-ifelse(reflex_working$i==-9,1,0)
    } 

I get the following warnings for each iteration:

1: Unknown or uninitialised column: 'miss_'.
2: Unknown or uninitialised column: 'i'.

And no variable is created. I assume this must be something very trivial for everyone, but I have been trying for the last hour to create this kind of loop and have zero results to show.

Edit: I have something like:

A7LECTUR
1
2
1
4
-9    

And would like, after the loop, to have a new column like:

reflex_working$miss_A7LECTUR
0
0
0
0
1

I hope this helps clarify what I'm trying to achieve! Any help would be seriously appreciated.

Gabriele


Solution

  • Let's break this down into why it doesn't work. For starters, in R

    i
    A7LECTUR 
    # and
    "A7LECTUR"
    

    are different. The first two are variable names; the last is a value (a character string). I am emphasising this because the distinction matters.

    Lists (and data frames, which are basically lists with some restrictions that make them rectangular) can be accessed with $. In the syntax reflex_working$i, reflex_working refers to the variable and i refers to the element named "i" within the list. The i in reflex_working$i is literal: R doesn't care whether you have a variable named i.

    With programming, we want to be a bit more dynamic, and you correctly assumed that using a variable would do the trick. To do that, you have to use the [ or [[ subsetting operators ([ always returns a list, while [[ returns the element without the encapsulating list[1]).

    To summarise:

    reflex_working$i    # gets the element named i, no matter what.
    reflex_working[[i]] # gets the element whose name (or position) is stored in the variable i
    # reflex_working$i is equivalent to reflex_working[["i"]] (note the quotes)
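
    The summary above can be checked with a small sketch. The data frame df below is made up for illustration and only stands in for reflex_working; it is a plain base-R data frame, not your actual data.

    ```r
    # Toy data frame standing in for reflex_working (values are made up).
    df <- data.frame(A7LECTUR = c(1, 2, 1, 4, -9))
    i <- "A7LECTUR"

    is.null(df$i)                    # TRUE: there is no column literally named "i"
    identical(df[[i]], df$A7LECTUR)  # TRUE: [[ looks up the name stored in i
    class(df[i])                     # "data.frame": [ keeps the list wrapper
    class(df[[i]])                   # "numeric": [[ returns the bare column
    ```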
    

    That should explain the right-hand-side of your line in the loop. The correct statement should read

    ifelse(reflex_working[[i]]==-9,1,0)
    

    For the left-hand-side, reflex_working$miss_[i], things are completely off. What you want can be decomposed into several steps:

    1. Compose a value by concatenating "miss_" and the value of i.
    2. Use that value as the element/column name.

    We can combine these two into (as a commenter stated)

    reflex_working[[paste0('miss_', i)]] <- ...
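
    Putting both sides together, the whole loop might look like the sketch below. The toy reflex_working here is an assumption with two made-up columns so the example runs on its own; with your real data you would keep your full varlist.

    ```r
    # Toy stand-in for reflex_working, with two of the columns from the question.
    reflex_working <- data.frame(
      A7LECTUR = c(1, 2, 1, 4, -9),
      A7GROASG = c(-9, 3, 2, -9, 1)
    )
    varlist <- c("A7LECTUR", "A7GROASG")

    for (i in varlist) {
      # Build the new column name from the string stored in i, then fill it:
      # 1 where the original column equals -9, 0 otherwise.
      reflex_working[[paste0("miss_", i)]] <- ifelse(reflex_working[[i]] == -9, 1, 0)
    }

    reflex_working$miss_A7LECTUR  # 0 0 0 0 1
    ```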
    

    Good job realising that R is inherently vectorised: you are not writing a loop over each row of the column.
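
    To illustrate what vectorised means here, this sketch compares the one-line ifelse call with an explicit row-by-row loop. The vector x is made-up sample data, and the row loop is shown only for contrast, not as something to use.

    ```r
    x <- c(1, 2, 1, 4, -9)  # made-up values, same shape as the example column

    # Vectorised: the comparison and ifelse() act on the whole vector at once.
    vectorised <- ifelse(x == -9, 1, 0)

    # Row-by-row equivalent, which the vectorised call makes unnecessary.
    by_row <- numeric(length(x))
    for (k in seq_along(x)) {
      by_row[k] <- if (x[k] == -9) 1 else 0
    }

    identical(vectorised, by_row)  # TRUE
    ```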


    [1] But [[ can return a list if the element itself is a list. R can be... weird, or rather, full of surprises.
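
    A quick sketch of that footnote, using a made-up nested list x:

    ```r
    # Hypothetical nested list: element a is itself a list, element b is a number.
    x <- list(a = list(1, 2), b = 3)

    is.list(x["b"])    # TRUE:  [ always wraps the result in a list
    is.list(x[["b"]])  # FALSE: [[ returns the bare number
    is.list(x[["a"]])  # TRUE:  the element itself is a list
    ```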