Changing Values on the Same Column for Different DataFrames (in R)


I have the following situation:

I have 17 different data frames, all of which have the same column (COD_MOEDA). I also have a vector (NAME_DATAFRAME) that contains the 17 variable names of those data frames.

I am trying to loop over all of the data frames to apply one rule: wherever COD_MOEDA == 0, change the value to "BRL".

I tried building the condition as a string in a variable (ConditionString), but I am not able to apply the rule with something like ConditionString <- "BRL":

for (i in 1:17)
{

ConditionString <- paste(NAME_DATAFRAME[i], "$COD_MOEDA[", NAME_DATAFRAME[i], "$COD_MOEDA==0]", sep="")

(??)

}

Any suggestions?


Solution

  • In situations like this, where you are trying to manipulate objects by name (as a character string), the get and assign functions are your friends. Here is a solution that loops through each data frame by name.

    for (df in NAME_DATAFRAME) {
    
        # Get the current dataframe by name
        y <- get(df, pos = globalenv())
    
        # Do stuff
        y$COD_MOEDA[y$COD_MOEDA == 0] <- 'BRL'
    
        # Assign the new dataframe back to its name
        assign(df, y, pos = globalenv())
    
        # Tidy up
        rm(y)
    }
    rm(df)
    

    However, this is not a very "R-like" solution: looking objects up by name and overwriting global variables inside a loop is harder to read and easier to get wrong than working on a single collection. As @josliber points out, you'll get more readable code if you store your dataframes in a list and use the apply family of functions to operate on each one in sequence.

    You may be able to get your dataframes into a list by altering your upstream code, but here's an easy way to get there from your current state:

    list_of_dataframes <- sapply(
        NAME_DATAFRAME,
        get,
        pos = globalenv(),
        simplify = FALSE
    )
    

    From here, you can use the lapply function to manipulate each dataframe.

    list_of_modified_dataframes <- lapply(
        list_of_dataframes,
        function(x) {
            # Inside this function, `x` represents a single dataframe
    
            # Do stuff
            x$COD_MOEDA[x$COD_MOEDA == 0] <- 'BRL'
    
            # And return the modified dataframe
            return(x)
        }
    )
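
The list-based steps above leave the results in a list rather than in the original variables. If the individual variables are still needed, one option is to copy the list elements back by name. The following is a minimal sketch under stated assumptions: `df_a` and `df_b` are made-up stand-ins for the 17 real data frames, and `mget` is used as a shorter equivalent of the `sapply(..., get, ...)` call shown earlier.

```r
# Toy stand-ins for the real data frames (hypothetical names and values)
df_a <- data.frame(COD_MOEDA = c(0, 986, 0))
df_b <- data.frame(COD_MOEDA = c(840, 0))
NAME_DATAFRAME <- c("df_a", "df_b")

# Collect the data frames into a named list (names come from NAME_DATAFRAME)
list_of_dataframes <- mget(NAME_DATAFRAME, envir = globalenv())

# Apply the same replacement to every data frame in the list
list_of_modified_dataframes <- lapply(list_of_dataframes, function(x) {
    x$COD_MOEDA[x$COD_MOEDA == 0] <- "BRL"
    x
})

# Copy each list element back to a global variable of the same name,
# overwriting the original data frames
invisible(list2env(list_of_modified_dataframes, envir = globalenv()))

print(df_a$COD_MOEDA)
```

Because `list2env` overwrites the existing variables, it may be safer to keep working with the list and skip this last step unless downstream code really depends on the original names.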
    

    I'm not sure what your ultimate goal is, but be aware that assigning the character string 'BRL' implicitly converts the whole column from numeric to character. This means that subsequent numeric conditional statements (e.g. COD_MOEDA > 42) will no longer behave numerically: the number is converted to a string and the values are compared as text.
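
The type change described above can be seen on a small vector. This is only an illustrative sketch: the values are made up, and the extra label column (`COD_MOEDA_LABEL`, a hypothetical name) is just one possible way to keep the numeric codes available, not something the original question asks for.

```r
# Hypothetical numeric currency codes
cod_moeda <- c(0, 9, 100)

# Replacing one element with a string coerces the whole vector to character
cod_moeda[cod_moeda == 0] <- "BRL"
class(cod_moeda)        # "character"

# Comparing strings with a number compares text, not magnitude:
# "9" sorts after "42", while "100" sorts before it
c("9", "100") > 42      # TRUE FALSE

# One possible alternative: leave the numeric column untouched and
# store the labels in a separate character column
df <- data.frame(COD_MOEDA = c(0, 9, 100))
df$COD_MOEDA_LABEL <- ifelse(df$COD_MOEDA == 0, "BRL", as.character(df$COD_MOEDA))
```

With the separate column, a condition such as `df$COD_MOEDA > 42` still compares numbers.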