Is it possible to keep the data type while using assign inside a function in R?


This is my first post here, so please tell me if I'm missing any important information.

I am handling a lot of data in the form of time (rows 1:30, used as the row ID) vs. value, all stored in a number of data frames, and I need to keep it as a data.frame. I wrote a function that gets data frames from my global environment and sorts the columns of each one into new data frames depending on their values.

So I start with a list of names of my data frames as input for my function and end by assigning the newly created data frames to my global environment with the assign function. The data frames I get are all 30 rows long but have a different number of columns, depending on how often a case appears in a data set. The name of each data frame represents one data set, and each column name inside represents one timeline. I use data frames so I don't lose the column names.

This works when a case appears 0 times or more than once. But if a data.frame ends up with only one column and I use the assign function, it appears as a vector in my global environment instead of a data frame. I therefore lose the column name, and my other functions, which only accept data frames, stop at such a case and throw errors.

Here is a basic example of my problem:

#create two datasets with different cases
data1 <- data.frame(matrix(nrow=30, ncol=5))
data1[1] <- c(rep(1,each=30))
data1[2] <- c(rep(5, each=30))
data1[3] <- c(rep(5, each=30))
data1[4] <- c(rep(10, each=30))
data1[5] <- c(rep(10, each=30))

data2 <- data.frame(matrix(nrow=30, ncol=6))
data2[1] <- c(rep(5,each=30))
data2[2] <- c(rep(1, each=30))
data2[3] <- c(rep(1, each=30))
data2[4] <- c(rep(0, each=30))
data2[5] <- c(rep(0, each=30))
data2[6] <- c(rep(10, each=30))

#create list with names of datasets
names <- c('data1','data2')

#function for sorting
examplefunction <- function(VarNames) {
  for (i in 1:length(VarNames)) {
    #get current dataset
    name <- VarNames[i]
    data <- get(VarNames[i])

    #create new empty data.frames for sorting
    data.0 <- data.frame(matrix(nrow=30))
    name.data.0 <- paste(name,"0", sep=".")
    c.0 = 2 #start at second column, since first doesn't like the colname later
    data.1 <- data.frame(matrix(nrow=30))
    name.data.1 <- paste(name,"1", sep=".")
    c.1 = 2
    data.5 <- data.frame(matrix(nrow=30))
    name.data.5 <- paste(name,"5", sep=".")
    c.5 = 2
    data.10 <- data.frame(matrix(nrow=30))
    name.data.10 <- paste(name,"10", sep=".")
    c.10 = 2

    #sort data into new different data.frames
    for (c in 1:ncol(data)) {

      if(data[1,c]==0) {
        data.0[c.0] = data[c]
        c.0 = c.0 +1
      }
      else if(data[1,c]==1) {
        data.1[c.1] = data[c]
        c.1 = c.1 +1
      }
      else if(data[1,c]==5) {
        data.5[c.5] = data[c]
        c.5 = c.5 +1
      }
      else if(data[1,c]==10) {
        data.10[c.10] = data[c]
        c.10 = c.10 +1
      }
      else stop("new values")
    }

    #remove first column with weird name
    data.0 <- data.0[,-1] 
    data.1 <- data.1[,-1] 
    data.5 <- data.5[,-1] 
    data.10 <- data.10[,-1] 

    #assign data frames to global environment
    assign(name.data.0, data.0, envir = .GlobalEnv)
    assign(name.data.1, data.1, envir = .GlobalEnv)
    assign(name.data.5, data.5, envir = .GlobalEnv)
    assign(name.data.10, data.10, envir = .GlobalEnv)

  }
}

#function call
examplefunction(names)

As explained above, if you run this you end up with data frames of 0 variables and of more than 1 variable, plus three vectors where the data frame had only one column.

So my questions are:

1. Is there any way to keep the data type and force R to assign a data frame instead of a vector?
2. Or is there an alternative function I could use instead of assign()? If I use <<-, how can I build the names as above?


Solution

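  • The vector is not created by assign(); it comes from the subsetting step. data.0[,-1] uses the default drop = TRUE, which simplifies a one-column result to a plain vector. You can use drop = FALSE when subsetting: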

    examplefunction <- function(VarNames) {
        for (i in 1:length(VarNames)) {
            #get current dataset
            name <- VarNames[i]
            data <- get(VarNames[i])
    
            #create new empty data.frames for sorting
            data.0 <- data.frame(matrix(nrow=30))
            name.data.0 <- paste(name,"0", sep=".")
            c.0 = 2 #start at second column, since first doesn't like the colname later
            data.1 <- data.frame(matrix(nrow=30))
            name.data.1 <- paste(name,"1", sep=".")
            c.1 = 2
            data.5 <- data.frame(matrix(nrow=30))
            name.data.5 <- paste(name,"5", sep=".")
            c.5 = 2
            data.10 <- data.frame(matrix(nrow=30))
            name.data.10 <- paste(name,"10", sep=".")
            c.10 = 2
    
            #sort data into new different data.frames
            for (c in 1:ncol(data)) {
    
                if(data[1,c]==0) {
                    data.0[c.0] = data[c]
                    c.0 = c.0 +1
                }
                else if(data[1,c]==1) {
                    data.1[c.1] = data[c]
                    c.1 = c.1 +1
                }
                else if(data[1,c]==5) {
                    data.5[c.5] = data[c]
                    c.5 = c.5 +1
                }
                else if(data[1,c]==10) {
                    data.10[c.10] = data[c]
                    c.10 = c.10 +1
                }
                else stop("new values")
            }
    
            #remove first column with weird name
            data.0  <- data.0[ , -1, drop = FALSE]
            data.1  <- data.1[ , -1, drop = FALSE]
            data.5  <- data.5[ , -1, drop = FALSE]
            data.10 <- data.10[ , -1, drop = FALSE] 
    
            #assign data frames to global environment
            assign(name.data.0,  data.0,  envir = .GlobalEnv)
            assign(name.data.1,  data.1,  envir = .GlobalEnv)
            assign(name.data.5,  data.5,  envir = .GlobalEnv)
            assign(name.data.10, data.10, envir = .GlobalEnv)
    
        }
    }
    
    #function call
    examplefunction(names)
    

    Let's take a look at the one-column dataframes:

    str(data1.1)
    'data.frame':   30 obs. of  1 variable:
      $ X1: num  1 1 1 1 1 1 1 1 1 1 ...
    str(data2.10)
    'data.frame':   30 obs. of  1 variable:
      $ X6: num  10 10 10 10 10 10 10 10 10 10 ...
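
To see the dropping behaviour in isolation, here is a minimal sketch on a made-up two-column data frame (the names df, a and b are only for illustration and do not come from the question):

```r
# Toy data frame; the names df, a and b are illustrative assumptions.
df <- data.frame(a = 1:3, b = 4:6)

# Matrix-style indexing with the default drop = TRUE:
# removing column 1 leaves one column, which is simplified to a vector.
one_col_dropped <- df[, -1]

# Same indexing with drop = FALSE: the result stays a data frame
# and the column name "b" is kept.
one_col_kept <- df[, -1, drop = FALSE]

# List-style indexing (a single index, no comma) also returns a data frame.
one_col_list <- df[-1]

class(one_col_dropped)
class(one_col_kept)
names(one_col_list)
```

So either form, df[, -1, drop = FALSE] or df[-1], should avoid the vector in the one-column case.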
    

    Now, all that said, I agree with Roland's comment: you almost never want to assign to the global environment in a complicated way like this; returning a list is best practice. However, you'd still need drop = FALSE to keep the column names. There is probably an entirely different and much better approach to whatever data wrangling you're trying to do; I just don't have a good enough grasp of your task to make a suggestion.
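
For what it's worth, here is a rough sketch of what returning a list could look like. It assumes that every column is numeric and that the first row decides the group, as in the example data. The helper name split_by_first_value and its values argument are hypothetical, not part of the original code.

```r
# Hypothetical helper: group the columns of one data frame by the value
# in their first row and return a named list of data frames.
split_by_first_value <- function(df, values = c(0, 1, 5, 10)) {
  # First-row value of every column (assumes numeric columns).
  first_row <- vapply(df, function(col) col[[1]], numeric(1))

  # Fail loudly on values that have no group.
  unknown <- setdiff(first_row, values)
  if (length(unknown) > 0) {
    stop("new values: ", paste(unknown, collapse = ", "))
  }

  # drop = FALSE keeps one-column results as data frames.
  groups <- lapply(values, function(v) df[, first_row == v, drop = FALSE])
  names(groups) <- as.character(values)
  groups
}

# Same column values as data1 and data2 in the question.
data1 <- data.frame(X1 = rep(1, 30), X2 = rep(5, 30), X3 = rep(5, 30),
                    X4 = rep(10, 30), X5 = rep(10, 30))
data2 <- data.frame(X1 = rep(5, 30), X2 = rep(1, 30), X3 = rep(1, 30),
                    X4 = rep(0, 30), X5 = rep(0, 30), X6 = rep(10, 30))

# One nested list instead of many global variables.
sorted <- lapply(list(data1 = data1, data2 = data2), split_by_first_value)

str(sorted$data1[["1"]])
```

What the question stored as data2.10 would then be sorted$data2[["10"]], so no assign() or <<- is needed to build the names.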