
R - using a for loop to modify a column in a data frame (and using factor levels)


I'm trying to convert a factor of dates to a character vector that a for loop can reference. The loop should replace the NA values in the "Day" column of a data frame (example below) with a day number for each date (1 for the first date, 2 for the second, and so on).

     Date    Time Axis1 Day Sum.A1.Daily
1 6/12/10 5:00:00    20  NA           NA
2 6/12/10 5:01:00    40  NA           NA
3 6/13/10 5:02:00    50  NA           NA
4 6/13/10 5:03:00    30  NA           NA
5 6/14/10 5:04:00    20  NA           NA
6 6/14/10 5:05:00    30  NA           NA

I need to transform it to this:

     Date    Time Axis1 Day Sum.A1.Daily
1 6/12/10 5:00:00    20   1           60
2 6/12/10 5:01:00    40   1           60
3 6/13/10 5:02:00    50   2           80
4 6/13/10 5:03:00    30   2           80
5 6/14/10 5:04:00    20   3           50
6 6/14/10 5:05:00    30   3           50

Using my current code, what I'm getting is this:

     Date    Time Axis1 Day Sum.A1.Daily
1 6/12/10 5:00:00    20  NA           60
2 6/12/10 5:01:00    40  NA           60
3 6/13/10 5:02:00    50  NA           80
4 6/13/10 5:03:00    30  NA           80
5 6/14/10 5:04:00    20  NA           50
6 6/14/10 5:05:00    30  NA           50

Something is going wrong in my for loops that assign values to column 4. I need help understanding two things:

  1. What the problem is (current script below)
  2. Whether I could circumvent the problem by using factor levels more effectively

I'm new to R and Stack Overflow, and overwhelmed by how cool this community is. Please let me know if I'm violating a cardinal question-asking rule.

## read in file; define classes 
## (important b/c I want R to utilize factor levels of "Date" in column 1 of .csv file)
dat <- read.csv("data.csv", header = TRUE, ## read in file
      colClasses = c("factor", "character", "integer", "integer", "integer"))

## assign values to be used by for loops
levs <- lapply(dat, levels) ## grab levels for factor variable of dates
dates <- c(levs$Date) ## creates list of dates to reference in for loop
counts <- c(1:length(dates)) ## creates vector 1:number of dates listed in file for loop 2
x <- (1:nrow(dat)) ## creates vector 1:number of rows in file

## for loop 1 cycles through the rows in the file;
## for loop 2 cycles through the values in "counts"
  ## if() compares the value of each object in "Date" (col. 1)
  ## to the value of one of the levels (e.g., compared to "6/22/10", not 1)
    ## if ==, assigns the corresp. value of "counts" to the appropriate obs. of col. 4 ("Day")
    for (i in x) {
          for (j in counts) {
                if (dat[i,1] == levs[j]) {
                      dat[i,4] <- counts[j]
                }
          }
    }
dat <- transform(dat, Sum.A1.Daily = ave(dat$Axis1, dat$Date, FUN = sum))
outfile <- "ActData2.csv"      ## Enter file name for new data
if (!file.exists(outfile)) {   ## check the same file that will be written
  write.csv(dat, file = outfile)
} else {
  stop("change file name")
}
print("File Cleaning Complete")
head(dat)
tail(dat)
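On the first question, the likely reason Day stays NA is the test in the inner loop. levs is a list with one element per column of dat, so levs[j] is a one-element list (levs[1] holds all of the Date levels, levs[2] holds NULL for Time), never the j-th date string, and the if() never assigns anything. Comparing against dates[j] instead appears to be enough. The sketch below rebuilds the example rows in memory as an assumed stand-in for data.csv and otherwise keeps the original loop structure.

```r
# Assumed stand-in for data.csv: the example rows from the question
dat <- data.frame(
  Date  = factor(c("6/12/10", "6/12/10", "6/13/10", "6/13/10", "6/14/10", "6/14/10")),
  Time  = c("5:00:00", "5:01:00", "5:02:00", "5:03:00", "5:04:00", "5:05:00"),
  Axis1 = c(20L, 40L, 50L, 30L, 20L, 30L),
  Day   = NA_integer_,
  Sum.A1.Daily = NA_integer_,
  stringsAsFactors = FALSE
)

levs <- lapply(dat, levels)  # one list element per column
str(levs[1])                 # a list holding ALL Date levels, not one date

dates  <- levs$Date          # character vector of the Date levels
counts <- seq_along(dates)   # 1, 2, 3

for (i in seq_len(nrow(dat))) {
  for (j in counts) {
    if (dat[i, 1] == dates[j]) {  # was: levs[j]
      dat[i, 4] <- counts[j]      # fill column 4 ("Day")
    }
  }
}
print(dat)
```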

Solution

  • Loops are quite inefficient for this kind of problem in R. Try a vectorized approach instead:

    dat$Day <- as.numeric(factor(dat$Date))
    dat$Sum.A1.Daily <- ave(dat$Axis1, dat$Date, FUN=sum)
    

    The first line relies on the fact that a factor is stored as integer codes that index into a character vector of levels. Here we discard the levels attribute and keep only the integer codes.

    Edit: you already used ave correctly inside transform. ave computes the value of its FUN argument within each category of its second argument and returns a vector the same length as its first argument.
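A quick check of the two vectorized lines on the example rows, rebuilt in memory as an assumed stand-in for data.csv. One caveat the answer does not mention: factor() sorts its levels as strings, so with dates such as 6/9/10 and 6/10/10 the integer codes would not follow calendar order. Converting with as.Date() first, assuming the month/day/two-digit-year format shown in the question, gives chronological codes.

```r
# Assumed stand-in for data.csv: Date and Axis1 from the question's example
dat <- data.frame(
  Date  = factor(c("6/12/10", "6/12/10", "6/13/10", "6/13/10", "6/14/10", "6/14/10")),
  Axis1 = c(20L, 40L, 50L, 30L, 20L, 30L)
)

# Factor codes: each row's position in levels(dat$Date)
dat$Day <- as.integer(dat$Date)

# ave() applies FUN within each Date group and copies the group result
# back to every row of that group, so the output has nrow(dat) values
dat$Sum.A1.Daily <- ave(dat$Axis1, dat$Date, FUN = sum)
print(dat)

# Caveat: levels are sorted as strings, so "6/10/10" sorts before "6/9/10".
# Parsing to Date first (assumes %m/%d/%y strings) gives calendar order.
d <- c("6/9/10", "6/10/10", "6/10/10")
as.integer(factor(d))                                # 2 1 1 (string order)
as.integer(factor(as.Date(d, format = "%m/%d/%y")))  # 1 2 2 (calendar order)
```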