
Loop over multiple data frames with mathematical function


I have 5 data frames, split from a single data frame according to one variable, and I want to apply the same function to each of them, using the same 3 columns in each. Each contains 10,000 rows.

My data:

   Dist     X    Y  deg ofs      Z
1 20.21 499.3 3577 4.77   0 19.750
2 20.23 482.3 3578 4.77 -50 19.731
3 20.23 481.3 3578 4.77 -25 19.741
4 20.23 480.3 3578 4.77   0 19.750
5 20.23 479.3 3578 4.77  25 19.749
6 20.24 478.3 3578 4.77  50 19.740

Split like this:

splitdf <- split(df, df$ofs)
str(splitdf)
X1 <- splitdf$`-50`
X2 <- splitdf$`-25`
X3 <- splitdf$`0`
X4 <- splitdf$`25`
X5 <- splitdf$`50`
df.list <- list(X1,X2,X3,X4,X5)

I have written two functions implementing these trigonometric formulas:

(X + distance * cos(angle)), (Y - distance * sin(angle))

NewX <- function(x){
  df.list[[i]][2] + df.list[[i]][5] * cos(df.list[[i]][4]) 
}
NewY <- function(x) {
  df.list[[i]][3] - df.list[[i]][5] * sin(df.list[[i]][4]) 
}

I then created a loop to apply these functions to each data frame, thus creating new columns.

for (i in 1:length(df.list)){
  df.list[[i]]$newcol1 <-  lapply(df.list[[i]]$X, FUN=NewX)
  df.list[[i]]$newcol2 <- lapply(df.list[[i]]$Y, FUN=NewY)
}    

Unfortunately this yields neither results nor error messages, although the console stays busy for a few minutes.

I tried again on the original data, before splitting it into separate data frames, using:

NewX <- function(x){
  df[2] + df[5] * cos(df[4]) 
}
NewY <- function(x) {
  df[3] - df[5] * sin(df[4]) 
}

for (i in 1:length(df)){
  df$newX <-  lapply(df$X, FUN=NewX)
  df$newY <- lapply(df$Y, FUN=NewY)
}  

This approach is too slow and yields no result even after one hour. In either case I get no error messages, so it is very difficult to tell what I'm doing wrong.

Does anyone have any ideas? Thanks!

EDIT

I ran the loop over the single, unsplit data frame, changing the code to store the output in a new data frame.

for (i in 1:length(df)){
  lapply(df$X, FUN=NewX)
  lapply(df$Y, FUN=NewY) -> newdf
}    

A NewX column is created, and each cell contains a single-column data frame with 50,000 results. Removing the loop and running with a pipe yields: Error in FUN(X[[i]], ...) : unused argument


Solution

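  • The functions in the question ignore their argument x and recompute the whole column on every call, which is why lapply is so slow and puts a data frame in every cell. You can do this with by() instead, which splits a data frame by a grouping variable and applies a function to each piece.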

    fun <- function(x) cbind(x, newcol1=x[, 2] + x[, 5]*cos(x[, 4]), newcol2=x[, 3] - x[, 5]*sin(x[, 4]))
    
    by(df, df$ofs, fun)
    # df$ofs: -50
    #    Dist     X    Y  deg ofs      Z newcol1  newcol2
    # 2 20.23 482.3 3578 4.77 -50 19.731 479.421 3528.083
    # --------------------------------------------------------------------------------------------- 
    #   df$ofs: -25
    #    Dist     X    Y  deg ofs      Z  newcol1  newcol2
    # 3 20.23 481.3 3578 4.77 -25 19.741 479.8605 3553.041
    # --------------------------------------------------------------------------------------------- 
    #   df$ofs: 0
    #    Dist     X    Y  deg ofs     Z newcol1 newcol2
    # 1 20.21 499.3 3577 4.77   0 19.75   499.3    3577
    # 4 20.23 480.3 3578 4.77   0 19.75   480.3    3578
    # --------------------------------------------------------------------------------------------- 
    #   df$ofs: 25
    #    Dist     X    Y  deg ofs      Z  newcol1  newcol2
    # 5 20.23 479.3 3578 4.77  25 19.749 480.7395 3602.959
    # --------------------------------------------------------------------------------------------- 
    #   df$ofs: 50
    #    Dist     X    Y  deg ofs     Z newcol1  newcol2
    # 6 20.24 478.3 3578 4.77  50 19.74 481.179 3627.917
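
If a plain list of data frames is preferred over the by object (for instance to keep working with df.list from the question), the same idea can be sketched with split and lapply. This is only a minimal sketch, assuming the column names shown in the question; add_xy is a hypothetical helper name, and deg is passed to cos/sin unchanged, exactly as in the code above.

```r
# Sample data copied from the question
df <- data.frame(
  Dist = c(20.21, 20.23, 20.23, 20.23, 20.23, 20.24),
  X    = c(499.3, 482.3, 481.3, 480.3, 479.3, 478.3),
  Y    = c(3577, 3578, 3578, 3578, 3578, 3578),
  deg  = rep(4.77, 6),
  ofs  = c(0, -50, -25, 0, 25, 50),
  Z    = c(19.750, 19.731, 19.741, 19.750, 19.749, 19.740)
)

# Hypothetical helper: takes one data frame and returns it with two new columns.
# It works on its argument `d`, not on a global object, so it can be
# applied to any piece of the split data.
add_xy <- function(d) {
  d$newcol1 <- d$X + d$ofs * cos(d$deg)
  d$newcol2 <- d$Y - d$ofs * sin(d$deg)
  d
}

# split() already returns a named list, one data frame per value of ofs,
# so there is no need to build the list by hand
df.list <- split(df, df$ofs)
df.list <- lapply(df.list, add_xy)

names(df.list)     # "-50" "-25" "0" "25" "50"
df.list[["25"]]    # the piece for ofs == 25, with newcol1 and newcol2 added
```

Because lapply returns a new list, the result has to be assigned back (here to df.list); nothing is modified in place.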
    

    If you plan to reassemble the pieces into a single data frame:

    do.call(rbind, by(df, df$ofs, fun))
    #      Dist     X    Y  deg ofs      Z  newcol1  newcol2
    # -50 20.23 482.3 3578 4.77 -50 19.731 479.4210 3528.083
    # -25 20.23 481.3 3578 4.77 -25 19.741 479.8605 3553.041
    # 0.1 20.21 499.3 3577 4.77   0 19.750 499.3000 3577.000
    # 0.4 20.23 480.3 3578 4.77   0 19.750 480.3000 3578.000
    # 25  20.23 479.3 3578 4.77  25 19.749 480.7395 3602.959
    # 50  20.24 478.3 3578 4.77  50 19.740 481.1790 3627.917
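
Since the formula only uses values from the same row, splitting may not be needed at all when the goal is just the two new columns: arithmetic, cos and sin are vectorised in R, so they can be applied to whole columns at once. A minimal sketch, assuming the data frame df and the column names from the question. One caveat worth checking: R's trigonometric functions expect radians, and the code above passes deg unchanged. If deg really holds degrees (an assumption, the question does not say), it would need converting, as in the commented-out line.

```r
# Sample data copied from the question
df <- data.frame(
  Dist = c(20.21, 20.23, 20.23, 20.23, 20.23, 20.24),
  X    = c(499.3, 482.3, 481.3, 480.3, 479.3, 478.3),
  Y    = c(3577, 3578, 3578, 3578, 3578, 3578),
  deg  = rep(4.77, 6),
  ofs  = c(0, -50, -25, 0, 25, 50),
  Z    = c(19.750, 19.731, 19.741, 19.750, 19.749, 19.740)
)

# Angle used as-is, matching the answer above (i.e. treated as radians)
angle <- df$deg
# angle <- df$deg * pi / 180   # use this instead only if deg is in degrees (assumption)

# Vectorised column arithmetic: no loop, no lapply, no split
df$newX <- df$X + df$ofs * cos(angle)
df$newY <- df$Y - df$ofs * sin(angle)

df
```

This keeps the original row order and row names, unlike the rbind result above, and on 50,000 rows it should finish almost immediately because each column is processed in a single vectorised call.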