Applying Rcpp on a dataframe


I'm new to C++ and exploring faster computation possibilities in R through the Rcpp package. The actual dataframe contains over 2 million rows, and the operation described below is quite slow on it.

Existing Dataframes

Main Dataframe

df<-data.frame(z = c("a","b","c"), a = c(303,403,503), b = c(203,103,803), c = c(903,803,703))

Cost Dataframe

cost <- data.frame("103" = 4, "203" = 5, "303" = 6, "403" = 7, "503" = 8, "603" = 9, "703" = 10, "803" = 11, "903" = 12)

colnames(cost) <- c("103", "203", "303", "403", "503", "603", "703", "803", "903")

Steps

df contains z, a categorical variable with levels a, b and c. The columns a, b and c, which hold the specific numbers, were brought into df by a merge with another dataframe.

The first step is to match each row's value of z with the column names (a, b or c) and copy the number from the matching column into a new column called 'type'.

So the first row would read,

df$z[1] = "a"
df$type[1]= 303
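A minimal base R sketch of this first step, assuming every column other than z is a numeric candidate column and that every value of z appears among the column names. The helper names m and col_idx are introduced here for illustration only. It uses a two-column (row, column) index matrix so that one cell is picked per row without a loop:

```r
df <- data.frame(z = c("a", "b", "c"),
                 a = c(303, 403, 503),
                 b = c(203, 103, 803),
                 c = c(903, 803, 703))

# Numeric matrix of the candidate columns (assumption: everything except z).
m <- as.matrix(df[setdiff(names(df), "z")])

# For each row, the position of the column whose name equals that row's z.
col_idx <- match(df$z, colnames(m))

# Index with a (row, column) matrix: one cell is extracted per row.
df$type <- m[cbind(seq_len(nrow(df)), col_idx)]
```

Because the whole step is vectorised, it may already be fast enough on a few million rows before reaching for Rcpp, though that would need to be timed on the real data.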

Next, df$type must be matched against the column names of another dataframe called 'cost' to create df$cost. The column names of the cost dataframe are numbers stored as strings, e.g. "103", "203" etc.

For our example, df$cost[1] = 6: df$type[1] = 303 is matched with cost$`303`[1] = 6.
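The cost lookup can be sketched in a similar vectorised way, assuming cost has a single row and that as.character() of each type value reproduces the column name exactly (true for values like 303, but not for numbers that R prints in scientific notation). The names cost_vec, type and cost_out are illustrative only:

```r
# check.names = FALSE keeps the numeric-looking column names as given.
cost <- data.frame("103" = 4, "203" = 5, "303" = 6, "403" = 7, "503" = 8,
                   "603" = 9, "703" = 10, "803" = 11, "903" = 12,
                   check.names = FALSE)

# Flatten the one-row data frame into a named numeric vector.
cost_vec <- unlist(cost[1, ])

# Type values as they would come out of the first step.
type <- c(303, 103, 703)

# Look up each type by name; unname() drops the lookup keys from the result.
cost_out <- unname(cost_vec[as.character(type)])
```

Any type without a matching column name comes back as NA, which makes unmatched values easy to spot.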

The final dataframe should look like this (sample output created by hand):

df1 <- data.frame(z = c("a","b","c"), type = c("303", "103", "703"), cost = c(6,4,10))

Solution

  • A possible solution, not very elegant but does the job:

    library(reshape2)
    
    # reshape df to long format and attach the one-row cost table to every row
    tmp <- cbind(cost, melt(df, id.vars = "z"))
    
    row.idx <- which(tmp$z == tmp$variable) # rows where z matches the original column name
    col.val <- match(as.character(tmp$value[row.idx]), names(tmp)) # positions of the matching cost columns
    # now put it all together
    df2 <- data.frame('z' = tmp$z[row.idx], # not unique(df$z): z repeats in the real data
                      'type' = tmp$value[row.idx],
                      'cost' = as.numeric(tmp[1, col.val]))
    

    the output:

    > df2 
      z type cost
    1 a  303    6
    2 b  103    4
    3 c  703   10
    

    See if it works on your data.