
Group by on XDF file?


Say I have a huge source XDF file generated with RevoScaleR. I want to create a new target XDF by grouping the source rows on columns A, B, and C and computing the sum, min, max, mean, and standard deviation of column D.

Assume the target data is also too big to fit into memory. How should I proceed? I could not find much information about group-by operations in the documentation.


Solution

  • The dplyrXdf package lets you carry out dplyr operations like this on Xdf files.

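    library(dplyrXdf)

    # Point at the source Xdf file; the data itself is not loaded into memory here
    src <- RxXdfData("src.xdf")

    # Group on A, B, C and summarise D; the result is itself backed by an Xdf file
    dest <- src %>%
        group_by(A, B, C) %>%
        summarise(sum=sum(D), min=min(D), max=max(D), mean=mean(D), sd=sd(D))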
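
For intuition about why these particular statistics work on data that does not fit in memory, here is a minimal base R sketch of chunk-wise grouped aggregation. It is not how dplyrXdf or RevoScaleR are implemented. The helper names `to_partial` and `collapse` and the toy data are made up for illustration. Each chunk is reduced to per-group count, sum, sum of squares, min, and max, which can be merged exactly across chunks; the mean and standard deviation are derived at the end. The sketch keeps the per-group accumulator in memory, so it only illustrates the merge logic, and the sum-of-squares formula can lose precision on data with a large mean.

```r
# Toy stand-in for the source data, processed two rows at a time.
df <- data.frame(
  A = c("x", "x", "y", "x", "y", "y", "x", "y"),
  B = c(1, 1, 2, 1, 2, 2, 1, 2),
  C = c("p", "p", "q", "p", "q", "q", "p", "q"),
  D = c(1, 2, 3, 4, 5, 6, 7, 8)
)
chunks <- split(df, rep(1:4, each = 2))

# Hypothetical helper: express each raw row as a mergeable partial summary.
to_partial <- function(chunk) {
  data.frame(chunk[c("A", "B", "C")],
             n = 1, sum = chunk$D, sumsq = chunk$D^2,
             min = chunk$D, max = chunk$D)
}

# Hypothetical helper: merge partial summaries that share the same (A, B, C).
collapse <- function(p) {
  groups <- split(p, p[c("A", "B", "C")], drop = TRUE)
  out <- lapply(groups, function(g) {
    data.frame(g[1, c("A", "B", "C")],
               n = sum(g$n), sum = sum(g$sum), sumsq = sum(g$sumsq),
               min = min(g$min), max = max(g$max))
  })
  out <- do.call(rbind, out)
  rownames(out) <- NULL
  out
}

# Fold the chunks into a running accumulator with one row per group.
acc <- NULL
for (chunk in chunks) {
  acc <- collapse(rbind(acc, to_partial(chunk)))
}

# Derive mean and sample standard deviation from the merged partials.
result <- acc
result$mean <- result$sum / result$n
result$sd <- sqrt((result$sumsq - result$sum^2 / result$n) / (result$n - 1))
result$sumsq <- NULL
print(result)
```

A group with a single row yields `NaN` for `sd` here, whereas `sd()` returns `NA`.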