Search code examples
rsplit-apply-combine

split, apply and combine on 2 columns of data


I've got a dataframe consisting of a group and 2 value columns, as such:

group  val1  val2
    A     5     3
    A     2     4
    A     3     1
    B     3     6
    B     2     1
    B     0     2

I want to work out the number of rows where val1 > val2, split by subset. Initially I hardcoded this per subgroup with:

number_a <- nrow(subset(df, group=="A" & val1 > val2))
number_b <- nrow(subset(df, group=="B" & val1 > val2))

What's the proper way of automating this? I tried using the split() function but I couldn't work out how to pass in both val1 and val2 column.


Solution

  • Pretty straight forward using data.table

    If you want the number of rows

    library(data.table)
    setDT(df)[, .(RowsNum = sum(val1 > val2)), by = group]
    #    group RowsNum
    # 1:     A       2
    # 2:     B       1
    

    If you looking for split, apply combinations in base R, could also try

    sapply(split(df[-1], df[1]), function(x) sum(x[1] > x[2]))
    # A B 
    # 2 1 
    

    Or using tapply (also from base R)

    tapply(with(df, val1 > val2), df[1], sum)
    # group
    # A B 
    # 2 1 
    

    If you want the rows themselves

    setDT(df)[, .SD[val1 > val2]]
    #    group val1 val2
    # 1:     A    5    3
    # 2:     A    3    1
    # 3:     B    2    1
    

    Or very simple with base R too

    df[with(df, val1 > val2), ]
    #    group val1 val2
    # 1     A    5    3
    # 3     A    3    1
    # 5     B    2    1
    

    Or

    subset(df, val1 > val2)
    #   group val1 val2
    # 1     A    5    3
    # 3     A    3    1
    # 5     B    2    1