Search code examples
rdataframefunctiongroup-byaggregate

How to use aggregate with count but also consider some of the NA values in R?


I have a dataframe - df1. I want to get it to df2 as shown below using R:

**df1**
Cust_id Cust_name       Cust_order
1       Andrew          coffee
2       Dillain         burger
3       Alma            coffee
4       Wesney          chips
5       Kiko            chips
NA      NA              fries
NA      NA              milkshake
NA      NA              sandwich
NA      NA              eggs

**df2**
Cust_order  freq
coffee      2
burger      1
chips       2
fries       0
milkshake   0
sandwich    0
eggs        0

I have used the aggregate count function to achieve this but it does not give me the result that I want. I want the orders with the NA values to give "0". Any help is appreciated.


Solution

  • You can use the formula-notation for aggregate to group by Cust_order and calculate a statistic on Cust_id. In this case, you want to count the non-NA values of Cust_id, which you can do with function(x) sum(!is.na(x)). We have to explicitly tell it to keep the NA values using the na.action argument.

    aggregate(Cust_id ~ Cust_order, df1, FUN = function(x) sum(!is.na(x)), na.action = na.pass)
    

    which gives

      Cust_order Cust_id
    1     burger       1
    2      chips       2
    3     coffee       2
    4       eggs       0
    5      fries       0
    6  milkshake       0
    7   sandwich       0