Tags: r, dataframe, dplyr, summarize

Is it possible to add an exception to summarize(count = n_distinct(x)) in R?


Is it possible to add an exception to summarize(count = n_distinct(x)) in R, while allowing the exception to be counted by the "normal" summarize(count = n()) function?

How can n() and n_distinct() be combined to create a single new column?

That way, I could summarize the distinct count of the values in column x, with one exception: a chosen value that would not be limited to a distinct count but would instead be counted by the "normal" summarize(count = n()) function.

For example, if x = c(1, 2, 2, 4, 5, 8, 8, ..., 99), I could summarize the distinct counts of all observations except, say, the observation 8 in column x. The observation 8 would instead be subject to the summarize(count = n()) function. This would then count the number of 8's plus the number of other unique values in x.

In short, this would create a single new column "count" in which every value contributes to the distinct count, except for the one exception, which contributes its "normal" count.


Solution

  • An update for future readers:

    To combine the distinct count and the "normal" count, the following counts every value in x distinctly except 8, which is counted once per occurrence:

    summarize(count = n_distinct(x[x != 8]) + sum(x == 8))
    

    The result is the number of 8's plus the number of other unique values in x.
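As a quick check of the combined count, here is a minimal sketch. The data frame df and its values are made up for illustration, and the intermediate column names distinct_others and eights are hypothetical helpers that only make the two parts visible.

```r
library(dplyr)

# Hypothetical sample data: the value 8 appears three times.
df <- tibble(x = c(1, 2, 2, 4, 5, 8, 8, 8, 99))

result <- df %>%
  summarize(
    distinct_others = n_distinct(x[x != 8]),  # unique values other than 8
    eights          = sum(x == 8),            # every occurrence of 8
    count           = distinct_others + eights
  )

result$count
```

If x can contain NA, both parts are affected: sum(x == 8) returns NA unless na.rm = TRUE is passed, and n_distinct() counts NA as a value of its own unless na.rm = TRUE is passed.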

    If you instead want the distinct count to ignore a value entirely (e.g. 8), so that it is not counted at all, write this:

    n_distinct(x[x != 8])
    

    Or filter the rows out before summarizing:

    ... %>% filter(x != 8) %>% summarize...
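The two ways of excluding a value can be compared side by side. This is only a sketch on made-up data; df, by_subset, and by_filter are illustrative names, and the summarize() call after filter() is an assumption about what the elided pipeline would contain.

```r
library(dplyr)

# Hypothetical sample data: the value 8 appears three times.
df <- tibble(x = c(1, 2, 2, 4, 5, 8, 8, 8, 99))

# Option 1: subset the vector inside summarize().
by_subset <- df %>%
  summarize(count = n_distinct(x[x != 8]))

# Option 2: drop the rows first, then count distinct values.
by_filter <- df %>%
  filter(x != 8) %>%
  summarize(count = n_distinct(x))

# Both approaches agree on ungrouped data.
identical(by_subset$count, by_filter$count)
```

The difference shows up when other summaries are computed in the same call: filter() removes the rows for everything that follows, while subsetting with x[x != 8] affects only that one expression. With group_by(), a group made up only of 8's would typically disappear after filter(), whereas the subsetting form keeps the group with a count of 0.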