r statistics tidyverse percentage mutate

R calculate percentage of unique instances over total sum of different variable


I have a fairly simple statistical task that I'm having trouble with. I need to find the topics with the greatest and least share of unique instances. The problem is that the topics were not coded the same number of times, so I think I need to express the number of times each topic referred to a unique instance (numUnique) relative to the number of times the topic was coded overall (numCoded).

The df looks like this:

topic numCoded numUnique
A 63 52
B 134 91
C 19 16
D 35 35

I tried to calculate the percent change between numCoded, but I'm pretty sure that's not what I need to compute and it spits out NA for the new column anyway:

library(tidyverse)
foo <- propAgree %>%
  group_by(topic) %>%
  mutate(pct_change = (numCoded/lag(numCoded) - 1) * 100)

The expected output would look something like this (NOTE: I'm using dummy percentages here because I don't know how to compute this):

| topic | similarity |
|-------|------------|
| A     | 30%        |
| B     | 50%        |
| C     | 70%        |
| D     | 20%        |

I need to do this for the top and bottom 10 topics, so after calculating the similarity I would then filter for the top and bottom percentage values. Any help would be appreciated.


Solution

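  • Try this code:

    propAgree %>%
      mutate(pct_change = (numUnique/numCoded) * 100)

    This calculates numUnique as a percentage of numCoded for each topic. Your original attempt returns NA because, with one row per topic, each group created by group_by(topic) contains a single row, so lag(numCoded) has no previous value to return. If you want the results ordered, add

    %>% arrange(pct_change)

    at the end, then use head(10) to extract the bottom 10 and tail(10) to extract the top 10.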
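
For reference, here is a minimal, self-contained sketch of the whole workflow using the four rows shown in the question. The column name pct_unique and the variable n_keep are illustrative choices, not names from the original post, and slice_max()/slice_min() assume dplyr 1.0.0 or later. With only four topics the sketch keeps 2 rows per end; with the full data you would set n_keep to 10.

```r
library(dplyr)

# Example data copied from the question
propAgree <- tibble(
  topic     = c("A", "B", "C", "D"),
  numCoded  = c(63, 134, 19, 35),
  numUnique = c(52, 91, 16, 35)
)

# Share of unique instances per topic, as a percentage of times coded.
# No group_by() is needed: each row is already one topic.
similarity <- propAgree %>%
  mutate(pct_unique = numUnique / numCoded * 100)

# Hypothetical cutoff: 2 here because the sample has only 4 topics;
# use 10 on the full data set.
n_keep <- 2

# Highest and lowest percentages (ties are kept by default)
top_topics    <- similarity %>% slice_max(pct_unique, n = n_keep)
bottom_topics <- similarity %>% slice_min(pct_unique, n = n_keep)

print(top_topics)
print(bottom_topics)
```

Using slice_max() and slice_min() avoids having to remember which end of an arrange()d table is which; arrange() followed by head() and tail() gives the same rows when there are no ties at the cutoff.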