I have a fairly simple statistical task that I'm having trouble with. I need to find the topics with the greatest and smallest share of unique instances. The problem is that the topics were not all assigned the same number of times, so I think I need to express the number of times each topic referred to a unique instance (`numUnique`) relative to the number of times the topic was coded overall (`numCoded`).
The df looks like this:
| topic | numCoded | numUnique |
|---|---|---|
| A | 63 | 52 |
| B | 134 | 91 |
| C | 19 | 16 |
| D | 35 | 35 |
I tried to calculate the percent change in `numCoded`, but I'm pretty sure that's not what I need to compute, and it returns `NA` for the new column anyway:
library(tidyverse)
foo <- propAgree %>%
group_by(topic) %>%
mutate(pct_change = (numCoded/lag(numCoded) - 1) * 100)
The expected output would look something like this (NOTE: I'm using dummy percentages here because I don't know how to compute this):
| topic | similarity |
|---------------------|------------------|
| A | 30% |
| B | 50% |
| C | 70% |
| D | 20% |
I need to do this for the top and bottom 10 topics, so after calculating the similarity I would then filter for the top and bottom percentage values. Any help would be appreciated.
Try this code:
propAgree %>%
  mutate(pct_change = (numUnique / numCoded) * 100)
It calculates `numUnique` as a percentage of `numCoded` for each topic. If you also want the rows ordered, add
%>% arrange(pct_change)
at the end, then use `head(10)` to extract the bottom 10 and `tail(10)` to extract the top 10.
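Putting the steps together, here is a minimal sketch that runs on the four sample rows from the question. The column name `pct_unique` and the cutoff `n` are hypothetical names chosen for illustration; `n` is set to 2 only because the sample has four rows, and would be 10 on the full data.

```r
library(dplyr)

# Sample data copied from the question; the real propAgree has more topics.
propAgree <- tibble(
  topic     = c("A", "B", "C", "D"),
  numCoded  = c(63, 134, 19, 35),
  numUnique = c(52, 91, 16, 35)
)

# Hypothetical cutoff for this 4-row sample; use 10 on the full data.
n <- 2

# Share of unique instances per topic, sorted from lowest to highest.
result <- propAgree %>%
  mutate(pct_unique = numUnique / numCoded * 100) %>%
  arrange(pct_unique)

bottom_n <- head(result, n)  # topics with the lowest share of unique instances
top_n    <- tail(result, n)  # topics with the highest share of unique instances
```

No `group_by(topic)` is needed because each topic already occupies one row. That is also why the original attempt returned `NA`: with one row per group, `lag(numCoded)` has no previous value to return. If you prefer not to sort first, `slice_min()` and `slice_max()` from dplyr 1.0.0 or later should give the same subsets, though by default they keep ties.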