I have the data frame below. The values are frequency counts:
pnb3 <- structure(list(Likelihood.to.Click.Freq = c(29L, 71L, 120L),
Likelihood.to.Enroll.Freq = c(30L, 84L, 106L), Likelihood.to.Click.1.Freq = c(54L,
90L, 108L), Likelihood.to.Enroll.1.Freq = c(55L, 109L, 88L
), Likelihood.to.Click_0.Freq = c(50L, 77L, 86L), Likelihood.to.Enroll_0.Freq = c(49L,
93L, 71L), Likelihood.to.Click_1.Freq = c(25L, 63L, 163L),
Likelihood.to.Enroll._0.Freq = c(26L, 90L, 135L), Likelihood.to.Click_2.Freq = c(63L,
74L, 94L), Likelihood.to.Enroll_1.Freq = c(61L, 95L, 75L),
Likelihood.to.Click_3.Freq = c(22L, 51L, 157L), Likelihood.to.Enroll._1.Freq = c(24L,
93L, 113L), Likelihood.to.Click_4.Freq = c(42L, 66L, 118L
), Likelihood.to.Enroll._2.Freq = c(39L, 90L, 97L), Likelihood.to.Click_5.Freq = c(25L,
47L, 157L), Likelihood.to.Enroll_2.Freq = c(26L, 75L, 128L
), Likelihood.to.Click_6.Freq = c(42L, 84L, 96L), Likelihood.to.Enroll_3.Freq = c(38L,
103L, 81L), Likelihood.to.Click_7.Freq = c(30L, 69L, 105L
), Likelihood.to.Enroll_4.Freq = c(28L, 88L, 88L), Likelihood.to.Click_8.Freq = c(29L,
57L, 140L), Likelihood.to.Enroll_5.Freq = c(27L, 90L, 109L
), Likelihood.to.Click_9.Freq = c(40L, 70L, 109L), Likelihood.to.Enroll_6.Freq = c(34L,
94L, 91L), Likelihood.to.Click_10.Freq = c(31L, 75L, 135L
), Likelihood.to.Enroll_7.Freq = c(32L, 93L, 116L)), class = "data.frame", row.names = c(NA,
-3L))
But when I try to convert the counts to percentages, the last row is incorrect. It should be ~54/55 percent, but I am getting ~47/48 percent. I don't think it's a rounding error, as it's off by quite a bit. Basically, in each set of outputs one number comes out incorrect.
Here is the code I use to convert frequency counts to percentages. Is there anything wrong with it? I know there are ways to do this with a function, but I wanted to break it down to see each step:
pnb4 <- pnb3 / (colSums(pnb3))
pnb5 <- pnb4 *100
pnb6 <- round(pnb5,1)
If you run it you'll notice the third % is off by quite a bit.
UPDATE: for example, once I run the above, the third row of the first column comes out at ~47/48%, but it should actually be about 54.5% (because 120/220 ≈ 54.5%).
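The problem is that your code isn't vectorized in the way you want it to be. colSums(pnb3) returns a plain vector, and when you divide a data frame by a vector, R recycles that vector down the rows of each column instead of matching one total to each column. So your code takes the first value of column 1 and divides it by the column sum for column 1. Then it takes the second row of column 1 and divides it by the column sum for column 2 (which happens to be correct, because both sums are 220). But when it gets to the third row, it divides by the column sum for column 3 (i.e. 252), and that is not correct: 120/252 is about 47.6%, which is the number you are seeing.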
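To make the recycling visible, here is a minimal sketch using only the first three columns of pnb3. The short column names (click, enroll, click1) and the object names are made up for the illustration; the numbers are taken from the question.

```r
# First three columns of pnb3, with shortened (hypothetical) names
df <- data.frame(click  = c(29L, 71L, 120L),
                 enroll = c(30L, 84L, 106L),
                 click1 = c(54L, 90L, 108L))

totals <- colSums(df)              # 220, 220, 252

# What the original code computes for the first column
wrong <- (df / totals)$click

# The same thing written out: row i is divided by totals[i],
# not by the total of its own column
manual <- df$click / unname(totals)

round(wrong * 100, 1)              # 13.2 32.3 47.6
round(df$click / sum(df$click) * 100, 1)  # 13.2 32.3 54.5 (intended)
```

Only the third value differs here because the first two column totals happen to be equal.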
You can do:
library(dplyr)
pnb3 %>%
  mutate(across(everything(), ~ round(. / sum(.) * 100, 1)))
Here's the result for the first few columns:
# A tibble: 3 x 26
Likelihood.to.C~ Likelihood.to.E~ Likelihood.to.C~ Likelihood.to.E~
<dbl> <dbl> <dbl> <dbl>
1 13.2 13.6 21.4 21.8
2 32.3 38.2 35.7 43.3
3 54.5 48.2 42.9 34.9
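If you would rather stay in base R, one possible sketch is to work column by column with lapply(), or to convert to a matrix and use prop.table() with margin = 2. This is just an alternative under the same assumptions as above, shown on the first three columns of pnb3 with shortened, made-up column names so that it runs on its own.

```r
# First three columns of pnb3, with shortened (hypothetical) names
df <- data.frame(click  = c(29L, 71L, 120L),
                 enroll = c(30L, 84L, 106L),
                 click1 = c(54L, 90L, 108L))

# Option 1: divide each column by its own sum, keeping the data frame shape
pct <- df
pct[] <- lapply(df, function(x) round(x / sum(x) * 100, 1))
pct

# Option 2: column-wise proportions on a matrix (margin = 2 means columns)
pct_mat <- round(prop.table(as.matrix(df), margin = 2) * 100, 1)
pct_mat
```

Both give 54.5 for the third row of the first column. To apply either one to the full data, replace df with pnb3.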