I am working with continuous data that I need to test for normality. As part of the process, I am grouping the values into buckets in order to create categories. I need to test whether my data is Normal with a mean of 24.9 and an sd of 7.5.
I need to test normality for the following buckets:
range <- c('<8', '12', '16', '20', '24', '28', '32', '36', '40', '44', '>44')
In order to find the observed values, I need to perform the following computation in R to get the value of each bucket under the Normal distribution.
obs <- c()
# total number of observed values = 62
obs <- append(obs, pnorm(8, 24.9, 7.5) * 62) # for bucket <8
obs <- append(obs, (pnorm(12, 24.9, 7.5) - pnorm(8, 24.9, 7.5)) * 62) # for bucket 12
# ...
# for bucket 16
# for bucket 20 etc.
Is there a way to make this logic vectorized such that I don't need to make a formula for every bucket?
Here's an idea based on taking the diff. I'm not sure what the 3rd range would look like, but each bucket is always p_norm[i] - p_norm[i-1]:
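# upper bound of each bucket; Inf closes the open-ended ">44" bucket
range_x <- c(8, 12, 16, 20, 24, 28, 32, 36, 40, 44, Inf)
# cumulative Normal probability at each upper bound
p_norm <- pnorm(range_x, 24.9, 7.5)
# first bucket is everything below 8; the rest are successive differences, scaled by n = 62
c(p_norm[1], diff(p_norm)) * 62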
[1] 0.7513823 1.8970234 4.6477273 8.6236507 12.1191940
[6] 12.9007877 10.4021663 6.3529978 2.9386039 1.0293193
[11] 0.3371475
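If you need this for more than one set of buckets, the same diff idea could be wrapped in a small function. This is only a sketch: `expected_counts` and its argument names are made up for illustration, and it assumes the first and last buckets are open-ended, as in the question.

```r
# Hypothetical helper (not from any package): per-bucket counts under
# Normal(mean, sd) for n observations.
# Assumes upper_bounds is increasing and the outer buckets are open-ended.
expected_counts <- function(upper_bounds, n, mean, sd) {
  # Pad with -Inf and Inf so a single diff() covers every bucket,
  # including "<first" and ">last".
  breaks <- c(-Inf, upper_bounds, Inf)
  counts <- diff(pnorm(breaks, mean = mean, sd = sd)) * n
  # Label the buckets the same way as in the question.
  last <- upper_bounds[length(upper_bounds)]
  names(counts) <- c(paste0("<", upper_bounds[1]),
                     upper_bounds[-1],
                     paste0(">", last))
  counts
}

upper_bounds <- seq(8, 44, by = 4)  # 8, 12, ..., 44
expected <- expected_counts(upper_bounds, n = 62, mean = 24.9, sd = 7.5)
expected
```

Because the breaks run from -Inf to Inf, the bucket counts should add up to n (62 here), which is a quick sanity check on the bucket definitions.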