Search code examples
rcurve-fittingpercentilegamlss

Getting percentile values from gamlss centile curves


This question is related to: Selecting Percentile curves using gamlss::lms in R

I can get centile curve from following data and code:

age = sample(5:15, 500, replace=T) 
yvar = rnorm(500, age, 20)
mydata = data.frame(age, yvar)
head(mydata)
  age      yvar
1  12  13.12974
2  14 -18.97290
3  10  42.11045
4  12  27.89088
5  11  48.03861
6   5  24.68591

h = lms(yvar, age , data=mydata, n.cyc=30)
centiles(h,xvar=mydata$age, cent=c(90), points=FALSE)

enter image description here

How can I now get yvar on the curve for each of x value (5:15) which would represent 90th percentiles for data after smoothening?

I tried to read help pages and found fitted(h) and fv(h) to get fitted values for entire data. But how to get values for each age level at 90th centile curve level? Thanks for your help.

Edit: Following figure show what I need:

enter image description here

I tried following but it is correct since value are incorrect:

mydata$fitted = fitted(h)
aggregate(fitted~age, mydata, function(x) quantile(x,.9))
   age    fitted
1    5  6.459680
2    6  6.280579
3    7  6.290599
4    8  6.556999
5    9  7.048602
6   10  7.817276
7   11  8.931219
8   12 10.388048
9   13 12.138104
10  14 14.106250
11  15 16.125688

The values are very different from 90th quantile directly from data:

> aggregate(yvar~age, mydata, function(x) quantile(x,.9))
   age     yvar
1    5 39.22938
2    6 35.69294
3    7 25.40390
4    8 26.20388
5    9 29.07670
6   10 32.43151
7   11 24.96861
8   12 37.98292
9   13 28.28686
10  14 43.33678
11  15 44.46269

Solution

  • See if this makes sense. The 90th percentile of a normal distribution with mean and sd of 'smn' and 'ssd' is qnorm(.9, smn, ssd): So this seems to deliver (somewhat) sensible results, albeit not the full hack of centiles that I suggested:

     plot(h$xvar, qnorm(.9, fitted(h), h$sigma.fv))
    

    (Note the massive overplotting from only a few distinct xvars but 500 points. Ande you may want to set the ylim so that the full range can be appreciated.)

    enter image description here

    The caveat here is that you need to check the other parts of the model to see if it is really just an ordinary Normal model. In this case it seems to be:

    > h$mu.formula
    y ~ pb(x)
    <environment: 0x10275cfb8>
    > h$sigma.formula
    ~1
    <environment: 0x10275cfb8>
    > h$nu.formula
    NULL
    > h$tau.formula
    NULL
    

    So the model is just mean-estimate with a fixed-variance (the ~1) across the range of the xvar, and there are no complications from higher order parameters like a Box-Cox model. (And I'm unable to explain why this is not the same as the plotted centiles. For that you probably need to correspond with the package authors.)