r statistics spline smoothing cross-validation

What is meant by 'cross-validation with non-unique 'x' values seems doubtful' in smooth.spline in R?


I am using the smooth.spline function in R, but I get this warning message:

  > boneMaleSmooth = smooth.spline( bone[males,"age"], bone[males,"spnbmd"], cv=TRUE)
    Warning message:
    In smooth.spline(bone[males, "age"], bone[males, "spnbmd"], cv = TRUE) :
    cross-validation with non-unique 'x' values seems doubtful

  > boneFemaleSmooth = smooth.spline( bone[females,"age"], bone[females,"spnbmd"], cv=TRUE)
    Warning message:
    In smooth.spline(bone[females, "age"], bone[females, "spnbmd"],  :
    cross-validation with non-unique 'x' values seems doubtful

I read somewhere that it does not matter much, but I am not sure what causes it. I hope someone can help.


Solution

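  • This means some of your data points share the same x-value. In most cases it makes little practical difference to the result. However, with such data it is better to use cv=FALSE (the default), which chooses the smoothing parameter by generalized cross-validation (GCV) rather than ordinary leave-one-out cross-validation.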

    The reason lies in the smooth.spline source code. It first works out the total number of x-values:

    n <- length(x)
    

    And then it starts to process your data (I've included comments):

    xx <- round((x - mean(x))/tol) # centre x and round it in units of tol
    nd <- !duplicated(xx) # flag the first occurrence of each x value; duplicates are dropped! THIS IS WHAT LEADS TO THE WARNING
    ux <- sort(x[nd]) # sorted unique x values
    nx <- length(ux) # number of unique x values left to work with
    

    Later, the warning is raised if nx < n, that is, if at least one x-value was dropped as a duplicate:

    if (CV && nx < n) #CV is based on the input parameter `cv`
            warning("cross-validation with non-unique 'x' values seems doubtful")
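
To see this check in action, here is a minimal sketch on made-up data (not the bone data from the question). The helper name warns_doubtful is hypothetical, and unique() is used as a simplification of the tolerance-based de-duplication shown above, so it only counts exact ties.

```r
# Toy data, invented for illustration: every x-value appears twice.
set.seed(1)
x <- rep(1:20, each = 2)
y <- sin(x / 3) + rnorm(length(x), sd = 0.1)

# Same idea as the check in smooth.spline: total vs. unique x-values.
# (Assumption: exact ties only; smooth.spline itself rounds x using 'tol'.)
n  <- length(x)          # total number of observations
nx <- length(unique(x))  # number of distinct x-values

# Hypothetical helper: TRUE if evaluating 'expr' raises the
# "non-unique 'x' values" warning, FALSE if it finishes without a warning.
warns_doubtful <- function(expr) {
  tryCatch({ expr; FALSE },
           warning = function(w) grepl("non-unique", conditionMessage(w)))
}

cv_warns  <- warns_doubtful(smooth.spline(x, y, cv = TRUE))   # leave-one-out CV
gcv_warns <- warns_doubtful(smooth.spline(x, y, cv = FALSE))  # GCV (the default)
```

Here nx is smaller than n, so the cv = TRUE fit should raise the warning while the cv = FALSE fit should not. Running sum(duplicated(x)) on your own data is a quick way to see how many tied x-values you have.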