r, controls, logistic-regression, roc, proc-r-package

roc() function in pROC package: usage of controls and cases with its context


Can someone explain what the controls and cases arguments mean in the roc() function from the pROC package in R, and how to use them? How can I check the number of controls and cases available in the dataset?


Solution

  • From help(roc):

    controls, cases: instead of response, predictor, the data can be supplied as two numeric or ordered vectors containing the predictor values for control and case observations.

    Usually a ROC curve is used in classification settings, where you have two vectors with one entry per observation: the predictor (a numeric score or ordered rating, such as a predicted probability) and the true class labels (typically a factor() in R).

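    Other times the data are already split into a control group and a case group (as in medical studies). You can then pass the predictor values of the two groups separately as controls and cases; both must be numeric or ordered vectors, as the help text above states.

    The controls are the negative observations, for example healthy or untreated subjects, and the cases are the positive ones, for example subjects with the disease or a poor outcome.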
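
    As a minimal sketch of the two calling conventions, assuming the aSAH dataset that ships with pROC (with outcome "Good" treated as controls and "Poor" as cases), both calls below should describe the same curve. The names r1 and r2 are illustrative only.

    ```r
    library(pROC)
    data(aSAH)

    # Interface 1: one response vector (true class) and one predictor vector
    r1 <- roc(response = aSAH$outcome, predictor = aSAH$s100b,
              levels = c("Good", "Poor"), direction = "<")

    # Interface 2: the predictor values already split by class
    r2 <- roc(controls = aSAH$s100b[aSAH$outcome == "Good"],
              cases    = aSAH$s100b[aSAH$outcome == "Poor"],
              direction = "<")

    # Both calls use the same data, so the AUCs should agree
    all.equal(as.numeric(auc(r1)), as.numeric(auc(r2)))
    ```

    Setting direction = "<" states explicitly that controls are expected to have lower values than cases, rather than letting roc() pick the direction automatically.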

    Again from the help function:

    Data can be provided as response, predictor, where the predictor is the numeric (or ordered) level of the evaluated signal, and the response encodes the observation class (control or case). The levels argument specifies which response level must be taken as controls (first value of levels) or cases (second). It can safely be ignored when the response is encoded as 0 and 1, but it will frequently fail otherwise. By default, the first two values of levels(as.factor(response)) are taken, and the remaining levels are ignored. This means that if your response is coded “control” and “case”, the levels will be inverted.

    In some cases, it is more convenient to pass the data as controls, cases, but both arguments are ignored if response, predictor was specified to non-NULL values. It is also possible to pass density data with density.controls, density.cases, which will result in a smoothed ROC curve even if smooth=FALSE, but are ignored if response, predictor or controls, cases are provided.
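
    The warning about a response coded "control" and "case" can be illustrated with a small sketch. The toy vectors below are hypothetical and only serve to show the alphabetical ordering of the levels and how naming them explicitly avoids the inversion.

    ```r
    library(pROC)

    # Hypothetical toy data: cases tend to have higher scores
    response  <- c("control", "control", "control", "case", "case", "case")
    predictor <- c(0.1, 0.2, 0.4, 0.35, 0.8, 0.9)

    # Alphabetical factor order puts "case" first, so by default
    # roc() would take "case" as the control level
    levels(as.factor(response))

    # Naming the levels explicitly: controls first, cases second
    r <- roc(response, predictor, levels = c("control", "case"), direction = "<")
    auc(r)
    ```

    Passing levels in the order c(control label, case label) is a cheap safeguard whenever the response is not coded as 0 and 1.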

    library(pROC)
    data(aSAH)
    # With numeric controls/cases (s100b is a numeric biomarker)
    roc(controls = aSAH$s100b[aSAH$outcome == "Good"], cases = aSAH$s100b[aSAH$outcome == "Poor"])
    # With ordered controls/cases (wfns is an ordered factor)
    roc(controls = aSAH$wfns[aSAH$outcome == "Good"], cases = aSAH$wfns[aSAH$outcome == "Poor"])
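
    To check how many controls and cases are available, one option is to count the classes in the data and compare them with what roc() actually used. This is a sketch that again assumes the aSAH data, with "Good" as the control level and "Poor" as the case level.

    ```r
    library(pROC)
    data(aSAH)

    # Class counts straight from the data
    table(aSAH$outcome)

    # Counts as seen by roc(): the returned object stores the predictor
    # values of each group in $controls and $cases
    r <- roc(aSAH$outcome, aSAH$s100b, levels = c("Good", "Poor"), direction = "<")
    length(r$controls)
    length(r$cases)
    ```

    Printing r also reports the number of controls and cases in its summary. The two counts can differ from table() if observations with missing values were dropped.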
    

    roc() documentation