How to input ordinal data into daisy function


I have a data set with 12 variables, each taking values 1 to 4, that are to be treated as ordinal. If I don't specify their type, they are treated as interval-scaled:

> attributes(gower_dist)
$class
[1] "dissimilarity" "dist"         

$Size
[1] 5845

$Metric
[1] "mixed"

$Types
 [1] "I" "I" "I" "I" "I" "I" "I" "I" "I" "I" "I" "I"

But if I add 'type=list(ordratio=1:12)', the type becomes 'T', and I'm not sure what that stands for. If it doesn't stand for ordinal, how do I tell daisy that I am inputting ordinal data?

> attributes(gower_dist)
$class
[1] "dissimilarity" "dist"         

$Size
[1] 5845

$Metric
[1] "mixed"

$Types
 [1] "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T"

Solution

  • Short answer:

    If you specified ordinal ratio variables (type = list(ordratio = ...)) and the resulting type is "T", that's the expected behaviour: "T" is the code daisy reports for ordratio columns.
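To compare the two ways of marking columns as ordinal side by side, here is a minimal sketch on a made-up stand-in for the question's data: three integer columns with values 1 to 4 instead of twelve. The names `scores`, `scores_ord` and `q1` to `q3` are hypothetical. With ordratio the columns stay numeric and daisy reports "T"; converting them to ordered factors makes daisy report "O", per the mapping in the long answer.

```r
library(cluster)

# Hypothetical stand-in for the question's data: 20 rows, 3 columns of
# integer scores 1 to 4 (the real data has 12 such columns).
set.seed(1)
scores <- data.frame(q1 = sample(1:4, 20, replace = TRUE),
                     q2 = sample(1:4, 20, replace = TRUE),
                     q3 = sample(1:4, 20, replace = TRUE))

# Route 1: keep the columns numeric and declare them as ordratio
d_ordratio <- daisy(scores, metric = "gower", type = list(ordratio = 1:3))
attr(d_ordratio, "Types")
# [1] "T" "T" "T"

# Route 2: convert every column to an ordered factor with levels 1 < 2 < 3 < 4
scores_ord <- scores
scores_ord[] <- lapply(scores, factor, levels = 1:4, ordered = TRUE)
d_ordered <- daisy(scores_ord, metric = "gower")
attr(d_ordered, "Types")
# [1] "O" "O" "O"
```

Either route tells daisy the columns are ordinal. ?daisy describes ordered columns as the default way to mark ordinal variables, while ordratio is documented for ratio-scaled variables that should be treated as ordinal.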

    Long answer:

    I took a look inside the daisy function. There are 6 possible values for the Types attribute:

    typeCodes <- c("A", "S", "N", "O", "I", "T")
    

    I stepped through the function in debug mode a couple of times with different parameters. The mapping appears to be as follows for this attribute:

    • If you specify type = list(asymm=<whichever columns in the dataset>): "A"

    • If you specify type = list(symm=<whichever columns in the dataset>): "S"

    • If you specify type = list(ordratio=<whichever columns in the dataset>): "T"

    If you don't specify type, or you specify type=list(logratio=<whichever columns in the dataset>), & your dataset's columns are:

    • factors: "N"

    • ordered: "O"

    • numeric / integers: "I"

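    (logratio doesn't get its own type code, probably because daisy log-transforms those columns and then handles them like any other interval-scaled column, but that's going a bit off topic here.)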
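The whole mapping can be checked in a single call with the flower data set that ships with cluster. This sketch reuses the column roles from the examples in ?daisy (assumption: a reasonably recent cluster version). V4 is a plain factor and V5 and V6 are ordered factors, so they keep their default codes.

```r
library(cluster)

# 'flower' ships with cluster: V1-V4 are factors, V5-V6 ordered factors,
# V7-V8 numeric measurements.
data(flower)
sapply(flower, function(col) class(col)[1])

d <- daisy(flower,
           type = list(asymm = c("V1", "V3"),  # binary factors      -> "A"
                       symm = 2,               # binary factor V2    -> "S"
                       ordratio = 7,           # numeric V7          -> "T"
                       logratio = 8))          # numeric V8          -> "I"

# Label each code with its column name; expected codes: A S A N O O T I
setNames(attr(d, "Types"), names(flower))
```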