
Find the best medical pathway


From hospital data, I know the various procedures performed for a particular treatment by different doctors and for different patient demographics. I want to analyze these paths and understand which one is the best in terms of cost. By best I don't simply mean the path with the minimum cost: I want to find the paths followed by the majority of doctors and, out of those, the least costly one. My data is:

Doctor Procedure1 Procedure2 Procedure3 Procedure4 Procedure5 Charge
   111          1          2          3          4          5    200
   222          1          4          7          4          9    185
   333          2          3          5          1          9    250
   444          1          2          3          4          6    210
   222          1          2          3          4          6    210

I want to know which of these paths is the best.


Solution

  • For each path (that is, each distinct combination of the five Procedure columns and Charge), dd gives its Count, i.e. the number of rows having that path. dd is sorted by descending Count and then by ascending Charge. Finally, dd2 keeps the least expensive path(s) for each Count, again in descending order of Count.

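    # Count the rows for each distinct path/Charge combination (DF[-1] drops Doctor).
    dd <- aggregate(list(Count = 1:nrow(DF)), DF[-1], length)
    # Most frequent paths first; ties in Count are broken by the lowest Charge.
    dd <- dd[order(-dd$Count, dd$Charge), ]
    # Within each Count, keep the row(s) whose Charge equals the first (lowest) Charge.
    dd2 <- dd[ave(dd$Charge, dd$Count, FUN = function(x) x == x[1]) == 1, ]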
    

    giving:

    > dd2
      Procedure1 Procedure2 Procedure3 Procedure4 Procedure5 Charge Count
    3          1          2          3          4          6    210     2
    1          1          4          7          4          9    185     1
    

    That is, among the paths used twice, 12346 is the least costly with a Charge of 210, and among the paths used once, 14749 is the least costly with a Charge of 185. You can now assess the trade-off between Count and Charge. (To see the counts for all paths, look at dd, which contains one row per path with its Count, sorted by Count and Charge.)
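
    The path labels used above (12346, 14749) are just the five procedure codes written side by side. As a minimal sketch, assuming the question's data, such a label can be added as a column; the names Path and proc_cols are hypothetical and not part of the original answer:

    ```r
    # Input data as given in the question.
    DF <- data.frame(
      Doctor     = c(111L, 222L, 333L, 444L, 222L),
      Procedure1 = c(1L, 1L, 2L, 1L, 1L),
      Procedure2 = c(2L, 4L, 3L, 2L, 2L),
      Procedure3 = c(3L, 7L, 5L, 3L, 3L),
      Procedure4 = c(4L, 4L, 1L, 4L, 4L),
      Procedure5 = c(5L, 9L, 9L, 6L, 6L),
      Charge     = c(200L, 185L, 250L, 210L, 210L)
    )

    # Same three steps as in the answer.
    dd <- aggregate(list(Count = 1:nrow(DF)), DF[-1], length)
    dd <- dd[order(-dd$Count, dd$Charge), ]
    dd2 <- dd[ave(dd$Charge, dd$Count, FUN = function(x) x == x[1]) == 1, ]

    # Hypothetical helper column: paste the five procedure codes into one label.
    proc_cols <- paste0("Procedure", 1:5)
    dd2$Path <- do.call(paste0, dd2[proc_cols])

    # Compact view of the trade-off between Count and Charge.
    print(dd2[c("Path", "Count", "Charge")])
    ```

    A single label column like this is only a convenience for reading the result; it works here because every procedure code is a single digit, so the pasted label is unambiguous.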

    Another thing you could do is remove dominated rows: if any other row has a higher Count and a lower Charge than the current row, the current row can be removed. In this example there are no dominated rows, but if there were, this would remove them:

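    # TRUE if some other row has both a higher Count and a lower Charge than row r.
    is_dom <- function(r, DF) with(DF, any(Count[-r] > Count[r] & Charge[-r] < Charge[r]))
    dominated <- sapply(seq_len(nrow(dd2)), is_dom, dd2)
    # Keep only the rows that are not dominated.
    dd3 <- dd2[!dominated, ]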
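
    Since the question's data has no dominated rows, here is a small sketch of the dominance check on a made-up summary table. The data frame toy and its values are hypothetical, chosen only so that one row is dominated:

    ```r
    # Dominance check as defined in the answer: TRUE if some other row has
    # both a higher Count and a lower Charge than row r.
    is_dom <- function(r, DF) with(DF, any(Count[-r] > Count[r] & Charge[-r] < Charge[r]))

    # Hypothetical summary table (NOT derived from the question's data).
    toy <- data.frame(
      Path   = c("A", "B", "C"),
      Count  = c(3L, 2L, 1L),
      Charge = c(150L, 180L, 120L)
    )

    dominated <- sapply(seq_len(nrow(toy)), is_dom, toy)

    # B is dominated by A (A is used more often and is cheaper).
    # C is kept: it is used less often than A and B but is the cheapest.
    print(toy[!dominated, ])
    ```

    The rows that remain are the ones where choosing a more popular path always costs more, so each of them is a genuine trade-off between Count and Charge.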
    

    Note: The input in reproducible form is:

    DF <-
    structure(list(Doctor = c(111L, 222L, 333L, 444L, 222L), Procedure1 = c(1L, 
    1L, 2L, 1L, 1L), Procedure2 = c(2L, 4L, 3L, 2L, 2L), Procedure3 = c(3L, 
    7L, 5L, 3L, 3L), Procedure4 = c(4L, 4L, 1L, 4L, 4L), Procedure5 = c(5L, 
    9L, 9L, 6L, 6L), Charge = c(200L, 185L, 250L, 210L, 210L)), .Names = c("Doctor", 
    "Procedure1", "Procedure2", "Procedure3", "Procedure4", "Procedure5", 
    "Charge"), class = "data.frame", row.names = c(NA, -5L))
    
