r plot time-series k-means

Different color in the same price plot


I have to color the price plot differently based on the cluster given by the kmeans function. Consider this code:

library(tseries)

####

nomiequity <- "^IXIC" 
datastart  <- "2019-09-18"

nsdq.prices <- get.hist.quote(instrument  = nomiequity,    
                              compression = "d",         
                              start       = datastart, 
                              end         = "2020-12-31", 
                              retclass    = "zoo",
                              quote       = "AdjClose")

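# Cluster the prices into 3 groups and split the series by cluster label
b <- kmeans(nsdq.prices, 3)
c <- b$cluster
d <- merge(nsdq.prices, c)   # prices alongside labels (not used below)
e <- split(nsdq.prices, c)

# Draw the full series in green, then overlay clusters 2 and 3
plot(nsdq.prices, type="l", col="green", ylim=c(6000, 13000))
lines(e[["2"]], type = "l", col="red")
lines(e[["3"]], type = "l", col="blue")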

The result is almost what I need, but I don't want to show the connecting lines between stretches of the same color at different times.


Solution

  • The problem is that split() groups all points of a cluster together, so lines() joins stretches of the same cluster that are far apart in time. Instead, give every run of consecutive identical cluster labels its own id: take the run lengths from rle and use Map to repeat the run numbers 1, 2, 3, ... l times each. Split on these increasing run ids, but still use the cluster label to pick the line color; lapply loops over the pieces of the split e.

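    # Cluster label for each observation
    cl <- kmeans(nsdq.prices, 3)$cluster
    # Lengths of the runs of consecutive identical labels
    l <- rle(as.numeric(cl))$lengths
    # One id per run, repeated over that run's length
    s <- Map(rep, seq_along(l), l)
    # Split prices (with their labels) by run id rather than by cluster
    e <- split(cbind(nsdq.prices, cl), unlist(s))
    
    # Base plot in the "missing" color, then one line per run, colored by cluster
    plot(nsdq.prices, type="l", col=7, ylim=c(6000, 13000))
    invisible(lapply(e, function(x) lines(x$Adjusted, col=x$cl + 1)))
    legend("topleft", legend=c(sprintf("cl %s", 1:3), "missing"), col=c((1:3)+1, 7), lty=1)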
    

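    Gaps appear between consecutive runs, where no line segment belongs to a single run. The original series is plotted first in a separate "missing" color (col=7), so the line zoo draws between those observations shows through and fills the gaps.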
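
The run-numbering step can be tried on its own. This is a minimal sketch that assumes a made-up label vector in place of the real kmeans output; the name run_id is hypothetical and does not appear in the answer's code.

```r
# Toy cluster labels in time order (made-up values, not the Nasdaq data)
cl <- c(1, 1, 2, 2, 2, 1, 3, 3)

# Lengths of the runs of consecutive identical labels
l <- rle(cl)$lengths                          # 2 3 1 2

# One id per run, repeated over the length of that run
run_id <- unlist(Map(rep, seq_along(l), l))   # 1 1 2 2 2 3 4 4

# Splitting on run_id keeps the two separate stretches of cluster 1 apart,
# whereas split(cl, cl) would put them in the same group
runs <- split(cl, run_id)
print(runs)
```

Cluster 1 occurs in two separate stretches here and ends up in two different pieces (run ids 1 and 3), which is what stops lines() from joining them.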