Tags: r, dplyr, time-series, dtw

Time Series Clustering: problem converting a dplyr data frame into a list of time series


I'd like to perform time series clustering with the dtwclust package. The problem is converting my data.frame into a list of time series. Each block ID (column STAND) has 180 days, stored as negative values in DATE_TIME, and B2_MAX is my response variable. In my example:

library(dplyr)
library(ggplot2)
library(dtwclust)

all.B2_MAX.stands <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_ts_data.csv")

all.B2_MAX.tsc <- all.B2_MAX.stands %>%
  group_by(STAND) %>%
  summarise(var = list(B2_MAX[order(DATE_TIME)]), 
            var_ts = purrr::map(var, ts))

clusters <- tsclust(all.B2_MAX.tsc[-1], 
                   type="partitional", 
                   k=2L, 
                   distance="dtw",
                   centroid = "pam")

#plot
plot(clusters, type = "sc")

#Error in lapply(series, base::as.numeric) : 
#  'list' object cannot be coerced to type 'double'

Any help would be appreciated.


Solution

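  • Splitting the response variable (B2_MAX) by block ID (STAND) and passing the resulting list to tsclust works well. tsclust expects a list of numeric series (or a matrix/data frame with one series per row), whereas the tibble in the question holds list columns, which is what triggers the coercion error: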

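    library(dtwclust)
    
    d <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_ts_data.csv")
    # sort so that each series is in chronological order before splitting
    d <- d[order(d$STAND, d$DATE_TIME), ]
    # named list of numeric vectors, one element per STAND
    l <- split(d$B2_MAX, d$STAND)
    o <- tsclust(l, 
            type="partitional", 
            k=2L, 
            distance="dtw_basic",
            centroid = "pam")
    #plot
    plot(o)
    o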
    
    # partitional clustering with 2 clusters
    # Using dtw_basic distance
    # Using pam centroids
    
    # Time required for analysis:
    #      user    system   elapsed 
    #      1.13      0.00      0.16 
    
    # Cluster sizes with average intra-cluster distance:
    
    #   size       av_dist
    # 1   14 3.518299e+198
    # 2   50  4.526561e+08
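If you would rather keep the dplyr pipeline from the question, the key change is to hand tsclust the list column itself rather than the tibble. Below is a minimal sketch, assuming synthetic data in place of the CSV: six hypothetical stands (S01 to S06) with 180 random daily values each. Only the column names STAND, DATE_TIME and B2_MAX come from the question, and the clustering step runs only if dtwclust is installed.

```r
# Sketch with synthetic data standing in for my_ts_data.csv.
# Column names follow the question; the values are random (hypothetical).
library(dplyr)

set.seed(1)
d <- data.frame(
  STAND     = rep(sprintf("S%02d", 1:6), each = 180),
  DATE_TIME = rep(-180:-1, times = 6),
  B2_MAX    = rnorm(6 * 180)
)
d <- d[sample(nrow(d)), ]  # shuffle rows to show that ordering is handled

# One row per STAND; `var` is a list column of numeric vectors in date order
tsc <- d %>%
  group_by(STAND) %>%
  summarise(var = list(B2_MAX[order(DATE_TIME)]))

# Pull the list column out of the tibble and name each series by its STAND
series <- setNames(tsc$var, tsc$STAND)
str(series[1:2])

# Cluster only if dtwclust is available
if (requireNamespace("dtwclust", quietly = TRUE)) {
  clusters <- dtwclust::tsclust(series, type = "partitional", k = 2L,
                                distance = "dtw_basic", centroid = "pam")
  print(clusters)
}
```

Using setNames() keeps the STAND labels on the list, so the cluster assignments can be matched back to the blocks afterwards.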