
dtwclust Shape-Based Cluster Analysis in R: NA/NaN/Inf in Foreign Function Call Despite Complete Dataset


We are currently trying to run a shape-based clustering with the dtwclust package and are running into the following problem: for certain subsets of our data, we receive this error message:

Error in stats::hclust(stats::as.dist(distmat), method, members = dots$members) : NA/NaN/Inf in foreign function call (arg 11)

At first, we thought we might have missing data in our data frame. However, we tested for NAs, NaNs, Infs, and the data type (numeric), and everything checked out.

To make it even weirder: it seems to work when we split the data into chunks of around 1.5k rows. Other variables work just fine.

We cannot find any consistent pattern and are not getting any closer to a solution, so we would greatly appreciate your expertise and help.

To make the error reproducible, please find the code and complete dataset attached.

Code:

```r
library(dtwclust)

# `anger`: numeric data frame with one time series per row
hc_anger_sbd_k10 <- tsclust(anger, type = "h", k = 10L,
                            preproc = zscore, seed = 100,
                            distance = "sbd", centroid = shape_extraction,
                            control = hierarchical_control(method = "average"))
```

Data: Dropbox Link To Data

Thanks so much and kind regards


Solution

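  • You have empty series, i.e. series whose values are all zero, for example anger[1949,]. According to the definition of SBD, which normalizes the cross-correlation by the product of the two series' norms, the distance between such a series and any other is not finite. This would also explain why some subsets of the data run without error: presumably they contain no empty series.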

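    You'll probably have to remove them with something like anger[rowSums(anger) != 0,] (this assumes the values are non-negative; otherwise positive and negative values in a row could sum to zero).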
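To make the failure mode concrete, here is a naive base-R sketch of SBD written directly from its definition, followed by one way to drop the offending rows. `sbd_sketch` is a hypothetical helper, not a dtwclust function; dtwclust may compute SBD differently (for instance via the FFT) and may represent the non-finite result as Inf rather than NaN, but either value is enough to make `stats::hclust` fail. The `anger` data frame below is a small toy stand-in, not the real dataset, and also dropping constant rows (not just all-zero ones) is an assumption that goes slightly beyond the answer.

```r
# Hypothetical helper (not part of dtwclust): a naive sketch of the
# shape-based distance, SBD(x, y) = 1 - max_s NCC_s(x, y), where NCC_s is the
# cross-correlation at shift s divided by ||x|| * ||y||.
sbd_sketch <- function(x, y) {
  n <- length(x)
  stopifnot(length(y) == n)
  # Raw cross-correlation for every shift s = -(n - 1), ..., n - 1
  cc <- sapply(seq(-(n - 1), n - 1), function(s) {
    if (s >= 0) sum(x[(1 + s):n] * y[1:(n - s)])
    else sum(x[1:(n + s)] * y[(1 - s):n])
  })
  # Product of the Euclidean norms: 0 if either series is all zeros,
  # in which case every ratio below is 0/0 = NaN
  denom <- sqrt(sum(x^2)) * sqrt(sum(y^2))
  1 - max(cc / denom)
}

sbd_sketch(c(1, 2, 3, 2), c(1, 2, 3, 2))  # ~0: identical shapes
sbd_sketch(c(0, 0, 0, 0), c(1, 2, 3, 2))  # NaN: empty series

# Toy stand-in for the real `anger` data (one series per row):
# row 2 is all zeros, row 3 is constant
anger <- data.frame(t1 = c(0.2, 0, 0.5, 0.3),
                    t2 = c(0.1, 0, 0.5, 0.0),
                    t3 = c(0.3, 0, 0.5, 0.6))
m <- as.matrix(anger)

# Rows without a single non-zero value; robust even if values can be negative
all_zero <- rowSums(m != 0) == 0
# Rows without any variation (all-zero rows are a special case).
# Assumption: such rows are equally unusable once z-normalized.
constant <- apply(m, 1, function(r) diff(range(r)) == 0)

anger_clean <- anger[!constant, , drop = FALSE]
```

On the real data, the same filter would be applied to `anger` before passing it to `tsclust`.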