Tags: r, error-handling, nan, parallel-coordinates

NaNs produced in scale transform


I am trying to create a ggparcoord plot with a log-scaled y-axis from data that contains both positive and negative values:

x = data.frame(a=2*runif(100)-1,b=2*runif(100)-1,c=2*runif(100)-1,d=2*runif(100)-1,e=2*runif(100)-1)
dim(x)
[1] 100   5

I then try to plot the parallel coordinates plot:

library(GGally)
ggparcoord(x, columns=1:5, alphaLines=0.5) + scale_y_log10()

And receive the following warnings:

Warning messages:
1: In scale$trans$trans(x) : NaNs produced
2: Removed 167 rows containing missing values (geom_path).

I think the NaNs are produced when the log of a negative value is taken. However, I do not understand why 167 rows are reported as containing missing values when x has only 100 rows.

In any case, I tried to solve this by simply adding 2 to every element of x (so that the values in x are now between +1 and +3).

x=x+2
ggparcoord(x, columns=1:5, alphaLines=0.5) + scale_y_log10()
Warning messages:
1: In scale$trans$trans(x) : NaNs produced
2: Removed 167 rows containing missing values (geom_path).

However, I receive the same warnings. Any idea how to solve this?


Solution

  • By default, the ggparcoord function uses scale="std", which subtracts the mean from each variable and divides by its standard deviation. This is a natural default, because a parallel coordinates plot puts a bunch of variables that might have very different scales on the same y-axis. Unfortunately for your application, it means that adding 2 to x is undone by the standardization: every variable is re-centred at 0, so the negative values remain.

    One way to solve this is to turn off the scaling with scale="globalminmax", which plots the (shifted, all-positive) values as they are:

    ggparcoord(x, columns=1:5, scale="globalminmax") + scale_y_log10(breaks=c(1, 2))
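
To see why adding 2 made no difference, here is a minimal base-R sketch of the standardization step. It assumes that scale="std" amounts to a per-column "subtract the mean, divide by the standard deviation"; the helper `std` and the seed are made up for illustration and are not part of GGally.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- data.frame(a = 2 * runif(100) - 1, b = 2 * runif(100) - 1)

# Hypothetical stand-in for scale="std": centre each column on its mean
# and divide by its standard deviation.
std <- function(df) sapply(df, function(col) (col - mean(col)) / sd(col))

# A constant shift changes the mean but not the deviations from it,
# so the standardized values are the same (up to floating-point error).
shift_is_undone <- isTRUE(all.equal(std(x), std(x + 2)))

# Each standardized column has mean 0, so some values are still negative
# and log10() of them still gives NaN.
any_negative_after_shift <- any(std(x + 2) < 0)

# Without standardization, the shifted data is strictly positive.
all_positive_unscaled <- all(x + 2 > 0)

print(c(shift_is_undone, any_negative_after_shift, all_positive_unscaled))
```

With the scaling turned off, the shift survives, which is why the log scale then works on x + 2.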
    
