I am trying to create a ggparcoord plot with a log-scaled y-axis from data that contains both positive and negative values:
x = data.frame(a=2*runif(100)-1,b=2*runif(100)-1,c=2*runif(100)-1,d=2*runif(100)-1,e=2*runif(100)-1)
dim(x)
[1] 100 5
I then try to plot the parallel coordinates plot:
library(GGally)
ggparcoord(x, columns=1:5, alphaLines=0.5) + scale_y_log10()
And receive the following warnings:
Warning messages:
1: In scale$trans$trans(x) : NaNs produced
2: Removed 167 rows containing missing values (geom_path).
I think the NaNs are produced when the log of a negative value is taken. However, I do not understand why 167 rows contain missing values when x has only 100 rows.
In any case, I tried to solve this by adding 2 to every value in x (so that the values in x are now between +1 and +3):
x=x+2
ggparcoord(x, columns=1:5, alphaLines=0.5) + scale_y_log10()
Warning messages:
1: In scale$trans$trans(x) : NaNs produced
2: Removed 167 rows containing missing values (geom_path).
However, I receive the same warnings. Any idea how to solve this?
The ggparcoord function has the default parameter scale="std", which subtracts the mean and divides by the standard deviation for each variable. This is a natural default, because you're trying to plot a bunch of different variables that might have very different scales on the same y-axis. Unfortunately for your application, this means that adding 2 to x is undone by the scaling, so the negative values remain.
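To see why the shift has no effect, here is a minimal base R sketch that mimics the standardization by hand. The helper name standardize is my own and is not part of GGally; it only reproduces the "subtract the mean, divide by the standard deviation" step described above. It also hints at the row count: as far as I can tell, ggparcoord reshapes the data to long format (one row per observation per variable), so 100 rows and 5 columns give 500 plotted values, which is why more than 100 can be dropped.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- data.frame(a = 2*runif(100) - 1, b = 2*runif(100) - 1, c = 2*runif(100) - 1,
                d = 2*runif(100) - 1, e = 2*runif(100) - 1)

# Hypothetical helper (not a GGally function): standardize each column,
# i.e. subtract its mean and divide by its standard deviation.
standardize <- function(df) sapply(df, function(v) (v - mean(v)) / sd(v))

z_orig  <- standardize(x)      # standardized original data
z_shift <- standardize(x + 2)  # standardized data after adding 2

# The shift cancels out: both versions are numerically the same.
same <- isTRUE(all.equal(z_orig, z_shift))
print(same)

# Number of values that would be plotted, and how many are negative
# (the log of those is NaN, so they are dropped).
n_values   <- length(z_orig)
n_negative <- sum(z_orig < 0)
print(c(n_values, n_negative))
```

Since each standardized column has mean zero, a sizeable share of the values is always negative, no matter what constant is added beforehand.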
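One way to solve this is to turn off the scaling and plot the shifted data (after x=x+2) as-is; scale="globalminmax" applies no scaling and uses the global minimum and maximum as the y-range: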
ggparcoord(x, columns=1:5, scale="globalminmax") + scale_y_log10(breaks=c(1, 2))