Can I assign colors to MASS::parcoord() based on a logical condition?


Here's the code to generate a parallel coordinate plot:

library(MASS)  # provides parcoord() and the shoes data
shoes <- data.frame(shoes)
parcoord(shoes)

As background, the shoes data set is used to show the power of a paired t-test. It has two columns, A and B, which represent wear from two sole materials. When the data are analyzed as paired, the two materials differ markedly.

A good way to show paired data is with the parallel coordinate plot, but as you can see it's pretty much nothing without some color. I'd like to add two colors, say, red when A > B and green when A < B. Both situations occur:

> shoes$A > shoes$B
 [1] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE

My problem is that parcoord() cycles through colors as it goes through observations, so I'm not sure how to specify the color based on the logical test. I've tried

parcoord(shoes, col = ifelse(shoes$A > shoes$B, "red", "green"))

and I've experimented with various offsets (many besides 26) in

my_colors <- colors()[as.numeric(shoes$A > shoes$B) + 26]
parcoord(shoes, col = my_colors)

but nothing seems to work. I either get a spectrum of colors, all one color, or all one color except for the top and bottom entries. I'd like the FALSE to generate one color, TRUE to generate another.


Solution

  • If I understand the question correctly, your condition A > B is true only for rows 4 and 6, and those two rows happen to contain the maximum and the minimum values in shoes:

    shoes <- within(shoes, criterium <- ifelse(A > B, "bigger", "smaller"))
    
           A    B criterium
    1  13.2 14.0   smaller
    2   8.2  8.8   smaller
    3  10.9 11.2   smaller
    4  14.3 14.2    bigger
    5  10.7 11.8   smaller
    6   6.6  6.4    bigger
    7   9.5  9.8   smaller
    8  10.8 11.3   smaller
    9   8.8  9.3   smaller
    10 13.3 13.6   smaller
    
    minmax <- range(shoes$A, shoes$B)  # overall minimum and maximum across both columns
    
    > minmax
    [1]  6.4 14.3
    
    

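    So your parallel coordinates plot will draw only the top and bottom lines in "red" and every other line in the second color. In other words, the result you saw, one color except for the top and bottom entries, is the correct one.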
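
To make the mapping easier to check, here is a minimal sketch that builds the color vector, checks which rows satisfy the condition, and then plots. The names shoes_df and line_cols, the thicker lines (lwd = 2) and the legend are additions for illustration, not part of the original question or answer.

```r
library(MASS)  # provides parcoord() and the shoes data

shoes_df <- data.frame(shoes)  # two columns, A and B

# One color per observation (row): red where A > B, green otherwise.
line_cols <- ifelse(shoes_df$A > shoes_df$B, "red", "green")

# Check the mapping before plotting.
print(table(line_cols))
print(which(line_cols == "red"))

# parcoord() rescales every column to its own [0, 1] range, so the rows holding
# the largest and smallest values are drawn along the top and bottom edges.
# Pass only the numeric columns; a character column would break the rescaling.
parcoord(shoes_df[, c("A", "B")], col = line_cols, lwd = 2)

# There are no ties in this data, so "green" corresponds to A < B.
legend("right", legend = c("A > B", "A < B"),
       col = c("red", "green"), lwd = 2, bty = "n")
```

Because the two rows where A > B sit at the extremes of both axes, the red lines are easy to mistake for a plotting glitch; the printed row indices show that the colors follow the logical test.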