r, plot, distribution, swarmplot, beeswarm

R-package beeswarm generates same x-coordinates


I am working on a script where I need to calculate the coordinates for a beeswarm plot without plotting immediately. When I call beeswarm, the x-coordinates I get back are not swarmed; within each group they are all more or less the same value.

But if I generate the same plot again, it swarms correctly.

And if I call dev.off() and then generate the plot once more, I again get no swarming.

The code I used:

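library(beeswarm)

n <- 250
df <- data.frame(x = floor(runif(n, 0, 5)),
                 y = rnorm(n = n, mean = 500, sd = 100))

# Plot 1: no graphics device is open yet, so the x-coordinates are not swarmed
A <- with(df, beeswarm(y ~ x, do.plot = FALSE))
plot(x = A$x, y = A$y)

# Plot 2: the device opened by Plot 1 is still active, so swarming works
A <- with(df, beeswarm(y ~ x, do.plot = FALSE))
plot(x = A$x, y = A$y)

# Close the device again
dev.off()

# Plot 3: same situation as Plot 1, no swarming
A <- with(df, beeswarm(y ~ x, do.plot = FALSE))
plot(x = A$x, y = A$y)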

It seems to me that beeswarm uses something like the current plot parameters to do the swarming and therefore chokes when no plot is showing. I have tried adjusting beeswarm parameters such as spacing, breaks, corral, corralWidth, priority, and xlim, but none of them makes a difference. FYI: if do.plot is set to TRUE, the x-coordinates are calculated correctly, but that does not help, as I don't want to plot immediately.

Any tips or comments are greatly appreciated!


Solution

  • You're right; beeswarm uses the current plot parameters to calculate the amount of space to leave between points. It seems that setting "do.plot=FALSE" does not do what one would expect, and I'm not sure why I included this parameter.

    If you want to control the parameters manually, you could use the functions swarmx or swarmy instead. These functions must be applied to each group separately, e.g.

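    # Swarm each group separately, with explicit point sizes in user coordinates
    dfsplitswarmed <- by(df, df$x, function(aa) {
      swarmx(aa$x, aa$y, xsize = 0.075, ysize = 7.5, cex = 1, log = "")
    })

    # Combine the per-group results into a single data frame and plot it
    dfswarmed <- do.call(rbind, dfsplitswarmed)
    plot(dfswarmed)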
    

    In this case, I set the xsize and ysize values based on what the function would default to for this particular data set. If you can find a set of xsize/ysize values that work for your data, this approach might work for you.

    Otherwise, perhaps a simpler approach would be to leave do.plot=TRUE, and then discard the plots.
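
The last suggestion (keep do.plot=TRUE and discard the plot) could be sketched roughly as follows. This is only an illustration, not part of the beeswarm package: beeswarm_coords is a hypothetical helper name, and it assumes that drawing onto a null PDF device (pdf(NULL), which writes no file) is an acceptable way to throw the plot away. The 7 x 7 inch device size is also an assumption; the spacing that beeswarm computes depends on the device size, so it should roughly match the device you eventually plot on.

```r
library(beeswarm)

# Hypothetical helper (not part of the beeswarm package): let beeswarm draw
# on a throwaway null device and keep only the coordinates it returns.
beeswarm_coords <- function(formula, data, width = 7, height = 7, ...) {
  prev <- dev.cur()                          # remember the active device, if any
  pdf(NULL, width = width, height = height)  # null device: no file is written
  on.exit({
    dev.off()                                # discard the throwaway plot
    if (prev > 1) dev.set(prev)              # restore the previously active device
  }, add = TRUE)
  beeswarm(formula, data = data, do.plot = TRUE, ...)
}

set.seed(1)  # only to make the example reproducible
n <- 250
df <- data.frame(x = floor(runif(n, 0, 5)),
                 y = rnorm(n = n, mean = 500, sd = 100))

# Coordinates are computed without anything appearing on screen
A <- beeswarm_coords(y ~ x, data = df)

# Plot later, whenever it is convenient
plot(x = A$x, y = A$y)
```

Because the swarm spacing is expressed in user coordinates for the throwaway device, the points may look too tight or too loose if the final plot uses a very different device size, axis range, or cex.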