Pie chart of co-presence in clusters for about 10 factors in R

I've got a two-column dataset with about 30000 clusters and 10 factors like this:

cluster-1 Factor1
cluster-1 Factor2
cluster-2 Factor2
cluster-2 Factor3

I would like to represent the co-occurrence of factors across the cluster set, something like "Factor1+Factor3+Factor5 in 1234 clusters", and so on for the different combinations. I thought I could do something like a pie chart, but with 10 factors I take it there could be too many combinations.

What would be a good way of representing this?


  • There is one good programming question in here that should be addressed:

    How do I count the number of co-occurrences of factors in the different clusters?

    First simulate some data:

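    n = 1000                # number of (cluster, factor) rows
    n.clusters = 100
    # assign rows to clusters 1..100 in rotation, so each cluster gets 10 rows
    clusters = rep(1:n.clusters, length.out=n)
    n.factors = 10
    # draw factor ids from a rounded normal centred on 5 (sd 2), clamped to 1..10
    factors = round(rnorm(n, n.factors/2, n.factors/5))
    factors[factors > n.factors] = n.factors
    factors[factors < 1] = 1
    data = data.frame(cluster=clusters, factor=factors)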
    > head(data)
      cluster factor
    1       1      6
    2       2      6
    3       3      5
    4       4      4
    5       5      6
    6       6      1

    Then here is the code that could be used to tabulate the number of times each combination of factors occurs in the clusters:

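    # one label per cluster: its distinct factors, sorted and joined with '+'
    # (a separator keeps labels unambiguous once factor ids have two digits)
    combos = with(data, tapply(factor, cluster, function(x) paste(sort(unique(x)), collapse='+')))
    # number of clusters that share each combination
    counts = table(combos)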
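
    To make the tabulation step concrete, here is a minimal sketch on a toy data set in the question's two-column layout. The data frame `toy` and its values are made up for illustration, and the "+" separator is my choice to mirror the "Factor1+Factor3+Factor5" notation in the question:

    ```r
    # Toy data in the question's layout (hypothetical values, for illustration only)
    toy = data.frame(
      cluster = c("cluster-1", "cluster-1", "cluster-2", "cluster-2",
                  "cluster-3", "cluster-3", "cluster-3"),
      factor  = c("Factor1", "Factor2", "Factor2", "Factor3",
                  "Factor2", "Factor1", "Factor1"),
      stringsAsFactors = FALSE
    )

    # One label per cluster: its distinct factors, sorted and joined with "+"
    toy_combos = tapply(toy$factor, toy$cluster,
                        function(x) paste(sort(unique(x)), collapse = "+"))

    # Number of clusters sharing each combination
    toy_counts = table(toy_combos)
    print(toy_counts)
    ```

    Here cluster-1 and cluster-3 both reduce to "Factor1+Factor2" (the repeated Factor1 in cluster-3 is dropped by `unique`), so that combination is counted twice and "Factor2+Factor3" once.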

    This can be represented as a simple pie chart:

    [image: pie chart of the combination counts]

    However, simple counts like this are often most efficiently displayed as a sorted table. For more on this, check out Edward Tufte.
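
    As a rough sketch of both displays, assuming the counts are available as a named vector or a one-dimensional table: the numbers in `example_counts` below are invented, and base R's `pie()` is used as a stand-in rather than whatever plotting call produced the original figure.

    ```r
    # Hypothetical number of clusters per factor combination (made-up numbers)
    example_counts = c("1+3+5" = 12, "2+3" = 40, "4" = 7, "1+2" = 25)

    # Sorted table: most common combinations first, with their share of all clusters
    sorted_counts = sort(example_counts, decreasing = TRUE)
    summary_table = data.frame(
      combination = names(sorted_counts),
      clusters    = as.vector(sorted_counts),
      percent     = round(100 * as.vector(sorted_counts) / sum(sorted_counts), 1)
    )
    print(summary_table)

    # Base-graphics pie chart of the same counts, for comparison
    pie(sorted_counts, main = "Factor combinations per cluster")
    ```

    The same `sort(..., decreasing = TRUE)` call also works directly on the output of `table()`, so the sorted view costs one extra line. With many combinations, the table stays readable where the pie slices become too thin to label.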