python, list, venn-diagram

Venn Diagram up to 4 lists - outputting the intersections and unique sets


In my work I use a lot of Venn diagrams, and so far I've been relying on the web-based "Venny". It offers a nice option to export the various intersections (i.e., the elements belonging only to that specific intersection), and it handles diagrams of up to 4 lists.

The problem is that doing this with large lists (4K+ elements) and more than 3 sets is a chore (copy, paste, save...). I have therefore decided to generate the lists myself and use Venny only for plotting.

This lengthy introduction leads to the crux of the matter: given 3 or 4 lists that share some of their elements, how can I process them in Python to obtain the various sets (unique to one list, common to all 4, common to just the first and second, etc.) as shown on the Venn diagram (3 list graphical example, 4 list graphical example)? It doesn't look too hard for 3 lists, but for 4 it gets somewhat complex.


Solution

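  • Assuming you have Python 2.6 or later (the session below is Python 2; on Python 3, reduce must be imported from functools and print is a function):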

    >>> from itertools import combinations
    >>>
    >>> data = dict(
    ...   list1 = set(list("alphabet")),
    ...   list2 = set(list("fiddlesticks")),
    ...   list3 = set(list("geography")),
    ...   list4 = set(list("bovinespongiformencephalopathy")),
    ... )
    >>>
    >>> variations = {}
    >>> for i in range(len(data)):
    ...   for v in combinations(data.keys(),i+1):
    ...     vsets = [ data[x] for x in v ]
    ...     variations[tuple(sorted(v))] = reduce(lambda x,y: x.intersection(y), vsets)
    ...
    >>> for k,v in sorted(variations.items(),key=lambda x: (len(x[0]),x[0])):
    ...   print "%r\n\t%r" % (k,v)
    ...
    ('list1',)
            set(['a', 'b', 'e', 'h', 'l', 'p', 't'])
    ('list2',)
            set(['c', 'e', 'd', 'f', 'i', 'k', 'l', 's', 't'])
    ('list3',)
            set(['a', 'e', 'g', 'h', 'o', 'p', 'r', 'y'])
    ('list4',)
            set(['a', 'c', 'b', 'e', 'g', 'f', 'i', 'h', 'm', 'l', 'o', 'n', 'p', 's', 'r', 't', 'v', 'y'])
    ('list1', 'list2')
            set(['e', 'l', 't'])
    ('list1', 'list3')
            set(['a', 'h', 'e', 'p'])
    ('list1', 'list4')
            set(['a', 'b', 'e', 'h', 'l', 'p', 't'])
    ('list2', 'list3')
            set(['e'])
    ('list2', 'list4')
            set(['c', 'e', 'f', 'i', 'l', 's', 't'])
    ('list3', 'list4')
            set(['a', 'e', 'g', 'h', 'o', 'p', 'r', 'y'])
    ('list1', 'list2', 'list3')
            set(['e'])
    ('list1', 'list2', 'list4')
            set(['e', 'l', 't'])
    ('list1', 'list3', 'list4')
            set(['a', 'h', 'e', 'p'])
    ('list2', 'list3', 'list4')
            set(['e'])
    ('list1', 'list2', 'list3', 'list4')
            set(['e'])
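Note that each entry in the output above is the full intersection of the named sets, so ('list1',) is simply all of list1. The question asks for the regions as Venny exports them, i.e. the elements that belong to exactly the named lists and to no others. A minimal Python 3 sketch of that step follows; the function name venn_regions is hypothetical and not part of the original answer, and the approach (subtracting the union of all the other sets from each intersection) is one reasonable way to do it rather than the only one.

```python
from itertools import combinations


def venn_regions(named_sets):
    """Map each combination of set names to the elements found in exactly
    those sets and in none of the others (hypothetical helper)."""
    names = sorted(named_sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Elements shared by every set named in this combination.
            inside = set.intersection(*(named_sets[n] for n in combo))
            # Elements that appear in any set NOT named in this combination.
            others = [named_sets[n] for n in names if n not in combo]
            outside = set().union(*others)
            regions[combo] = inside - outside
    return regions


# Same sample data as in the session above.
data = {
    "list1": set("alphabet"),
    "list2": set("fiddlesticks"),
    "list3": set("geography"),
    "list4": set("bovinespongiformencephalopathy"),
}

regions = venn_regions(data)
for combo, members in sorted(regions.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(combo, sorted(members))
```

With this sample data, ('list2',) comes out as d and k, and the four-way region as e alone. The regions are disjoint by construction, so their sizes add up to the size of the union of all lists, which is a handy sanity check before feeding the counts to a plotting tool.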