Tags: python, pandas, frequency

Python - How to check the combination of numbers by frequency


Take, for example, the following data.

 h: [Num1, Num2, Num3, Num4, Num5, Num6]
 a: [1,       2,    3,    4,    5,    6]
 b: [1,       2,    7,    8,    9,   10]
 c: [1,       2,    3,    6,    8,   10]

Now, let's say I want to see combinations of two or more numbers, ordered by frequency.

Take the number 1, for example: it appears in all three rows (a, b, c).

When 1 is "used", it is most often paired with 2 (3/3), followed by 3, 6, 8 and 10 (2/3 each). In other words, when 1 is "used", the row may look something like this:

 [1, 2, x, y, z, t]
 [1, 2, 3, x, y, z]
 [1, 2, 6, x, y, z]
 .
 .
 .
 [1, 8, x, y, z, t]
 [1, 10, x, y, z, t]
 [1, 2, 3, 6, 8, 10]

Order does not matter. x, y, z, t could be any given number. Duplicates are not present/allowed.

I have a data frame in this format and want to see which other integers appear together with a given number, for example 44.

For example:

 44 was paired with 11, 350 times out of 2000
 44 was paired with 27, 290 times out of 2000
 44 was paired with 35, 180 times out of 2000
 .
 .
 .
 44 was paired with 2, 5 times out of 2000

I already have the frequency with which every number occurs in each column, but I can't figure out how to continue from there.

Looking forward to ideas and questions. Thank you!


Solution

  • You could use Counter from the collections module together with combinations from the itertools module

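    from itertools import combinations
    from collections import Counter

    # Each inner list is one row of numbers.
    data = [[1, 2, 3], [1, 2, 5], [1, 3, 8], [2, 5, 8]]

    # Sort each row so a pair always has the same key, e.g. (1, 2) and never (2, 1).
    pairings = Counter(
        pair for row in data
        for pair in combinations(sorted(row), 2)
    )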
    

    The Counter object is dictionary-like. For the data above it contains:

    Counter({
        (1, 2): 2, 
        (1, 3): 2, 
        (2, 5): 2, 
        (2, 3): 1, 
        (1, 5): 1, 
        (1, 8): 1, 
        (3, 8): 1, 
        (2, 8): 1, 
        (5, 8): 1
    })
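
Because the question asks for combinations of 2+ ordered by frequency, the same idea can be sketched for several group sizes at once. The helper name `count_combinations` and its `min_size`/`max_size` parameters are made up for this illustration and are not part of the answer above; `Counter.most_common()` is the standard way to get the entries sorted from most to least frequent.

```python
from collections import Counter
from itertools import combinations


def count_combinations(rows, min_size=2, max_size=3):
    """Count every combination of min_size..max_size numbers that share a row.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for row in rows:
        # Sort and de-duplicate so (1, 2) and (2, 1) map to the same key.
        values = sorted(set(row))
        for size in range(min_size, max_size + 1):
            counts.update(combinations(values, size))
    return counts


data = [[1, 2, 3], [1, 2, 5], [1, 3, 8], [2, 5, 8]]
groups = count_combinations(data)

# most_common(n) lists the n most frequent combinations first.
for combo, freq in groups.most_common(3):
    print(combo, freq)
```

Keep `max_size` small on real data: a row of six numbers already yields 15 pairs and 20 triples, so the number of keys grows quickly with the group size.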
    

    You can get the count of a specific pair like this (put the smaller number first, because each row was sorted before pairing):

    >>> pairings[1,2] 
    2
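
To get a report in the shape the question asks for ("44 was paired with 11, 350 times out of 2000"), one option is to look only at the rows that contain the target number and count everything else in them. This is a minimal sketch, not the answerer's code: `partners_of` is a hypothetical helper, and it assumes the rows are plain lists (for a DataFrame, `df.values.tolist()` is one way to obtain them). It also assumes "out of" means the number of rows containing the target; use `len(rows)` instead if the total should be all rows.

```python
from collections import Counter


def partners_of(target, rows):
    """Return (Counter of co-occurring numbers, number of rows containing target).

    Hypothetical helper for illustration only.
    """
    partners = Counter()
    total = 0
    for row in rows:
        if target in row:
            total += 1
            # Count every other number in the same row once.
            partners.update(n for n in set(row) if n != target)
    return partners, total


# The three rows a, b, c from the question.
rows = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 7, 8, 9, 10],
    [1, 2, 3, 6, 8, 10],
]

partners, total = partners_of(1, rows)
for number, freq in partners.most_common():
    print(f"1 was paired with {number}, {freq} times out of {total}")
```

For the sample rows this lists 2 first (3 out of 3), then 3, 6, 8 and 10 in some order (2 out of 3 each), which matches the counts worked out in the question.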