Suppose we have the following data, for example:
h: [Num1, Num2, Num3, Num4, Num5, Num6]
a: [1, 2, 3, 4, 5, 6]
b: [1, 2, 7, 8, 9, 10]
c: [1, 2, 3, 6, 8, 10]
Now, let's say I want to see combinations of two or more numbers, ordered by frequency.
Take the number 1, for example: it appears in all three rows a, b and c.
Whenever 1 is "used", it is paired with 2 every time (3/3), followed by 3, 6, 8 and 10 (2/3 each). In other words, when 1 is "used", the row could look something like this:
[1, 2, x, y, z, t]
[1, 2, 3, x, y, z]
[1, 2, 6, x, y, z]
.
.
.
[1, 8, x, y, z, t]
[1, 10, x, y, z, t]
[1, 2, 3, 6, 8, 10]
Order does not matter, and x, y, z, t can be any number. Duplicates within a row are not allowed.
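A minimal sketch of the counting described above, using only the standard library. It hard-codes the three example rows and ignores the h header row; the names rows, target and partners are illustrative only:

```python
from collections import Counter

# The three example rows a, b, c (the h header row is omitted).
rows = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 2, 7, 8, 9, 10],
    "c": [1, 2, 3, 6, 8, 10],
}

target = 1
# Keep only the rows in which the target number appears.
rows_with_target = [row for row in rows.values() if target in row]

# Count every other number once per row (rows contain no duplicates).
partners = Counter(
    n for row in rows_with_target for n in row if n != target
)

# most_common() lists partners from most to least frequent.
for number, count in partners.most_common():
    print(f"{target} was paired with {number}: {count}/{len(rows_with_target)}")
```

Here the denominator is the number of rows that contain the target, which matches the 3/3 and 2/3 figures for the number 1.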
I have a data frame in this format and want to see which other integers appear together with a given number, for example 44:
44 was paired with 11, 350 times out of 2000
44 was paired with 27, 290 times out of 2000
44 was paired with 35, 180 times out of 2000
.
.
.
44 was paired with 2, 5 times out of 2000
I already have the frequency with which every number occurs in each column; I just can't figure out how to go on from there.
Looking forward to ideas and questions. Thank you!
You could use Counter from the collections module together with combinations from the itertools module:
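from itertools import combinations
from collections import Counter

data = [[1, 2, 3], [1, 2, 5], [1, 3, 8], [2, 5, 8]]

# Count every unordered pair that occurs within a row.
# Sorting each row first makes (2, 1) and (1, 2) the same key.
pairings = Counter(
    pair for row in data
    for pair in combinations(sorted(row), 2)
)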
The Counter object is dictionary-like. Printing pairings gives:
Counter({
    (1, 2): 2,
    (1, 3): 2,
    (2, 5): 2,
    (2, 3): 1,
    (1, 5): 1,
    (1, 8): 1,
    (3, 8): 1,
    (2, 8): 1,
    (5, 8): 1
})
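Since the question asks for combinations of two or more numbers, the same idea can be extended by looping over every combination size. This is a sketch, not a tuned solution: the name combos is illustrative, and a row of n numbers yields 2**n - n - 1 combinations, so it only suits fairly narrow rows (57 per row for six columns):

```python
from collections import Counter
from itertools import combinations

data = [[1, 2, 3], [1, 2, 5], [1, 3, 8], [2, 5, 8]]

# Count combinations of every size from 2 up to the full row length.
# Sorting each row keeps equal combinations under a single key.
combos = Counter(
    combo
    for row in data
    for size in range(2, len(row) + 1)
    for combo in combinations(sorted(row), size)
)

# most_common(n) returns the n most frequent (combination, count) pairs.
for combo, count in combos.most_common(5):
    print(combo, count)
```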
You can get the count of a specific pair like this:
>>> pairings[1,2]
2
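To get output in the "44 was paired with 11, 350 times out of 2000" style, you can collect every pair that contains a given number. The helper partners_of below is hypothetical, and it assumes "out of N" means the number of rows containing that number. If your rows live in a pandas DataFrame with one number per column, df.to_numpy().tolist() should give you a list of rows in the same shape as data:

```python
from collections import Counter
from itertools import combinations

data = [[1, 2, 3], [1, 2, 5], [1, 3, 8], [2, 5, 8]]
pairings = Counter(
    pair for row in data for pair in combinations(sorted(row), 2)
)

def partners_of(number, pairings):
    """Hypothetical helper: map each partner of `number` to its pair count."""
    result = Counter()
    for (x, y), count in pairings.items():
        if x == number:
            result[y] += count
        elif y == number:
            result[x] += count
    return result

target = 2
# Assumption: the denominator is the number of rows containing the target.
rows_with_target = sum(target in row for row in data)
for other, count in partners_of(target, pairings).most_common():
    print(f"{target} was paired with {other}, {count} times out of {rows_with_target}")
```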