I want to compare strings without unnecessary comparisons. So far I have:
[[[dice_coefficient(x,y) for x in ['a','a','b'][j]] for y in ['a','a','b'][0:j]] for j in [1,2]]
where dice_coefficient is defined here. This gives the expected output, but would it be the right approach if those strings were the comments of an author in a column of a pandas DataFrame?
One thing to note is that, in your case, the first iteration of your outer loop (for j ...) is a subset of the second iteration. To compare each pair only once, you can do:
data = ['a', 'a', 'b']
[
    [
        dice_coefficient(x, y)
        for x in data[i:]
    ]
    for i, y in enumerate(data[:-1], start=1)
]
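To make the comprehension above easy to run and check, here is a minimal sketch. The dice_coefficient in it is only a stand-in (a Dice coefficient over character sets), since the linked definition is not reproduced here; swap in the real one. It also shows itertools.combinations as an alternative way to enumerate each pair exactly once, which is not what the snippet above uses but should produce the same pairs in flattened form.

```python
from itertools import combinations


def dice_coefficient(a, b):
    # Stand-in for the linked function (an assumption, not the original):
    # Dice coefficient computed over the sets of characters in each string.
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


data = ['a', 'a', 'b']

# Same comprehension as above: each element is compared only with later ones.
nested = [
    [dice_coefficient(x, y) for x in data[i:]]
    for i, y in enumerate(data[:-1], start=1)
]

# Alternative: combinations yields every unordered pair of positions once.
flat = [dice_coefficient(x, y) for x, y in combinations(data, 2)]

print(nested)
print(flat)
```

For n strings both forms perform n * (n - 1) / 2 comparisons. The nested version keeps one row per left-hand string, while the combinations version returns a single flat list.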
If you expect a lot of repeated values in your data, you can use lru_cache (from functools) on dice_coefficient to avoid recomputing the same comparison.
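A minimal sketch of the caching idea, assuming dice_coefficient takes two strings (hashable arguments are required by lru_cache). The function body is again a hypothetical stand-in for the linked definition, and the calls counter exists only to show how often the body actually runs.

```python
from functools import lru_cache

calls = 0  # illustration only: counts real executions of the function body


@lru_cache(maxsize=None)
def dice_coefficient(a, b):
    # Stand-in implementation (an assumption): Dice over character sets.
    global calls
    calls += 1
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


data = ['a', 'a', 'b']

scores = [
    [dice_coefficient(x, y) for x in data[i:]]
    for i, y in enumerate(data[:-1], start=1)
]

print(scores)
print(calls)
print(dice_coefficient.cache_info())
```

Here three comparisons are requested but the body runs only twice, because ('b', 'a') is requested a second time and served from the cache. Note that the cache is keyed on argument order, so ('a', 'b') and ('b', 'a') are stored separately; if the function is symmetric, sorting the pair before the call would let them share an entry.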