Is there a way to provide sets or "order-insensitive tuples" as keys for a dict (or equivalent datstructure) in Python?

I'm looking for a pythonic way to map a given value to a tuple of keys in a data structure, so that datastructure[key1, key2] returns the same value as datastructure[key2, key1], without having to save the value twice for each mutation of the key tuple. Ideally, I'm looking for something like a dictionary with sets or "order-insensitive tuples" (I know, that tuples are by definition ordered) as keys.

My usecase for this is: I need to save similarity values of pairs of strings in an efficient way. Since sim(word1, word2) equals sim(word2, word1) (so ordering doesn't matter), I don't want to save the values for each tuple, because it would be redundant. Instead, I'd like to save one value for each word pair (the tuple key) and access it by either ordering of the words that the pair is made up.

Currently, I'm doing it like this and it's working fine:

d = {tuple(sorted(['w1', 'w2'])): 4}

print(d[tuple(sorted(['w1', 'w2']))])
print(d[tuple(sorted(['w2', 'w1']))])

>> 4
>> 4

But each update and access to the dictionary now requires a list initiation, sort action, and tuple transformation.

My question is therefore: Is there an elegant way to do, what I want, without having to use the above workaround or defining a custom class as key? Maybe there is a datastructure that allows exactly that and may even be faster because it is implemented on a lower level than my workaround. Ideally this would look something like this:

d = {('w1', 'w2'): 4}

print(d[('w1','w2')])
print(d[('w2','w1')])

>> 4
>> 4

Note: The above example doesn't work, because ('a2','b2') != ('b2','a2').

Here is the research I've done so far and an explanation of why these solutions aren't exactly what I'm looking for:

Defining a custom Class, where MyObject('a2','b2') equals MyObject('b2','a2') and using these custom objects works fine (Object of custom type as dictionary key) but is more complex than the workaroundabove.
Using sets, which are by definition position insensitive, as keys yields an error because sets are unhashable.
The following questions are related but all answers imply saving the same value multiple times for each tuple/key: How to create dict with 2 keys and one value? and Dictionary with multiple keys mapping to same value

I'm thankful for any hints and make excuses in advance, if a duplicate Stackoverflow question exists and I overlooked it - it's hard to search for this, since I don't know how the desired feature would be called.

Solution

You could use a helper function that creates sorted tuples in a cheaper way:

def key(s, t):
    return (s, t) if s < t else (t, s)

d = {key('w1', 'w2'): 4}

print(d[key('w1','w2')])
print(d[key('w2','w1')])

The function call overhead makes it slower than inlining the expression would be, but it's still faster than the alternatives mentioned so far.

Here are various expressions and their times for evaluating them and using them to look up the dict value:

 132.7 ± 1.8 ns  (s, t) if s < t else (t, s)
 171.7 ± 1.5 ns  key(s, t)
 236.5 ± 0.9 ns  frozenset({s, t})
 254.9 ± 2.5 ns  frozenset((s, t))
 273.3 ± 2.0 ns  frozenset([s, t])
 343.2 ± 3.6 ns  tuple(sorted([s, t]))
 767.0 ± 9.7 ns  T(s, t)

Benchmark code (Attempt This Online!):

from timeit import timeit
from statistics import mean, stdev

setup = '''
s, t = 'w1', 'w2'
sts = [(s, t), (t, s)] * 5000

def key(s, t):
    return (s, t) if s < t else (t, s)

# from Anentropic's answer
class T(tuple):
    def __new__(cls, *args):
        return super().__new__(cls, tuple(sorted(args)))

d = {KEY: 4}
'''

funcs = [
    "tuple(sorted([s, t]))",
    "frozenset([s, t])",
    "frozenset((s, t))",
    "frozenset({s, t})",
    "key(s, t)",
    "(s, t) if s < t else (t, s)",
    "T(s, t)",
]

times = {f: [] for f in funcs}
def stats(f):
    ts = [t * 1e9 for t in sorted(times[f])[:10]]
    return f'{mean(ts):6.1f} ± {stdev(ts):3.1f} ns '

for _ in range(100):
    for f in funcs:
        code = f'for s, t in sts: d[{f}]'
        setup_ = setup.replace('KEY', f)
        t = timeit(code, setup_, number=10**0) / 1e4
        times[f].append(t)

for f in sorted(funcs, key=stats):
    print(stats(f), f)