
Many-to-one mapping (creating equivalence classes)


I am working on a project that converts one database to another. One of the original database columns defines the row's category. This column should be mapped to a new category in the new database.

For example, let's assume the original categories are: parrot, spam, cheese_shop, Cleese, Gilliam, Palin

Now that's a little verbose for me, and I want these rows categorized as either sketch or actor. That is, I want to define all the sketches and all the actors as two equivalence classes.

>>> monty={'parrot':'sketch', 'spam':'sketch', 'cheese_shop':'sketch', 
'Cleese':'actor', 'Gilliam':'actor', 'Palin':'actor'}
>>> monty
{'Gilliam': 'actor', 'Cleese': 'actor', 'parrot': 'sketch', 'spam': 'sketch', 
'Palin': 'actor', 'cheese_shop': 'sketch'}

That's quite awkward. I would prefer having something like:

monty={ ('parrot','spam','cheese_shop'): 'sketch', 
        ('Cleese', 'Gilliam', 'Palin') : 'actors'}

But this, of course, sets the entire tuple as a key:

>>> monty['parrot']

Traceback (most recent call last):
  File "<pyshell#29>", line 1, in <module>
    monty['parrot']
KeyError: 'parrot'

Any ideas how to create an elegant many-to-one dictionary in Python?


Solution

  • It seems to me that you have two concerns. The first is how you express the mapping originally, that is, how you type it into your new_mapping.py file. The second is how the mapping works during the re-mapping process. There's no reason for these two representations to be the same.

    Start with the mapping you like:

    monty = { 
        ('parrot','spam','cheese_shop'): 'sketch', 
        ('Cleese', 'Gilliam', 'Palin') : 'actors',
    }
    

    then convert it into the mapping you need:

    working_monty = {}
    for k, v in monty.items():
        for key in k:
            working_monty[key] = v
    

    producing:

    {'Gilliam': 'actors', 'Cleese': 'actors', 'parrot': 'sketch', 'spam': 'sketch', 'Palin': 'actors', 'cheese_shop': 'sketch'}
    

    then use working_monty to do the work.
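
The conversion step can also be wrapped in a small helper, which is handy if several grouped mappings need flattening. The following is only a sketch: the name `expand_groups` and the duplicate check are assumptions added for illustration and are not part of the original answer.

```python
def expand_groups(grouped):
    """Flatten {(member, ...): label} into {member: label}.

    Hypothetical helper: the name and the duplicate check are
    illustrative additions, not part of the original answer.
    """
    flat = {}
    for members, label in grouped.items():
        for member in members:
            # Refuse to silently overwrite a member that appears in
            # two groups with different labels.
            if member in flat and flat[member] != label:
                raise ValueError(
                    "{!r} is mapped to both {!r} and {!r}".format(
                        member, flat[member], label))
            flat[member] = label
    return flat


monty = {
    ('parrot', 'spam', 'cheese_shop'): 'sketch',
    ('Cleese', 'Gilliam', 'Palin'): 'actors',
}

working_monty = expand_groups(monty)
print(working_monty['parrot'])  # sketch
```

One thing to watch for with tuple keys: a group with a single member needs a trailing comma, as in `('parrot',)`. Without it, `('parrot')` is just the string `'parrot'`, and the inner loop would iterate over its characters instead.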