Python: Enumerate a list of string 'keys' into ints


I searched for a while but didn't find anything that explained exactly what I'm trying to do.

Basically, I have a list of string "labels", e.g. ["brown", "black", "blue", "brown", "brown", "black"]. I want to convert this into a list of integers in which each distinct label corresponds to one integer, so

["brown", "black", "blue", "brown", "brown", "black"]

becomes

[1, 2, 3, 1, 1, 2]

I looked into the enumerate function, but when I gave it my list of strings (which is quite long), it assigned a new int to every item in the list instead of giving the same label the same int:

[(1,"brown"),(2,"black"),(3,"blue"),(4,"brown"),(5,"brown"),(6,"black")]

I know how I could do this with a long and cumbersome for loop and if-else checks, but really I'm curious if there's a more elegant way to do this in only one or two lines.


Solution

  • You have non-unique labels; you can use a defaultdict to generate numbers on first access, combined with an itertools.count() counter:

    from collections import defaultdict
    from itertools import count
    from functools import partial
    
    label_to_number = defaultdict(partial(next, count(1)))
    [(label_to_number[label], label) for label in labels]
    

    This numbers the labels in order of each label's first occurrence in labels.

    Demo:

    >>> labels = ["brown", "black", "blue", "brown", "brown", "black"]
    >>> label_to_number = defaultdict(partial(next, count(1)))
    >>> [(label_to_number[label], label) for label in labels]
    [(1, 'brown'), (2, 'black'), (3, 'blue'), (1, 'brown'), (1, 'brown'), (2, 'black')]
    

    Because we are using a dictionary, each label-to-number lookup takes constant time on average, so the whole operation is linear in the length of the labels list.
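    If only the integers are wanted, as in the question, the label can be dropped from each tuple. A minimal sketch reusing the same defaultdict setup; the name numbers is introduced here purely for illustration:

```python
from collections import defaultdict
from functools import partial
from itertools import count

labels = ["brown", "black", "blue", "brown", "brown", "black"]

# A missing key calls next() on the shared counter, so each label
# receives its number the first time it is looked up.
label_to_number = defaultdict(partial(next, count(1)))

# Keep only the number for each label (illustrative variable name).
numbers = [label_to_number[label] for label in labels]

print(numbers)                # [1, 2, 3, 1, 1, 2]
print(dict(label_to_number))  # {'brown': 1, 'black': 2, 'blue': 3}
```

    The mapping stays available in label_to_number afterwards, so it can be inspected or reused for further labels.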

    Alternatively, use a set() to get unique values, then map these to an enumerate() count:

    label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
    [(label_to_number[label], label) for label in labels]
    

    This assigns numbers more arbitrarily, as set() objects are not ordered, so the exact numbering below may differ from run to run:

    >>> label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
    >>> [(label_to_number[label], label) for label in labels]
    [(2, 'brown'), (3, 'black'), (1, 'blue'), (2, 'brown'), (2, 'brown'), (3, 'black')]
    

    This requires looping through labels twice, though: once to build the set and once to build the result.
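    If the two-pass approach is preferred but the numbering should still follow first occurrence, one option is to deduplicate with dict.fromkeys() instead of set(). This sketch assumes Python 3.7 or later, where dicts are guaranteed to preserve insertion order; unique_labels and numbers are illustrative names:

```python
labels = ["brown", "black", "blue", "brown", "brown", "black"]

# dict.fromkeys() keeps one key per distinct label, in the order each
# label first appears (relies on insertion-ordered dicts, Python 3.7+).
unique_labels = dict.fromkeys(labels)

# Number the distinct labels starting from 1.
label_to_number = {label: i for i, label in enumerate(unique_labels, 1)}

# Second pass: translate every label to its number.
numbers = [label_to_number[label] for label in labels]

print(numbers)  # [1, 2, 3, 1, 1, 2]
```

    Unlike the set() version, the result here is the same on every run.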

    Neither approach requires you to first define a dictionary of labels; the mapping is created automatically.
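    The same build-on-first-access idea can also be sketched without any imports, using a plain dict and setdefault(). This is a variation for illustration rather than part of the approaches above, and numbers is again an assumed name:

```python
labels = ["brown", "black", "blue", "brown", "brown", "black"]

label_to_number = {}

# setdefault() stores the next free number only when the label is new;
# for a known label it returns the number already stored.
# Note: len(label_to_number) + 1 is evaluated on every call, even when
# the result ends up unused.
numbers = [
    label_to_number.setdefault(label, len(label_to_number) + 1)
    for label in labels
]

print(numbers)  # [1, 2, 3, 1, 1, 2]
```

    This is a single pass like the defaultdict version, at the cost of a slightly denser comprehension.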