Search code examples
pythonarrays

Efficient way of re-numbering elements in an array


I am reasonably new to python and am trying to implement a genetic algorithm, but need some assistance with the code for one of the operations.

I have formulated the problem this way:

  • each individual I is represented by a string of M integers
  • each element e in I takes a value from 0 to N
  • every number from 0 - N must appear in I at least once
  • the value of e is not important, so long as each uniquely valued element takes the same unique value (think of them as class labels)
  • e is less than or equal to N
  • N can be different for each I

after applying the crossover operation i can potentially generate children which violate one or more of these constraints, so i need to find a way to re-number the elements so that they retain their properties, but fit with the constraints.

for example:

parent_1 (N=5): [1 3 5 4 2 1|0 0 5 2]
parent_2 (N=3): [2 0 1 3 0 1|0 2 1 3]

*** crossover applied at "|" ***

child_1: [1 3 5 4 2 1 0 2 1 3]
child_2: [2 0 1 3 0 1 0 0 5 2]

child_1 obviously still satisfies all of the constraints, as N = 5 and all values 0-5 appear at least once in the array.

The problem lies with child 2 - if we use the max(child_2) way of calculating N we get a value of 5, but if we count the number of unique values then N = 4, which is what the value for N should be. What I am asking (in a very long winded way, granted) is what is a good, pythonic way of doing this:

child_2: [2 0 1 3 0 1 0 0 5 2]
*** some python magic ***
child_2':  [2 0 1 3 0 1 0 0 4 2]
*or*
child_2'': [0 1 2 3 1 2 1 1 4 0]

child_2'' is there to illustrate that the values themselves dont matter, so long as each element of a unique value maps to the same value, the constraints are satisfied.

here is what i have tried so far:

value_map = []
for el in child:
    if el not in value_map:
        value_map.append(el)

for ii in range(0,len(child)):
    child[ii] = value_map.index(child[ii])

this approach works and returns a result similar to child_2'', but i can't imagine that it is very efficient in the way it iterates over the string twice, so i was wondering if anyone has any suggestions of how to make it better.

thanks, and sorry for such a long post for such a simple question!


Solution

  • You will need to iterates the list more than once, I don't think there's any way around this. After all, you first have to determine the number of different elements (first pass) before you can start changing elements (second pass). Note, however, that depending on the number of different elements you might have up to O(n^2) due to the repetitive calls to index and not in, which have O(n) on a list.

    Alternatively, you could use a dict instead of a list for your value_map. A dictionary has much faster lookup than a list, so this way, the complexity should indeed be on the order of O(n). You can do this using (1) a dictionary comprehension to determine the mapping of old to new values, and (2) a list comprehension for creating the updated child.

    value_map = {el: i for i, el in enumerate(set(child))}
    child2 = [value_map[el] for el in child]
    

    Or change the child in-place using a for loop.

    for i, el in enumerate(child):
        child[i] = value_map[el]