Search code examples
pythonstringtime-complexitysubstringpython-itertools

Optimizing finding a string that matches the characters in a substring in any order?


Assuming a list as follows:

list_of_strings = ['foo', 'bar', 'soap', 'sseo', 'spaseo', 'oess']

and a sub string

to_find = 'seos'

I would like to find the string(s) in the list_of_strings that:

  1. Have the same length as to_find
  2. Have the same characters as to_find (irresepective of the order of the characters)

The output from the list_of_strings should be 'sseo', 'oess'] (since it has all the letters from to_find & all have a length of 4)

I have:

import itertools
list_of_strings = [string for string in list_of_strings if len(string) == len(to_find)]
result = [string for string in list_of_strings if any("".join(perm) in string for perm in itertools.permutations(to_find))]

To find how long does it take to run the code I did

import timeit
timeit.timeit("[string for string in list_of_strings if any(''.join(perm) in string for perm in itertools.permutations(to_find))]", 
              setup='from __main__ import list_of_strings, to_find', number=100000)

The process takes a while to give the output. I am guessing it is because of the use of itertools.permutations.

Is there a way I can make this code more efficient?

Thanks


Solution

  • If order doesn't matter, you can just sort the strings and compare the resulting lists:

    list_of_strings = ['foo', 'bar', 'soap', 'sseo', 'spaseo', 'oess']
    to_find = sorted('seos')
    matches = [word for word in list_of_strings if sorted(word) == to_find]