Search code examples
pythonstringlistnlp

Check if strings are composed by list of substrings


Using python I am looking to print all words from a list that are entirely composed of smaller words in a seperated list. For example;

list1 = ('ABCDEFGHI', 'DEFABCGHI', 'ABCABCGHIABC', 'AACFFFGHI') 

list2 = ('ABC', 'DEF', 'GHI')

From these two lists I am trying to get the final output to print; ('ABCDEFGHI', 'DEFABCGHI', 'ABCABCGHIABC',) As these strings from list1 are made up entirely of shorter strings in list 2. But the string 'AACFFFGHI' should not be printed as it is not made up of a combination of these shorter strings.

So to try and clarify, the strings I am looking for from list1;

  • Should be fully composed of strings from list2
  • Can include multiple occurrences of strings from list 2
  • Does not need to use all the strings listed in list2

I've been struggling with this for a few days now and can search for strings made up of individual characters in a list, but I'm struggling to find the strings that are made up of sequences of characters. Any help would be much appreciated. Marcus.


Solution

  • The easiest way I can think of, is by obtaining all permutations of the strings in list2. So here's one way you could do this:

    • Obtain all permutations of the strings in list2
    • join them into a single string
    • Construct a set from the resulting iterable
    • Take the set.intersection with list1

    from itertools import permutations
    
    perms = set(map(''.join, permutations(list2, r=3)))
    perms.intersection(list1)
    # {'ABCDEFGHI', 'DEFABCGHI'}