python list tuples unique

Getting all unique strings from a list of nested lists and tuples


Is there a fast way to get the unique elements, specifically the strings, from a list or tuple of nested lists and tuples? Strings like 'min' and 'max' should be removed. The lists and tuples can be nested in any possible way. The only elements that always have the same shape are the innermost tuples, such as ('a',0,49), which contain the strings.

For example, this list or tuple:

lst1=[[(('a',0,49),('b',0,70)),(('c',0,49))],
     [(('c',0,49),('e',0,70)),(('a',0,'max'),('b',0,100))]]

tuple1=([(('a',0,49),('b',0,70)),(('c',0,49))],
     [(('c',0,49),('e',0,70)),(('a',0,'max'),('b',0,100))]) 

Desired output:

uniquestrings = ['a','b','c','e']

What I tried so far:

flat_list = list(sum([item for sublist in x for item in sublist],()))

But this does not reach the "core" of the nested object.


Solution

  • This yields every string inside the given iterable, regardless of how deeply it is nested. Note that it does not filter out 'min' or 'max':

    def isIterable(obj):
        # kudos: https://stackoverflow.com/a/1952481/7505395
        try:
            iter(obj)
            return True
        except TypeError:
            # iter() raises TypeError for objects that are not iterable
            return False
    
    # shortcut
    def isString(x):
        return isinstance(x, str)
    
    def chainme(iterab):
        # strings are iterable too, so skip those from chaining
        if isIterable(iterab) and not isString(iterab):
            for a in iterab:
                yield from chainme(a)
        else: 
            yield iterab
    
    lst1=[[(('a',0,49),('b',0,70)),(('c',0,49))],
         [(('c',0,49),('e',0,70)),(('a',0,'max'),('b',0,100))]]
    
    tuple1=([(('a',0,49),('b',0,70)),(('c',0,49))],
         [(('c',0,49),('e',0,70)),(('a',0,'max'),('b',0,100))]) 
    
    
    for k in [lst1,tuple1]:
        # use only strings
        l = [x for x in chainme(k) if isString(x)]
        print(l)
        print(sorted(set(l)))
        print()
    

    Output:

    ['a', 'b', 'c', 'c', 'e', 'a', 'max', 'b'] # list
    ['a', 'b', 'c', 'e', 'max']                # sorted set of list
    
    ['a', 'b', 'c', 'c', 'e', 'a', 'max', 'b']
    ['a', 'b', 'c', 'e', 'max']
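
The output above still contains 'max', which the question asked to drop. A minimal sketch of one way to handle that, assuming 'min' and 'max' are the only placeholder strings to exclude (the names `EXCLUDED`, `iter_strings` and `unique_strings` are made up for this illustration and are not part of the code above):

```python
from collections.abc import Iterable

# Assumption: these are the only placeholder strings that should be dropped.
EXCLUDED = {"min", "max"}


def iter_strings(obj):
    """Yield every string found at any depth of nested lists/tuples."""
    if isinstance(obj, str):
        # Strings are iterable too, so handle them before recursing.
        yield obj
    elif isinstance(obj, Iterable):
        for item in obj:
            yield from iter_strings(item)
    # Anything else (ints, None, ...) is silently skipped.


def unique_strings(nested, excluded=EXCLUDED):
    """Return the sorted unique strings in `nested`, minus `excluded`."""
    return sorted(set(iter_strings(nested)) - set(excluded))


lst1 = [[(('a', 0, 49), ('b', 0, 70)), (('c', 0, 49))],
        [(('c', 0, 49), ('e', 0, 70)), (('a', 0, 'max'), ('b', 0, 100))]]

print(unique_strings(lst1))  # ['a', 'b', 'c', 'e']
```

Doing the exclusion as a set difference keeps the recursive walk generic, so the same generator can be reused if the list of unwanted strings changes.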