python list sequence sequences

Extract sequences from a Python list


I have a list in Python which looks like this:

['x','x','x','x','P','x','x','N','P','N','x','x','x','N','P','x','x','x','x','x','x','N','x','x','P','N','x','x','x'....]

I need to process the list so that it returns each consecutive run of P and N items, leaving out the x items. In the above case I need to return:

[['P'],['N','P','N'],['N','P'],['N'],['P','N'].....]

I have looked at itertools but have not found anything that can do this. I have a lot of lists to process in this way so efficiency is also important.


Solution

  • You can do it using itertools.groupby:

    from itertools import groupby
    
    data = ['x','x','x','x','P','x','x','N','P','N','x','x','x','N',
            'P','x','x','x','x','x','x','N','x','x','P','N','x','x','x']
    
    # Group consecutive items by whether they are 'N' or 'P'; keep only the True groups.
    out = [list(g) for k, g in groupby(data, lambda item: item in {'N', 'P'}) if k]
    
    print(out)
    # [['P'], ['N', 'P', 'N'], ['N', 'P'], ['N'], ['P', 'N']]
    

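    groupby collects consecutive items that share the same key. Here the key is item in {'N', 'P'}, so the list is split into alternating runs of wanted and unwanted items, and the "if k" filter keeps only the runs for which the key is True. The whole thing is a single pass over the list.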
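
Since the question mentions having many lists to process, here is a minimal sketch of wrapping the same groupby idea in a reusable function. The names extract_runs, WANTED and lists are hypothetical, chosen only for illustration. Passing the set's bound membership test as the key does the same job as the lambda; whether it makes a measurable difference on your data is something to time for yourself.

```python
from itertools import groupby

# Assumed set of items to keep; adjust to your own data.
WANTED = frozenset({'N', 'P'})


def extract_runs(items, wanted=WANTED):
    """Return each consecutive run of wanted items as its own list."""
    # wanted.__contains__ is equivalent to: lambda item: item in wanted
    return [
        list(group)
        for is_wanted, group in groupby(items, key=wanted.__contains__)
        if is_wanted
    ]


# Hypothetical sample input: several lists processed the same way.
lists = [
    ['x', 'P', 'x', 'N', 'P'],
    ['N', 'N', 'x', 'x', 'P'],
    ['x', 'x'],
]

results = [extract_runs(lst) for lst in lists]
print(results)
# [[['P'], ['N', 'P']], [['N', 'N'], ['P']], []]
```

A list containing no 'N' or 'P' at all simply yields an empty result, as the last sample list shows.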