
Python: split a list of strings into lists of strings using text in the list


I have a list named 'exemptions' containing several fields, all of them strings:

    exemptions = ['S-1', '20090820', '\t\t\t\tDOLLAR GENERAL CORP', '\t\t0000029534', 'S-1/A', '20021114', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806', '\t\t\t\tCONSTAR FOREIGN HOLDINGS INC', '\t\t0001178543', '\t\t\t\tCONSTAR PLASTICS LLC', '\t\t0001178541', '\t\t\t\tDT INC', '\t\t0001178539', '\t\t\t\tBFF INC', '\t\t0001178538', '\t\t\t\tCONSTAR INC', '\t\t0001178537', 'S-1', '20020523', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806', 'S-1', '20051123', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300', 'S-1', '20061221', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300', 'S-1/A', '20140327', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729', 'S-1', '20110331', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729', 'S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388', 'S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761', 'S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761', 'S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015', 'S-1', '20140512', '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI', '\t\t0000759944']

I would like to start a new sublist at every 'S-1' or 'S-1/A' element. The desired output is:

    exemptions = [['S-1', '20090820', '\t\t\t\tDOLLAR GENERAL CORP', '\t\t0000029534'], ['S-1/A', '20021114', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806', '\t\t\t\tCONSTAR FOREIGN HOLDINGS INC', '\t\t0001178543', '\t\t\t\tCONSTAR PLASTICS LLC', '\t\t0001178541', '\t\t\t\tDT INC', '\t\t0001178539', '\t\t\t\tBFF INC', '\t\t0001178538', '\t\t\t\tCONSTAR INC', '\t\t0001178537'], ['S-1', '20020523', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806'], ['S-1', '20051123', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'], ['S-1', '20061221', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'], ['S-1/A', '20140327', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'], ['S-1', '20110331', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'], ['S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388'], ['S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'], ['S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'], ['S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015'], ['S-1', '20140512', '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI', '\t\t0000759944']]

I tried _list = [i.split('S-1') for i in exemptions], but it does not give me what I need.
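For reference, this is what that attempt produces on the first six elements of the list. str.split works inside each individual string, so it can only cut 'S-1' and 'S-1/A' into pieces; it never groups neighboring elements together. The variable names sample and attempt are illustrative only.

```python
# First six elements of the question's `exemptions` list.
sample = ['S-1', '20090820', '\t\t\t\tDOLLAR GENERAL CORP', '\t\t0000029534', 'S-1/A', '20021114']

# str.split runs on each string separately: one inner list per element, no grouping.
attempt = [i.split('S-1') for i in sample]
print(attempt)
# [['', ''], ['20090820'], ['\t\t\t\tDOLLAR GENERAL CORP'], ['\t\t0000029534'], ['', '/A'], ['20021114']]
```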

Any suggestions? Thank you so much!


Solution

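  • Join the list into a single string with a custom delimiter, say |, use re.split to split before every occurrence of S-1, and then split each element of the resulting list back into a list on the delimiter |. This assumes that | never appears in the data and that the text S-1 occurs only in the marker fields. Splitting on a zero-width lookahead such as (?=S-1) requires Python 3.7 or later.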

    >>> import re
    >>> from pprint import pprint
    >>> res = [s.strip('|').split('|') for s in re.split(r'(?=S-1)', '|'.join(exemptions)) if s]
    >>>
    >>> pprint(res)
    [['S-1', '20090820', '\t\t\t\tDOLLAR GENERAL CORP', '\t\t0000029534'],
     ['S-1/A',
      '20021114',
      '\t\t\t\tCONSTAR INTERNATIONAL INC',
      '\t\t0000029806',
      '\t\t\t\tCONSTAR FOREIGN HOLDINGS INC',
      '\t\t0001178543',
      '\t\t\t\tCONSTAR PLASTICS LLC',
      '\t\t0001178541',
      '\t\t\t\tDT INC',
      '\t\t0001178539',
      '\t\t\t\tBFF INC',
      '\t\t0001178538',
      '\t\t\t\tCONSTAR INC',
      '\t\t0001178537'],
     ['S-1', '20020523', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806'],
     ['S-1', '20051123', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'],
     ['S-1', '20061221', '\t\t\t\tEXCO RESOURCES INC', '\t\t0000316300'],
     ['S-1/A', '20140327', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'],
     ['S-1', '20110331', '\t\t\t\tAlly Financial Inc.', '\t\t0000040729'],
     ['S-1', '20040319', '\t\t\t\tDIGIRAD CORP', '\t\t0000707388'],
     ['S-1', '20040408', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'],
     ['S-1', '20041027', '\t\t\t\tBUCYRUS INTERNATIONAL INC', '\t\t0000740761'],
     ['S-1', '20050630', '\t\t\t\tSEALY CORP', '\t\t0000748015'],
     ['S-1',
      '20140512',
      '\t\t\t\tCITIZENS FINANCIAL GROUP INC/RI',
      '\t\t0000759944']]
    >>>
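If the data might contain | or the text S-1 inside other fields, a plain loop that compares whole elements avoids both assumptions and needs no regular expression. This is a minimal sketch of that alternative, not part of the answer above; the names split_on_markers and MARKERS are hypothetical, and the marker values are the two from the question.

```python
# Hypothetical set of marker values that start a new group ('S-1' and 'S-1/A' from the question).
MARKERS = {"S-1", "S-1/A"}


def split_on_markers(items, markers=MARKERS):
    """Split a flat list into sublists, starting a new sublist at each marker element.

    Elements are compared as whole strings, so 'S-1' inside a company name or a '|'
    in the data has no effect. Elements before the first marker (if any) form a
    leading group of their own.
    """
    groups = []
    for item in items:
        # Open a new group at a marker, or for the very first element.
        if item in markers or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


# Usage on the first ten elements of the question's list.
exemptions = ['S-1', '20090820', '\t\t\t\tDOLLAR GENERAL CORP', '\t\t0000029534',
              'S-1/A', '20021114', '\t\t\t\tCONSTAR INTERNATIONAL INC', '\t\t0000029806',
              '\t\t\t\tCONSTAR FOREIGN HOLDINGS INC', '\t\t0001178543']
result = split_on_markers(exemptions)
print(result)
```

Because the loop makes a single pass and only appends, it runs in linear time and keeps each field exactly as it was, including the leading tabs.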