Tags: python, list, loops, sublist

How can I group a list of strings into a list of sublists?


I have a list of strings, as in the example below.

list = ['#4008 (Pending update)',
 'Age 1 Female',
 'Onset date',
 '-',
 '#4007 (Pending update)',
 'Onset date',
 'Asymptomatic',
 'Confirmed date',
 '-',
 '+',
 '#4006 (Pending update)',
 'Age 65 Female',
 'Onset date',
 '-',
 'Place of residence',
 '-']

I want to group the strings into sublists as shown below: each string that starts with '#' begins a new sublist, which collects the strings that follow it until the next string starting with '#' appears.

[['#4008 (Pending update)',
 'Age 1 Female',
 'Onset date',
 '-'],

 ['#4007 (Pending update)',
 'Onset date',
 'Asymptomatic',
 'Confirmed date',
 '-',
 '+'],

['#4006 (Pending update)',
 'Age 65 Female',
 'Onset date',
 '-',
 'Place of residence',
 '-']]
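
This is my attempt so far (incomplete):

new_list = []
sub_list = []
for i in list:
    if i.startswith('#'):
        # Stuck here: how do I also collect the strings that follow,
        # up to the next string starting with '#'?
        sub_list.append(i)

new_list.append(sub_list)
new_list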

My idea is to start with the string at index 0 and check the strings one by one, stopping when the next string that starts with '#' appears. The loop would then start again to build the next sublist, but I don't know how to write the code. How can this be achieved? Thanks.


Solution

    lst = ['#4008 (Pending update)',
     'Age 1 Female',
     'Onset date',
     '-',
     '#4007 (Pending update)',
     'Onset date',
     'Asymptomatic',
     'Confirmed date',
     '-',
     '+',
     '#4006 (Pending update)',
     'Age 65 Female',
     'Onset date',
     '-',
     'Place of residence',
     '-']
    
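    out = []
    for val in lst:
        if val.startswith('#'):
            # A string starting with '#' opens a new sublist.
            out.append([val])
        else:
            # Any other string joins the most recently opened sublist.
            out[-1].append(val)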
    
    from pprint import pprint
    pprint(out, width=40)
    

    Prints:

    [['#4008 (Pending update)',
      'Age 1 Female',
      'Onset date',
      '-'],
     ['#4007 (Pending update)',
      'Onset date',
      'Asymptomatic',
      'Confirmed date',
      '-',
      '+'],
     ['#4006 (Pending update)',
      'Age 65 Female',
      'Onset date',
      '-',
      'Place of residence',
      '-']]
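
The loop above relies on the first string starting with '#'. If it does not, `out` is still empty when `out[-1]` is evaluated and Python raises an IndexError. Below is a minimal sketch of a reusable variant, assuming that any strings appearing before the first '#' string should simply form a group of their own. The names `group_by_header`, `prefix`, and `sample` are hypothetical and not part of the original answer.

```python
def group_by_header(items, prefix='#'):
    """Split items into sublists, starting a new one at each string that begins with prefix."""
    groups = []
    for item in items:
        # Open a new group at a header string, or when no group exists yet
        # (this covers input whose first string is not a header).
        if item.startswith(prefix) or not groups:
            groups.append([item])
        else:
            # Otherwise the string joins the most recently opened group.
            groups[-1].append(item)
    return groups


# Illustrative data only: a leading non-header string and two adjacent headers.
sample = ['x', '#1', 'a', 'b', '#2', '#3', 'c']
print(group_by_header(sample))
# [['x'], ['#1', 'a', 'b'], ['#2'], ['#3', 'c']]
```

Called on the `lst` from the answer, this should give the same grouping as the loop above, since that list starts with a '#' string.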