python regex string strip

Split a string with whitespaces following a template


I have the following string header (template):

Port          Name               Status    Vlan      Duplex  Speed   Type

and the string str:

Eth1/2        trunk to dg-qwu-29 connected trunk     full    1000    1/10g

Using the header, how can I split str into the following list?

[Eth1/2, trunk to dg-qwu-29, connected, trunk, full, 1000, 1/10g]

Solution

  • The following assumes that the rows and the header follow the same whitespace mask. That is, each header title is aligned with the start of its column in the row.

    import re
    header = "Port          Name               Status    Vlan      Duplex  Speed   Type"
    row    = "Eth1/2        trunk to dg-qwu-29 connected trunk     full    1000    1/10g"
    # index at which each header title begins
    starts = [m.start() for m in re.finditer(r'\S+', header)]
    # each field begins at its header title's index and ends just before the
    # index of the next header title; the last field runs to the end of the row
    ends = starts[1:] + [len(row)]
    # strip() removes the padding spaces around each field
    items = [row[s:e].strip() for s, e in zip(starts, ends)]
    print(items)
    

    The above outputs:

    ['Eth1/2', 'trunk to dg-qwu-29', 'connected', 'trunk', 'full', '1000', '1/10g']
    

    Edit: Index retrieval from https://stackoverflow.com/a/13734572/1847471
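
If several rows share the same header, the same idea can be wrapped in a small helper. The following is only a sketch under the alignment assumption stated above; the name `split_by_header` is made up for illustration, and mapping the fields to header titles with `header.split()` assumes that no title contains a space.

```python
import re

def split_by_header(header, row):
    """Split row at the offsets where the header titles start (hypothetical helper)."""
    # start offset of every header title
    starts = [m.start() for m in re.finditer(r"\S+", header)]
    # a field ends where the next title starts; None lets the last field run to the end
    ends = starts[1:] + [None]
    return [row[s:e].strip() for s, e in zip(starts, ends)]

header = "Port          Name               Status    Vlan      Duplex  Speed   Type"
row    = "Eth1/2        trunk to dg-qwu-29 connected trunk     full    1000    1/10g"

fields = split_by_header(header, row)
# pair each field with its header title (assumes titles contain no spaces)
record = dict(zip(header.split(), fields))
print(fields)
print(record["Name"])
```

A field that starts before its header title, or that overflows into the next column, would be cut in the wrong place, so this only works on output that is strictly column-aligned.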