
Manipulating and grouping strings within a for loop


Hi, I would like to extend the code below so that it splits the header of each note into two parts: parameters and configurations. The parameters are the comma-separated names inside the square brackets [] of a note, each written to the left of its parentheses (). The configurations are the values inside those parentheses (). For notes that have parameters, I want the parameter names joined with commas. If a parameter has one or more configurations, they should be collected into a comma-separated list, [element 1, element 2]. For a parameter without any configurations, create an empty list []. If a note has no parameters, both its Parameters and Configurations entries should be None. I want to get the result shown under Expected Output below.

Code:

import re
import pandas as pd

lines = ['yes hello there', 'move on to the next command if the previous command was successful.',
         "$$n:describes the '&&' character in the RUN command.",
         'k', 
         '$$n[t(a1), mfc(s,expand,rr), np(), k]: description']

notes = []
parameters= []
configurations= []
for i, line in enumerate(lines):
    if re.search(r'\$\$.*\:', line):
        notes.append(re.sub(r'\$\$.*\:', '', line).strip())
        
df = pd.DataFrame({
    'Note': notes,
    'Parameters': parameters,
    'configurations': configurations
})

Expected Output:

+----+------------------------------------------------+--------------+--------------------------+
|    | Note                                           | Parameters   | Configurations           |
|----+------------------------------------------------+--------------+--------------------------|
|  0 | describes the && character in the RUN command. | None         | None                     |
|  1 | description                                    | t,mfc,np,k   | [a1],[s,expand,rr],[],[] |
+----+------------------------------------------------+--------------+--------------------------+

Solution

  • This will create sublists:

    notes = []
    parameters = []
    configurations = []
    for line in lines:
        # Group 1 captures the content of the optional [...] before the colon
        expr = re.search(r'\$\$[^:[]*?(?:\[([^:\]]*)\])?\:', line)
        if expr:
            notes.append(re.sub(r'\$\$.*?\:', '', line).strip())
            if expr[1]:
                names = []
                confs = []
                # \s is excluded from the name so that ' mfc' is stored as 'mfc'
                for part in re.findall(r'([^\s(,]+)(?:\(([^)]*)\))?', expr[1]):
                    names.append(part[0])
                    # 's,expand,rr' -> ['s', 'expand', 'rr']; no or empty () -> []
                    confs.append(part[1].split(",") if part[1] else [])
                parameters.append(names)
                configurations.append(confs)
            else:
                # Note without a [...] section
                parameters.append(None)
                configurations.append(None)
    

    If you need those values to be strings instead of sublists, then:

    notes = []
    parameters = []
    configurations = []
    for line in lines:
        # Group 1 captures the content of the optional [...] before the colon
        expr = re.search(r'\$\$[^:[]*?(?:\[([^:\]]*)\])?\:', line)
        if expr:
            notes.append(re.sub(r'\$\$.*?\:', '', line).strip())
            if expr[1]:
                names = []
                confs = []
                for part in re.findall(r'([^\s(,]+)(?:\(([^)]*)\))?', expr[1]):
                    names.append(part[0])
                    # Keep the raw config text and wrap it: 'a1' -> '[a1]', '' -> '[]'
                    confs.append(f"[{part[1]}]")
                parameters.append(",".join(names))
                configurations.append(",".join(confs))
            else:
                # Note without a [...] section
                parameters.append(None)
                configurations.append(None)
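
For reference, here is a minimal sketch of the string variant packed into a helper and run on the sample `lines` from the question. The names `HEADER`, `PARAM` and `parse_note` are made up for this illustration. It is also assumed that nothing useful precedes the `$$` marker, since the note is taken as everything after the header rather than by `re.sub`.

```python
import re

# Header such as "$$n:" or "$$n[t(a1), k]:"; group 1 is the bracket content, if any.
HEADER = re.compile(r'\$\$[^:\[]*?(?:\[([^:\]]*)\])?:')
# One parameter name, optionally followed by "(config,config,...)".
PARAM = re.compile(r'([^\s(,]+)(?:\(([^)]*)\))?')


def parse_note(line):
    """Return (note, parameters, configurations), or None if line is not a note."""
    header = HEADER.search(line)
    if header is None:
        return None
    # Assumption: the note text is whatever follows the header.
    note = line[header.end():].strip()
    if not header[1]:
        # No [...] section (or an empty one): both columns are None.
        return note, None, None
    pairs = PARAM.findall(header[1])
    parameters = ",".join(name for name, _ in pairs)
    configurations = ",".join(f"[{conf}]" for _, conf in pairs)
    return note, parameters, configurations


lines = ['yes hello there',
         'move on to the next command if the previous command was successful.',
         "$$n:describes the '&&' character in the RUN command.",
         'k',
         '$$n[t(a1), mfc(s,expand,rr), np(), k]: description']

# Keep only the lines that were recognised as notes.
rows = [row for row in map(parse_note, lines) if row is not None]
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows, columns=['Note', 'Parameters', 'Configurations'])` should then give the table from the question. The quotes around `&&` are kept in the note text, because nothing in the code strips them.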