Tags: python, regex, parsing, xpath, python-re

Using Python's re library to parse "user-defined XPath" scripts


I'm building an "XPath parsing tool" in Python for my team. In my case the input is not plain XPath: users enter XPath expressions inside a special structure. Here is an example.

The input format looks like this (each element can be either a tuple of XPaths or a single XPath):

sig = "(xpath_1_1, xpath_1_2), (xpath_2_1, xpath_2_2), xpath_3..."

Users edit this string in Excel.

My goal is to parse the string into a list whose elements are either tuples or single XPaths:

[(xpath_1_1, xpath_1_2), (xpath_2_1, xpath_2_2), xpath_3...]

Then I can pass this list to my Selenium script and take screenshots of the elements one after another.
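A minimal sketch of that consuming side, assuming Selenium 4 (where driver.find_element takes a locator strategy plus a value, and the string "xpath" is the value of By.XPATH) and assuming every XPath inside a tuple simply gets its own screenshot. The helper name snapshot_in_order and the file-naming scheme are hypothetical:

```python
def snapshot_in_order(driver, entries, prefix="snap"):
    """Screenshot every XPath in the parsed list, in list order.

    Hypothetical helper: each XPath inside a tuple gets its own PNG, named
    <prefix>_<entry index>_<position in tuple>.png. Returns the file names.
    """
    paths = []
    for i, entry in enumerate(entries):
        # Treat a plain XPath as a one-element group so both cases share a loop.
        group = entry if isinstance(entry, tuple) else (entry,)
        for j, xpath in enumerate(group):
            path = f"{prefix}_{i}_{j}.png"
            # "xpath" is the value of selenium.webdriver.common.by.By.XPATH;
            # WebElement.screenshot() writes a PNG of just that element.
            driver.find_element("xpath", xpath).screenshot(path)
            paths.append(path)
    return paths
```

With a real browser you would pass a driver that has already loaded the page. What a tuple should mean (separate images, or one combined image) is up to your team; this sketch only shows that walking the list preserves its order.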

Here is one of my test inputs:

sig = "(//div[@style='font-family:Arial;float: left;width:930px;font-size:12px;' and ./span[contains(text(),'005930')]], //table[@id='gv_flow_krKS0 1']),//table[@id='123456'],(//div[@style='font-family:Arial;float: left;width:930px;font-size:12px;' and ./span[contains(text(),'000660')]],  //table[@id='gv_flow_krKS0 2']),//table[@id='456789']"

Is there a good way to implement this function while preserving the order of the list?

First, I don't think eval() is a good idea, since it could cause security problems.
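For reference, the standard library's ast.literal_eval is the safe relative of eval(): it only accepts Python literals, so it cannot run arbitrary code. It will not parse the format as it stands, because the XPaths are not quoted. The sketch below assumes a hypothetical change to the Excel format in which every XPath is written as a quoted string; parse_quoted is an illustrative name:

```python
import ast


def parse_quoted(sig):
    """Parse '("//a", "//b"), "//c"' into [("//a", "//b"), "//c"].

    Assumes every XPath is a quoted Python string literal; that is a
    hypothetical change to the users' Excel format, not the current one.
    """
    # Wrapping in [...] turns the whole input into one list literal, so a
    # single XPath or a single tuple still comes back inside a list.
    return ast.literal_eval("[" + sig + "]")


print(parse_quoted('''("//div[@id='x']", "//table[@id='1']"), "//table[@id='2']"'''))
# [("//div[@id='x']", "//table[@id='1']"), "//table[@id='2']"]

try:
    parse_quoted("(//a, //b), //c")  # the current, unquoted format
except SyntaxError:
    print("unquoted XPaths are not Python literals")
```

Since the XPaths in the test data already use single quotes internally, users would have to wrap each one in double quotes for this to work.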

Now I'm trying to solve it with the re library, but I'm finding it quite difficult and have no idea where to start.

Can anyone help? Thanks!


Solution

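  • OK, I think this does what you want. It doesn't use re: instead it walks the string one character at a time and keeps track of quotes and brackets, so that commas and parentheses inside an XPath's predicates or quoted strings are not mistaken for separators. You should try it on some different test strings.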

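    sig = "(//div[@style='font-family:Arial;float: left;width:930px;font-size:12px;' and ./span[contains(text(),'005930')]], //table[@id='gv_flow_krKS0 1']),//table[@id='123456'],(//div[@style='font-family:Arial;float: left;width:930px;font-size:12px;' and ./span[contains(text(),'000660')]],  //table[@id='gv_flow_krKS0 2']),//table[@id='456789']"
    
    gather = ''        # characters of the XPath currently being collected
    element = []       # XPaths collected so far for the current tuple
    elements = []      # result: tuples and plain XPaths, in input order
    quote = ''         # quote character we are inside of, or '' if none
    depth = 0          # nesting depth of [...] and (...) inside an XPath
    in_tuple = False   # True between a tuple's opening '(' and closing ')'
    
    for c in sig:
        # Inside a quoted literal, copy everything up to the matching quote.
        if quote:
            gather += c
            if c == quote:
                quote = ''
            continue
    
        if depth == 0:
            if c == '(' and not in_tuple and not gather.strip():
                # A '(' before any XPath text opens a tuple.
                in_tuple = True
                gather = ''
                continue
            elif c == ')' and in_tuple:
                # Close the tuple: store its last XPath, then the tuple itself.
                element.append(gather.strip())
                elements.append(tuple(element))
                element = []
                gather = ''
                in_tuple = False
                continue
            elif c == ',':
                # Top-level separator. Skip empty pieces, such as the
                # gap between a tuple's ')' and the following ','.
                xpath = gather.strip()
                if xpath:
                    if in_tuple:
                        element.append(xpath)
                    else:
                        elements.append(xpath)
                gather = ''
                continue
    
        # Ordinary XPath character: track quotes and nesting, then keep it.
        if c in '\'"':
            quote = c
        elif c in '[(':
            depth += 1
        elif c in '])':
            depth -= 1
        gather += c
    
    # Handle leftovers: the last top-level XPath has no trailing comma,
    # and an unclosed tuple still keeps what was collected.
    if gather.strip():
        if in_tuple:
            element.append(gather.strip())
        else:
            elements.append(gather.strip())
    if element:
        elements.append(tuple(element))
    
    for e in elements:
        print(e)

    Output:

    ("//div[@style='font-family:Arial;float: left;width:930px;font-size:12px;' and ./span[contains(text(),'005930')]]", "//table[@id='gv_flow_krKS0 1']")
    //table[@id='123456']
    ("//div[@style='font-family:Arial;float: left;width:930px;font-size:12px;' and ./span[contains(text(),'000660')]]", "//table[@id='gv_flow_krKS0 2']")
    //table[@id='456789']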
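Since the question asked about re: a regular expression on its own can't count nested brackets, but it can split the input into tokens (quoted literals, single structural characters, and runs of everything else), leaving only a small loop to track nesting. The sketch below does that with re.findall; the names TOKEN and parse_sig are illustrative. It assumes, like the character loop, that a '(' appearing before any XPath text always opens a tuple, so an XPath that itself starts with '(' (such as (//a)[1]) would be misread.

```python
import re

# One token per match: a quoted literal, one structural character, a run of
# ordinary characters, or (as a fallback, e.g. an unclosed quote) any single character.
TOKEN = re.compile(r"""'[^']*'|"[^"]*"|[()\[\],]|[^'"()\[\],]+|.""")


def parse_sig(sig):
    """Split sig into a list of tuples and plain XPath strings, in order."""
    elements, element = [], []   # finished entries / XPaths of the open tuple
    cur, depth, in_tuple = "", 0, False
    for tok in TOKEN.findall(sig):
        if depth == 0 and tok == "(" and not in_tuple and not cur.strip():
            in_tuple, cur = True, ""           # '(' before any XPath text opens a tuple
        elif depth == 0 and tok == ")" and in_tuple:
            element.append(cur.strip())        # last XPath of the tuple
            elements.append(tuple(element))
            element, cur, in_tuple = [], "", False
        elif depth == 0 and tok == ",":
            if cur.strip():                    # skip the empty gap after a tuple's ')'
                (element if in_tuple else elements).append(cur.strip())
            cur = ""
        else:
            # Part of an XPath: brackets and parentheses change the nesting depth.
            if tok in ("(", "["):
                depth += 1
            elif tok in (")", "]"):
                depth -= 1
            cur += tok
    if cur.strip():                            # the last XPath has no trailing comma
        (element if in_tuple else elements).append(cur.strip())
    if element:                                # tolerate a missing final ')'
        elements.append(tuple(element))
    return elements


print(parse_sig("//a, (//b, //c[@x='1,2']), //d"))
# ['//a', ('//b', "//c[@x='1,2']"), '//d']
```

Because quoted literals are consumed as single tokens, a comma inside '1,2' never reaches the separator check, and parentheses such as text() inside a tuple only change the depth instead of closing the tuple.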