I will explain the problem I faced with the following sample. I am able to parse the data below with the template that follows. Using {{ignore}} lets a line match the template while discarding the parts of it that I don't want to keep.
from ttp import ttp
import json
data_to_parse = """
1.peace in the world
2.peace in the world world
3.peace in the world world world
"""
To parse this data I can use the following template.
ttp_template = """
<group name="Quote">
{{peace}} in the {{world}}
</group>
<group name="Quote">
{{peace}} in the {{world}} {{ignore}}
</group>
<group name="Quote">
{{peace}} in the {{world}} {{ignore}} {{ignore}}
</group>
"""
With the following code, I get the parsed data I want:
def parser(data_to_parse):
    parser = ttp(data=data_to_parse, template=ttp_template)
    parser.parse()
    # get the result as a JSON string
    results = parser.result(format='json')[0]
    #print(results)
    # convert the JSON string to Python objects
    result = json.loads(results)
    print(result)

parser(data_to_parse)
See the output I have:
The problem is that I cannot know in advance how many times "world" appears at the end of each line, and I don't want to keep adding {{ignore}} just to match the line and drop the words I don't need. For example, if I add the following line to my data, the template above will not catch it; I would need to add one more {{ignore}} to capture it.
4.peace in the world world world world
As I understand it, the reason is that ttp splits the words at each space. For example, if I had _ instead of spaces, as in 3.peace in the world_world_world, I could capture the data with a single line in my template. However, my data contains lines with spaces, and I need to capture those lines as well.
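To illustrate the point about spaces, here is a minimal sketch using plain Python regular expressions rather than ttp itself. It assumes that an unconstrained template variable behaves roughly like \S+ and that the template line has to match the whole input line; both are my assumptions for illustration, and LINE and match_line are hypothetical names.

```python
import re

# Assumption: an unconstrained {{variable}} behaves roughly like \S+
# (a run of non-whitespace characters), and the whole line must match.
LINE = re.compile(r"^(?P<peace>\S+) in the (?P<world>\S+)$")

def match_line(line):
    """Return the captured fields, or None if the line does not match."""
    m = LINE.match(line)
    return m.groupdict() if m else None

# Underscores are non-whitespace, so one variable covers the whole tail.
underscored = match_line("3.peace in the world_world_world")
# Spaces end the \S+ run, so the extra words are left unmatched.
spaced = match_line("3.peace in the world world world")

print(underscored)
print(spaced)
```

The second line fails to match because a single non-whitespace run cannot span the spaces between the trailing words.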
So my question is: is there a simpler way to handle this? As you can see, I have a workaround, but I would like a cleaner way to resolve the issue. I would highly appreciate any advice.
I have found a way to resolve this: {{ name | PHRASE }} or {{ name | ORPHRASE }} can be used for this purpose.
{{ name | PHRASE }}
This pattern matches any phrase, that is, a collection of words separated by a single space character, such as “word1 word2 word3”.
{{ name | ORPHRASE }}
In many cases the data that needs to be extracted can be either a single word or a phrase. The most prominent examples are descriptions, such as interface descriptions, BGP peer descriptions, etc. ORPHRASE allows matching and extracting such data.