I have a simple pyparsing grammar that matches numbers separated by spaces:
from pyparsing import *
NUMBER = Word( nums )
STATEMENT = ZeroOrMore( NUMBER )
print( STATEMENT.parseString( "1 2 34" ) )
Given the test string 1 2 34, it returns three strings, one per parsed token. But how do I find the location of each token in the original string? I need it for a kind of syntax highlighting.
Add this parse action to NUMBER:
NUMBER.setParseAction(lambda locn,tokens: (locn,tokens[0]))
Parse actions can be passed the tokens that were parsed for a given expression, the location of the parse, and the original string. You can pass functions to setParseAction with any of these signatures:
fn()
fn(tokens)
fn(locn,tokens)
fn(srcstring,locn,tokens)
For your needs, all you need is the location and the parsed tokens.
After adding this parse action, your parsed results now look like:
[(0, '1'), (2, '2'), (4, '34')]
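Since the goal is highlighting, here is a rough sketch of how those (location, token) pairs could be used. The `highlight` helper and its bracket markers are made up for illustration and are not part of pyparsing. It also assumes each token is exactly the text that was matched, so the end offset is simply the start plus the token length; that holds for Word(nums), but not for expressions whose parse actions transform the token.

```python
def highlight(source, located_tokens, open_mark="[", close_mark="]"):
    """Wrap each located token in markers, leaving the rest of the text as-is.

    located_tokens is a list of (start, token) pairs in ascending order,
    like the output of the parse action above.
    """
    pieces = []
    cursor = 0
    for start, token in located_tokens:
        # Assumption: the token is the exact matched text, so its length
        # gives the end offset in the original string.
        end = start + len(token)
        pieces.append(source[cursor:start])  # untouched text before the token
        pieces.append(open_mark + source[start:end] + close_mark)
        cursor = end
    pieces.append(source[cursor:])  # trailing text after the last token
    return "".join(pieces)


pairs = [(0, '1'), (2, '2'), (4, '34')]
print(highlight("1 2 34", pairs))  # prints: [1] [2] [34]
```

In a real highlighter the markers would be replaced by whatever your display layer expects, such as terminal color codes or editor tag ranges.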
EDIT:
Since my original answer to this post, I've added to pyparsing the locatedExpr helper, which will give both the starting and ending location for a particular expression. Now this can be written simply as:
NUMBER = locatedExpr(Word(nums))
Here is the full script/output:
>>> from pyparsing import *
... NUMBER = locatedExpr(Word( nums ))
... STATEMENT = ZeroOrMore( NUMBER )
... print( STATEMENT.parseString( "1 2 34" ).dump() )
[[0, '1', 1], [2, '2', 3], [4, '34', 6]]
[0]:
  [0, '1', 1]
  - locn_end: 1
  - locn_start: 0
  - value: '1'
[1]:
  [2, '2', 3]
  - locn_end: 3
  - locn_start: 2
  - value: '2'
[2]:
  [4, '34', 6]
  - locn_end: 6
  - locn_start: 4
  - value: '34'
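The names in the dump (locn_start, value, locn_end) can be used as keys on each group, which is handy when collecting spans for highlighting. A minimal sketch, assuming a pyparsing version that still provides locatedExpr and the camelCase parseString method; the `spans` list is just an illustrative name:

```python
from pyparsing import Word, ZeroOrMore, locatedExpr, nums

source = "1 2 34"
NUMBER = locatedExpr(Word(nums))
STATEMENT = ZeroOrMore(NUMBER)

spans = []
for tok in STATEMENT.parseString(source):
    # Each group exposes the named fields shown in the dump() output.
    spans.append((tok["locn_start"], tok["locn_end"], tok["value"]))

# The start/end pair slices the matched text back out of the source string.
for start, end, value in spans:
    assert source[start:end] == value

print(spans)
```

Note that locn_end is exclusive, so source[start:end] gives back the matched text directly.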