Got this text:
Want this || Not this
The line may also look like this:
Want this | Not this
with a single pipe.
I'm using this grammar to parse it:
grammar HC {
token TOP { <pre> <divider> <post> }
token pre { \N*? <?before <divider>> }
token divider { <[|]> ** 1..2 }
token post { \N* }
}
Is there a better way to do this? I'd love to be able to do something more like this:
grammar HC {
token TOP { <pre> <divider> <post> }
token pre { \N*? }
token divider { <[|]> ** 1..2 }
token post { \N* }
}
But this does not work. And if I do this:
grammar HC {
token TOP { <pre>* <divider> <post> }
token pre { \N }
token divider { <[|]> ** 1..2 }
token post { \N* }
}
Each character before the divider gets its own <pre> capture. Thanks.
As always, TIMTOWTDI.
I'd love to be able to do something more like this
You can. Just switch the first two rule declarations from token to regex:
grammar HC {
regex TOP { <pre> <divider> <post> }
regex pre { \N*? }
token divider { <[|]> ** 1..2 }
token post { \N* }
}
This works because regex disables :ratchet (unlike token and rule, which enable it).
(Explaining why you need to switch it off for both rules is beyond my paygrade, certainly for tonight, and quite possibly till someone else explains why to me so I can pretend I knew all along.)
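To check the regex-based version against both sample lines from the question, a small driver along these lines should do. It is only a sketch: the loop and the printed labels are my own additions, and I haven't listed the expected output here.

```raku
# Sketch: the regex-based grammar from above, run against both sample lines.
grammar HC {
    regex TOP     { <pre> <divider> <post> }
    regex pre     { \N*? }
    token divider { <[|]> ** 1..2 }
    token post    { \N* }
}

for 'Want this || Not this', 'Want this | Not this' -> $line {
    my $match = HC.parse($line);
    if $match {
        # Quote each capture so leading/trailing whitespace is visible.
        say "pre:     '{ $match<pre> }'";
        say "divider: '{ $match<divider> }'";
        say "post:    '{ $match<post> }'";
    }
    else {
        say "no match for '{ $line }'";
    }
}
```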
if I do this ... each character gets its own <pre> capture
By default, "calling a named regex installs a named capture with the same name" [... couple sentences later:] "If no capture is desired, a leading dot or ampersand will suppress it". So change <pre> to <.pre>.
Next, you can manually add a named capture by wrapping a pattern in $<name>=[pattern]. So to capture the whole string matched by consecutive calls of the pre rule, wrap the non-capturing pattern (<.pre>*?) in $<pre>=[...]:
grammar HC {
token TOP { $<pre>=[<.pre>*?] <divider> <post> }
token pre { \N }
token divider { <[|]> ** 1..2 }
token post { \N* }
}
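As a rough check that this version yields one <pre> capture for the whole prefix rather than one per character, the grammar can be run against both sample lines. This is a sketch: the loop and the output format are illustrative additions, not part of the grammar itself.

```raku
# Sketch: the token-based grammar with a manual $<pre> capture.
grammar HC {
    token TOP     { $<pre>=[<.pre>*?] <divider> <post> }
    token pre     { \N }
    token divider { <[|]> ** 1..2 }
    token post    { \N* }
}

for 'Want this || Not this', 'Want this | Not this' -> $line {
    my $match = HC.parse($line);
    if $match {
        # $<pre> should be a single capture covering everything before the divider.
        say "pre:     '{ $match<pre> }'";
        say "divider: '{ $match<divider> }'";
        say "post:    '{ $match<post> }'";
    }
    else {
        say "no match for '{ $line }'";
    }
}
```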