Lua pattern -- how can I get this to work?


I have a text file to process, with some example content as follows:

[FCT-FCTVALUEXXXX-IA]
Name=value
Label = value
Zero or more lines of text
Abbr=
Zero or more lines of text
Field A=1
Field B=0
Zero or more lines of text
Hidden=N
[Text-FCT-FCTVALUEXXXX-IA-Note]
One or more note lines
[FCT-FCT-FCTVALUEZ-IE-DETAIL]
Zero or more lines of text
[FCT-FCT-FCTVALUEQ-IA-DETAIL]
Zero or more lines of text
[FCT-_FCTVALUEY-IA]
Name=value
Zero or more lines of text
Label=value
Zero or more lines of text
Field A=1
Abbr=value
Field A=1
Zero or more lines of text
Hidden=N

I need to find sections like this:

[FCT-FCTVALUEXXXX-IA]
Name=value
Label = value
Zero or more lines of text
Abbr=
Zero or more lines of text
Field A=1
Field B=0
Zero or more lines of text
Hidden=N

and extract FCT-FCTVALUEXXXX-IA, Name, Label, Abbr, Field A, Field B and Hidden, and then find the corresponding section (if it exists):

[Text-FCT-FCTVALUEXXXX-IA-Note]
One or more note lines

and extract the Note lines as a single string.

I don't care about sections like this:

[FCT-FCT-FCTVALUEZ-IE-DETAIL]
Zero or more lines of text

All three sorts of sections can appear anywhere in the file, including right at the end, and there's no predictable relationship in position between the sections.

The order of Abbr and Fields A and B cannot be guaranteed but they always appear after Name and Label and before Hidden.

What I have so far:

strParse = "(%[FCT%-.-%-)([IF])([EA])%]%c+Name=(.-)%c.-Label=(.-)%c(.-)Hidden=(%a)%c" --can't pull everything out at once because the order of some fields is not predictable
for id, rt, ft, name, label, detail, hidden in strFacts:gmatch(strParse) do
    --extract details
    abbr=detail:match("Abbr=(.-)%c") --may be blank
    if abbr == nil then abbr = "" end
    FieldA = detail:match("Field A=(%d)")
    FieldB = detail:match("Field B=(%d)")               
    --need to sanitise id which could have a bunch of extraneous material tacked on the front and use it to get the Note
    ident=id:match(".*(%[FCT%-.-%-$)")..rt..ft
    Note = ParseAutonote(ident)  --this is a function to parse the note which I've yet to test, so a dummy function returns ""
    tblResults[name]={ident, rt, ft, name, label, abbr, FieldA, FieldB, hidden, Note}  --Lua is case-sensitive: Note, not note
end

Most of it works OK (after many hours of working on it), but the piece that isn't working is:

(".*(%[FCT%-.-%-$)")

which is supposed to pull out the final occurrence of FCT-sometext- in the string id

My logic: anchor the search to the end of the string and capture the shortest possible string beginning with "[FCT-" and ending with "-" at the end of the string.

Given a value of either "[FCT-_ABCD-PDQR-" or "[FCT-XYZ-DETAIL]lines of text[FCT-_ABCD-PDQR-" it returns nil when I want it to return "FCT-_ABCD-PDQR-". (Note ABCD, PDQR etc can be any length of text containing Alpha, - and _).


Solution

  • As you discovered yourself, (".*(%[FCT%-.-%-)$") works the way you want, whereas (".*(%[FCT%-.-%-$)") does not. $ and ^ act as anchors only when they are the very last or very first character of the pattern, respectively; they cannot act as anchors inside a capture.

    When these characters appear anywhere else in the pattern, they are matched literally as part of the string you are looking for. The one exception is ^ as the first character of a set, where it negates the set: for example, [^A-Z] matches any character that is not an upper-case letter.

    Here are examples of the pattern matching, using an example string and the pattern from your question.

    print(string.match("[FCT-_ABCD-PDQR-", (".*(%[FCT%-.-%-$)")))  -- initial pattern
    --> nil
    print(string.match("[FCT-_ABCD-PDQR-$", (".*(%[FCT%-.-%-$)"))) -- $ added to end of string
    --> [FCT-_ABCD-PDQR-$
    print(string.match("[FCT-_ABCD-PDQR-", (".*(%[FCT%-.-%-)$")))  -- $ moved to end of pattern
    --> [FCT-_ABCD-PDQR-
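
The anchor rules can also be checked in isolation, and the corrected pattern can be tried against both sample values from the question. The following is only a minimal sketch: the helper names last_fct_prefix and last_fct_name are hypothetical (they do not appear in the question), and the second one assumes you want the result without the leading "[", as the question's expected output suggests.

```lua
-- Hypothetical helper: the last "[FCT-...-" that ends exactly at the end of id.
local function last_fct_prefix(id)
  -- The leading ".*" is greedy, so the capture starts at the last "[FCT-".
  -- "$" is the final character of the pattern, so it works as an anchor.
  return id:match(".*(%[FCT%-.-%-)$")
end

-- Hypothetical variant (assumption): same match, but the "[" is left outside
-- the capture so the result starts with "FCT-".
local function last_fct_name(id)
  return id:match(".*%[(FCT%-.-%-)$")
end

-- The two sample values from the question.
local short = "[FCT-_ABCD-PDQR-"
local long  = "[FCT-XYZ-DETAIL]lines of text[FCT-_ABCD-PDQR-"

print(last_fct_prefix(short))  --> [FCT-_ABCD-PDQR-
print(last_fct_prefix(long))   --> [FCT-_ABCD-PDQR-
print(last_fct_name(long))     --> FCT-_ABCD-PDQR-

-- "$" and "^" are only anchors at the very end / very start of a pattern.
print(("abc"):match("c$"))     --> c     (anchored to the end)
print(("abc"):match("b$"))     --> nil   ("b" is not at the end)
print(("abc"):match("^b"))     --> nil   ("b" is not at the start)

-- Anywhere else they are ordinary characters.
print(("a$b"):match("a$b"))    --> a$b
print(("a^b"):match("a^b"))    --> a^b

-- "^" as the first character of a set negates the set.
print(("abcDEF"):match("[^A-Z]+"))  --> abc
```

If the bracket should stay in the result, use the first helper; the only difference between the two is whether %[ sits inside or outside the capture.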