
How to capture HTML tags using a Lua pattern


This is what the text I'm trying to extract from looks like: http://pastebin.com/VD0K3ZcN

lines:match([[title="(value here)">]])

How can I get the "value here"? It contains no digits and no ">" symbol, only letters, spaces, ', - and .

I have tried

lines:match([[title="(.+)">]])

but it simply captured the rest of the line.


Solution

  • The problem with your pattern is this:

    title="    -- Fine, but you probably also want to check which tag the title attribute is in.
    (.+)       -- Problem: greedy match, illustrated below.
    ">         -- Matches a double quote followed by the tag's closing >.
    

    Now, if I have this HTML:

    <html>
     <head title="Foobar">
     </head>
     <body onload="somejs();">
     </body>
    </html>
    

    Your pattern will capture (line breaks and indentation omitted):

    Foobar"></head><body onload="somejs();
    

    You can fix this by using (.-) instead. The - quantifier is the non-greedy counterpart of *: it matches as few characters as possible, so the capture stops at the first "> after title=" rather than at the last "> in the string.
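To see the difference side by side, here is a minimal sketch that runs both patterns against the sample HTML from above. The variable names are only illustrative. The third pattern is an assumption based on the question's remark that the value contains only letters, spaces, ', - and . (it is not part of the original answer): it restricts the capture to those characters, so it can never run past the closing quote.

```lua
-- Sample input: the HTML snippet used in the explanation above.
local html = [[
<html>
 <head title="Foobar">
 </head>
 <body onload="somejs();">
 </body>
</html>
]]

-- Greedy: (.+) extends to the LAST "> in the string.
local greedy = html:match('title="(.+)">')

-- Non-greedy: (.-) stops at the FIRST "> after title=".
local lazy = html:match('title="(.-)">')

-- Alternative (assumption from the question's constraints): allow only
-- letters, whitespace, ', - and . inside the capture.
local strict = html:match('title="([%a%s\'%-%.]+)"')

print(lazy)                                   --> Foobar
print(strict)                                 --> Foobar
print(greedy:find("somejs", 1, true) ~= nil)  --> true (the capture ran too far)
```

If the real input may contain other characters in the title (digits, for example), the character class would need to be widened, whereas the (.-) version works regardless.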