Regular Expression Problem: Match in Context


I have a hierarchically structured text file that describes a Delphi GUI (a DFM file).

Given the file below, I need to match all "Color = xxx" lines that are inside a TMyButton object (marked), but not those in any other context. A TMyButton object never contains a deeper hierarchy level.

object frmMain: TfrmMain
  Left = 311
  Top = 201
  Color = clBtnFace
  object MyFirstButton: TMyButton
    Left = 555
    Top = 301
    Color = 16645072           <<<<<<MATCH THIS
    OnClick = ButtonClick
  end
  object MyLabel: TLabel
    Left = 362
    Top = 224
    Caption = 'a Caption'
    Color = 16772831
    Font.Color = clWindowText
  end
  object Panel2: TLTPanel
    Left = 348
    Top = 58
    Width = 444
    Height = 155
    Color = clRed
    object MyOtherButton: TMyButton
      Left = 555
      Top = 301
      Color = 16645072         <<<<<<MATCH THIS
      OnClick = ButtonClick
    end
  end
end

I have been trying for two days, with many different attempts. Here are some incomplete pieces of my pattern:

/^[ ]{2,}object [A-Za-z0-9]+: TmyButton\r\n/mi  <<<Matches the needed context
/^[ ]{4,}Color = [A-Za-z0-9]+\r\n/mi            <<<Matches the needed result
/^[ ]{2,}end\r\n/mi                             <<<Matches the end of the context

(I don't know why, but I had to use "\r\n" instead of "$".) I need to combine these pieces so that all other lines are ignored, except further "object xxx: yyy" and "end" lines.

I would be glad to have some help!


Solution

  • If you want (or have) to do this with a single regex that matches only the line itself, you need a regex feature called lookaround. Specifically, you'd need variable-length lookbehind, which PCRE doesn't offer.

    So there are two possibilities: use a scripting approach like Rorick suggested, or use a regex that matches everything from the start of the needed context up to the actual line and extract that line with a capturing group. That could be done with

    [ ]{2,}object \w+: TMyButton\r\n.*?^([ ]{4,}Color = \w+[ \t]*\r\n)
    

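    (The brackets around the space are only there for clarity.) Your match is then in capturing group 1 (\1). Note that the pattern needs the s (dotall) modifier so that .*? can span several lines, and the m (multiline) modifier so that ^ matches at the start of each line.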
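
As a rough illustration, here is the pattern above applied in Python. This is only a sketch: Python's re module is not PCRE, although this particular pattern behaves the same way in both, and the sample is an abridged copy of the file from the question with CRLF line endings. GUARDED_PATTERN and find_colors are names made up for this example. The guarded variant is an assumption on my part, not part of the pattern above: it stops the skipped text from crossing an "end" or "object" line, because otherwise a TMyButton without a Color line would pick up the Color of a later object.

```python
import re

# Abridged copy of the DFM file from the question, with CRLF line endings.
SAMPLE = "\r\n".join([
    "object frmMain: TfrmMain",
    "  Left = 311",
    "  Color = clBtnFace",
    "  object MyFirstButton: TMyButton",
    "    Left = 555",
    "    Color = 16645072",
    "    OnClick = ButtonClick",
    "  end",
    "  object MyLabel: TLabel",
    "    Color = 16772831",
    "    Font.Color = clWindowText",
    "  end",
    "  object Panel2: TLTPanel",
    "    Color = clRed",
    "    object MyOtherButton: TMyButton",
    "      Left = 555",
    "      Color = 16645072",
    "    end",
    "  end",
    "end",
]) + "\r\n"

# The pattern from the answer, unchanged. DOTALL lets ".*?" cross line
# breaks; MULTILINE lets "^" match at the start of every line.
ANSWER_PATTERN = re.compile(
    r"[ ]{2,}object \w+: TMyButton\r\n.*?^([ ]{4,}Color = \w+[ \t]*\r\n)",
    re.DOTALL | re.MULTILINE,
)

# Assumed variant (not from the answer): the skipped text may not run
# past a line that starts with "end" or "object".
GUARDED_PATTERN = re.compile(
    r"[ ]{2,}object \w+: TMyButton\r\n"
    r"(?:(?!^[ ]*(?:end|object)\b).)*?"
    r"^([ ]{4,}Color = \w+[ \t]*\r\n)",
    re.DOTALL | re.MULTILINE,
)


def find_colors(text, pattern=ANSWER_PATTERN):
    """Return the captured Color lines (group 1), stripped of whitespace."""
    return [m.group(1).strip() for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(find_colors(SAMPLE))
    print(find_colors(SAMPLE, GUARDED_PATTERN))
```

Both calls return the two "Color = 16645072" lines for this sample. The two patterns only differ on input where a TMyButton block has no Color line of its own.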

    Nested structures are generally better handled by a parser than by regexes, but if you're sure your data is structured as you describe, this might work OK.
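
For comparison, a scripting approach could look roughly like the sketch below. It is not Rorick's code, just a minimal line scanner under the assumption that the file only uses "object Name: TClass" and "end" lines for nesting, as in the sample from the question; anything else a real DFM file may contain (for example multi-line property values) is not handled. The function name colors_in_class is hypothetical.

```python
SAMPLE = """\
object frmMain: TfrmMain
  Left = 311
  Top = 201
  Color = clBtnFace
  object MyFirstButton: TMyButton
    Left = 555
    Top = 301
    Color = 16645072
    OnClick = ButtonClick
  end
  object MyLabel: TLabel
    Left = 362
    Top = 224
    Caption = 'a Caption'
    Color = 16772831
    Font.Color = clWindowText
  end
  object Panel2: TLTPanel
    Left = 348
    Top = 58
    Width = 444
    Height = 155
    Color = clRed
    object MyOtherButton: TMyButton
      Left = 555
      Top = 301
      Color = 16645072
      OnClick = ButtonClick
    end
  end
end
"""


def colors_in_class(text, class_name="TMyButton"):
    """Collect 'Color = ...' lines whose innermost open object is class_name."""
    stack = []  # class names of the currently open objects, outermost first
    found = []
    for raw_line in text.splitlines():  # handles both LF and CRLF endings
        line = raw_line.strip()
        if line.startswith("object "):
            # "object Name: TClass" -> remember "TClass" as the new context
            stack.append(line.partition(":")[2].strip())
        elif line == "end":
            if stack:
                stack.pop()  # leave the innermost object
        elif stack and stack[-1] == class_name and line.startswith("Color = "):
            found.append(line)
    return found


if __name__ == "__main__":
    for color_line in colors_in_class(SAMPLE):
        print(color_line)
```

Because the scanner tracks the innermost open object on a stack, it does not depend on indentation or line endings, and a TMyButton without a Color line simply yields nothing.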