
How can I apply a regex only inside an infobox?


I need to remove wikicode image tags, but only inside infoboxes, using AutoWikiBrowser (.NET flavour).

For example, here I need to keep only the image name and extension "xyz.jpg", without touching "abc.jpg" or the rest of the infobox's contents:

{{Infobox xyz
|aaa= Xyz
|bbb= [[Xyz]]
|ccc=[[File:xyz.jpg|thumb|Xyz.]]
|ddd= {{lang|en|xyz}}
}}

[[File:abc.jpg|thumb|Abc.]]

I have a regex that removes image tags: \[\[ *fi(?:le|chier) *: *([^\|]*)[^\]]*\]\], but it also modifies abc.jpg.

I also found a regex that selects only the infobox: (?=\{Infobox)(\{([^{}]|(?1))*\}), but it relies on a recursive subroutine call, (?1), which the .NET flavour does not accept, and I could not adapt it to do what I want.

I am not sure that something like the last regex is possible in .NET, as this flavour doesn't seem to support subroutines. Is there a way to do it anyway, given that I must use the .NET flavour?


Solution

  • In my experience, infoboxes typically nest templates at most two levels deep, so this should be a practical workaround:

    (\{\{Infobox(?:[^{}]|\{\{(?:[^{}]|\{\{[^{}]*\}\})*\}\})*)\[\[ *(?:image|fi(?:le|chier)) *: *([^\|\]]*)[^\]]*\]\]
    

    Note that "Image:" is also a valid namespace prefix for images/files.

    See a demo here.

    Also, beware that this kind of regex does not match multiple files in the same infobox: each match starts at {{Infobox, so a single pass replaces only one image per infobox. As a workaround, you can either use a lookbehind so that the match does not have to start at {{Infobox (like here), or set "Apply No. of times" in AWB's "Advanced settings" to an acceptable number of replacements per infobox (such as 5).
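
The "apply several times" workaround can be tried outside AWB with a short script. This is only a minimal sketch in Python rather than .NET: the pattern is copied unchanged from the answer (it uses no .NET-specific constructs), while everything else is an assumption made for illustration. In particular, the replacement is assumed to be group 1 followed by group 2 (`$1$2` in .NET syntax), matching is assumed to be case-insensitive because the pattern is lowercase while the sample uses "File:", and the names `strip_infobox_images`, `max_passes` and `SAMPLE`, as well as the extra "logo" line in the sample, are hypothetical.

```python
import re

# The answer's pattern, unchanged.
#   group 1: "{{Infobox" up to the image link, allowing templates nested
#            up to two levels deep inside the infobox
#   group 2: the bare file name
# IGNORECASE is an assumption (the pattern is lowercase, the sample has "File:").
INFOBOX_IMAGE = re.compile(
    r"(\{\{Infobox(?:[^{}]|\{\{(?:[^{}]|\{\{[^{}]*\}\})*\}\})*)"
    r"\[\[ *(?:image|fi(?:le|chier)) *: *([^\|\]]*)[^\]]*\]\]",
    re.IGNORECASE,
)

# The question's example, extended with a second image (hypothetical "logo"
# parameter) so that more than one pass is needed.
SAMPLE = """{{Infobox xyz
|aaa= Xyz
|bbb= [[Xyz]]
|logo=[[Image:logo.png|50px]]
|ccc=[[File:xyz.jpg|thumb|Xyz.]]
|ddd= {{lang|en|xyz}}
}}

[[File:abc.jpg|thumb|Abc.]]
"""


def strip_infobox_images(text: str, max_passes: int = 5) -> str:
    """Unwrap image links inside infoboxes, one image per infobox per pass.

    max_passes plays the role of AWB's "Apply No. of times" setting; the
    loop stops early once a pass changes nothing.
    """
    for _ in range(max_passes):
        # Keep the infobox prefix (group 1) and the file name (group 2).
        text, count = INFOBOX_IMAGE.subn(r"\g<1>\g<2>", text)
        if count == 0:
            break
    return text


if __name__ == "__main__":
    print(strip_infobox_images(SAMPLE))
```

Because group 1 is greedy, each pass unwraps the last remaining image of the infobox first: with `max_passes=1` only xyz.jpg is reduced to its file name, and a second pass handles logo.png. The link to abc.jpg is never touched, since group 1 cannot extend past the `}}` that closes the infobox.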