
PowerShell: Read a section of a file into a variable


I'm trying to create a kind of polyglot script. It's not a true polyglot because it actually requires multiple languages to run, although it can be "bootstrapped" by either Shell or Batch. I've got this part working with no problem.

The part I'm having trouble with is a bit of embedded PowerShell code, which needs to load the current file into memory, extract a certain section that is written in yet another language, store it in a variable, and finally pass it to an interpreter. I have an XML-like tagging system that I'm using to mark sections of the file in a way that will hopefully not conflict with any of the other languages. The markers look like this:

lang_a_code
# <{LANGB}>
   ... code in language B ...
   ... code in language B ...
   ... code in language B ...
# <{/LANGB}>
lang_c_code

The # characters are comment markers; the actual comment marker varies with the language of the surrounding section.

The problem I have is that I can't seem to find a way to isolate just that section of the file. I can load the entire file into memory, but I can't get the stuff between the tags out. Here is my current code:

@ECHO OFF
SETLOCAL EnableDelayedExpansion

powershell -ExecutionPolicy unrestricted -Command ^

    $re = '(?m)^<{LANGB}^>(.*)^<{/LANGB}^>';^
    $lang_b_code = ([IO.File]::ReadAllText(^'%0^') -replace $re,'$1');^
    echo "${re}";^
    echo "Contents: ${lang_b_code}";

Everything I've tried so far results in the entire file being output in the Contents rather than just the code between the markers. I've tried different methods of escaping the symbols used in the markers, but it always results in the same thing.

NOTE: The use of the ^ is required because the top-level interpreter is Batch, which hangs up on the angle brackets and other random things.


Solution

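  • Since there is just one block, you can use the regex

    $re = '(?sm)^<{LANGB}^>(.*)^^.*^<{/LANGB}^>';^
    

    but with the -match operator instead of -replace, and then read the text from $matches[1], which -match populates on a successful match. The s flag lets . span newlines, and the m flag lets the anchor written as ^^ (a literal ^ once Batch has removed the escape) match at the start of the closing-tag line, so the capture ends before that line's comment marker.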

    So, after the regex declaration, use

    [IO.File]::ReadAllText(^'%0^') -match $re;^
    echo $matches[1];
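
To try the pattern without the Batch escaping in the way, here is a minimal standalone sketch in plain PowerShell. The sample text, the escaped braces, and the call to Trim() are assumptions added for illustration and are not part of the original script. The sketch also shows why the -replace attempt printed the whole file: without the s flag the pattern never matches across lines, and -replace returns its input unchanged when nothing matches.

```powershell
# Hypothetical sample standing in for the contents of the polyglot file.
$text = @'
lang_a_code
# <{LANGB}>
   ... code in language B ...
   ... more code in language B ...
# <{/LANGB}>
lang_c_code
'@

# The question's pattern as PowerShell sees it (Batch carets removed).
# Without (?s), . stops at newlines, so tags on different lines never match
# and -replace hands the input back untouched.
$unchanged = $text -replace '(?m)<\{LANGB\}>(.*)<\{/LANGB\}>', '$1'

# (?s) lets . span newlines; (?m) lets ^ match at the start of each line,
# so the capture ends where the closing-tag line begins.
$re = '(?sm)<\{LANGB\}>(.*)^.*<\{/LANGB\}>'

if ($text -match $re) {
    # Trim() drops the newline left over from the opening-tag line.
    $lang_b_code = $Matches[1].Trim()
    Write-Output $lang_b_code
} else {
    Write-Error 'LANGB section not found'
}
```

Using -match inside an if also keeps the True/False result out of the output, so only the extracted section is printed. Even with a pattern that does match, -replace would only substitute the matched span and leave the rest of the file in the result, which is why -match with $Matches[1] is the better fit here.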