
How can I use Sublime Text's regex engine (PCRE) to delete all LaTeX comments?


I followed the answer to another question: Regex to capture LaTeX comments

The answer provided there is excellent, but it seems to work only with the .NET engine. I need a pattern for the PCRE engine, because I'm using Sublime Text and that appears to be the engine it uses by default. I have tried many variations without success.

The LaTeX source is:

\usepackage{test}%COMMENT1

TEXT
%COMMENT2
TEXT

Value is 10\%, this should not be removed. %COMMENT3

begin{tikz}[
important 1,
%COMMENT4
important 2, %COMMENT5
]

TEXT
%COMMENT 6

TEXT

Table: value1&value2\\%COMMENT7
Table: value1&value2\\      %COMMENT8
Table: value1&value2            \\%COMMENT 9
Table: value1&value2\\%            COMMENT 10
Table: value1&value2\\%COMMENT11       

I tried the pattern below, but it only matches comments 1-6 and 8:

(?m)(?<=(?<!\\)(?:\\{0}))%.*(?:\r?\n(?!\r?$))?

The online results can be found at https://regex101.com/r/zSIBMu/3

How should I modify it?


Solution

  • You could use a (*SKIP)(*FAIL) approach:

    \\(?<!\\.)(?>\\\\)*%(*SKIP)(*F)|%.*
    

    The pattern matches:

    • \\(?<!\\.) Match a \ that is not preceded by another \
    • (?>\\\\)* Match optional pairs of \\
    • %(*SKIP)(*F) Match %, then discard what was matched so far and resume the search after it ((*F) is short for (*FAIL))
    • | Or
    • %.* Match % and the rest of the line
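
To try this logic outside Sublime Text, here is a minimal Python sketch. Python's built-in re module does not support (*SKIP)(*F), so this is not the pattern above verbatim: it swaps in an alternation where the escaped-percent branch is matched first and ignored, and only the comment branch is captured. The helper name find_comments and the sample lines are made up for illustration.

```python
import re

# Emulation for Python's `re`, which lacks (*SKIP)(*F):
# branch 1 consumes an odd run of backslashes followed by % (an escaped percent),
# branch 2 captures a real comment in group 1.
COMMENT_RE = re.compile(r"\\(?<!\\.)(?:\\\\)*%|(%.*)")


def find_comments(line):
    """Return the comments found in `line` (hypothetical helper)."""
    return [m.group(1) for m in COMMENT_RE.finditer(line) if m.group(1) is not None]


print(find_comments(r"Value is 10\%, keep this. %COMMENT3"))
print(find_comments(r"Table: value1&value2\\%COMMENT7"))
```

The first call should report only %COMMENT3, because \% is consumed by the first branch. The second should report %COMMENT7, because the % follows an even number of backslashes and is therefore not escaped.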


    Edit

    A solution without lookarounds, suggested by Casimir et Hippolyte, that skips the match for \\ or \%:

    \\[\\%](*SKIP)(*F)|%.*
    
    • \\[\\%] Match either \\ or \% (a backslash followed by a character class)
    • (*SKIP)(*F) Discard what was matched so far and resume the search after it
    • | Or
    • %.* Match % and the rest of the line
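
If the comments need to be stripped in a script instead of the editor, the same idea can be sketched in Python. This assumes the standard re module, which has no (*SKIP)(*F), so the skip is emulated: the \\ or \% branch is put back unchanged by the replacement function, and only the captured comment is deleted. The name strip_comments is hypothetical, and the sample is a short excerpt of the LaTeX from the question.

```python
import re

# Emulates \\[\\%](*SKIP)(*F)|%.* for Python's `re`:
# the first branch matches \\ or \% and is kept as-is,
# the second branch captures a comment in group 1.
PATTERN = re.compile(r"\\[\\%]|(%.*)")


def strip_comments(text):
    """Delete LaTeX comments from `text` (hypothetical helper name)."""
    return PATTERN.sub(lambda m: m.group(0) if m.group(1) is None else "", text)


sample = "\n".join([
    r"\usepackage{test}%COMMENT1",
    r"Value is 10\%, this should not be removed. %COMMENT3",
    r"Table: value1&value2\\%COMMENT7",
])
print(strip_comments(sample))
```

Like the pattern it emulates, this removes only the comment itself, so any whitespace in front of a comment stays on the line.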
