regex, latex, regular-language

Regular expression to embed string with backslash and curly braces in more curly braces


This is a cross-post from TeX, but it did not get any answers there. And since I assume the problem has more to do with my understanding of regular expressions (or better, lack thereof) than with LaTeX itself, StackOverflow may have been the better place to ask to begin with.

I would like to use BibTool (which was written in C, if this is of any consequence here) to enclose some strings in a bib-file in curly braces. The test bib entry looks like this:

@Article{Cite1,
author       = {Adelbert, A.},
date         = {2020},
journaltitle = {A Journal},
title        = {A title with just \textit{Test} structure and some chemistry \ce{CO2}},
number       = {2},
pages        = {1--4},
volume       = {1},
}

I have created the following BibTool resource file:

resource {biblatex}
preserve.keys = on
preserve.key.case = on
rewrite.rule = {"\\\(.*{.*}\)" "{{\1}}"}

The rewrite.rule is supposed to do the following:

  1. Find all strings within any field that start with \, like \ce{}, \textit{}, etc. This is done by the \\ at the beginning of the regular expression.
  2. When such a string is found, save the following in a group, denoted by \(\): an arbitrary string, followed by {, another arbitrary string, followed by }; i.e. the string textit{Test}.
  3. Write this string back into the same position, but enclose it in a double-set of curly braces "{{\1}}".

What it manages so far:

  1. It apparently finds all commands starting with \.
  2. It saves the strings and writes them back into the file.

So far, the code returns the following:

@Article{Cite1,
Author       = {Adelbert, A.},
Date         = {2020},
JournalTitle = {A Journal},
Title        = {A title with just {{textit{Test} structure and some chemistry {{ce{CO2}}}}}},
Number       = {2},
Pages        = {1--4},
Volume       = {1},
}

As you can see, it finds the strings and puts {{ at the beginning of each one. Unfortunately, it puts }} at the end of the field rather than at the end of the string, so the title field now ends in 6 curly braces. The braces do match, but two of them should come directly after {{textit{Test} instead of at the very end. I tried various constructions such as rewrite.rule = {"\\\(.*{.*}\)$" "{{\1}}"}, rewrite.rule = {"\\\(.*{.*}\) ?$" "{{\1}}"} and rewrite.rule = {"\\\(.*{.*}\)*$" "{{\1}}"}, but none of them worked.

When trying to get the \ back at the beginning of the string, using rewrite.rule = {"\\\(.*{.*}\)" "{{\\\1}}"} I get the \ back, but also thousands of {} until I get a Rewrite limit exceeded error.
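
A possible explanation for both symptoms, sketched in Python rather than in BibTool's own regex dialect: .* is greedy, so the match runs from the first \ to the last } of the field value, and the rule appears to be applied again to its own output until it no longer matches. The loop below is only an assumed approximation of that behaviour, inferred from the Rewrite limit exceeded error; rewrite_until_stable and its limit parameter are hypothetical names, not part of BibTool.

```python
import re

# Python spelling of the BibTool pattern "\\\(.*{.*}\)": Python escapes the
# literal braces and uses bare parentheses for the group.
GREEDY = re.compile(r"\\(.*\{.*\})")


def rewrite_until_stable(value, pattern, replacement, limit=50):
    """Apply one substitution at a time until the pattern stops matching.

    Assumption: this only approximates how BibTool reapplies a rule;
    `limit` is a hypothetical stand-in for its rewrite limit.
    """
    for _ in range(limit):
        new_value, count = pattern.subn(replacement, value, count=1)
        if count == 0:
            return value
        value = new_value
    raise RuntimeError("Rewrite limit exceeded")


# The field value including its outer braces, as in the test entry.
field = r"{A title with just \textit{Test} structure and some chemistry \ce{CO2}}"

# Dropping the backslash: two passes, each wrapping everything up to the
# last closing brace, so the value ends in six closing braces.
result = rewrite_until_stable(field, GREEDY, r"{{\1}}")
print(result)

# Putting the backslash back: the output always matches again.
try:
    rewrite_until_stable(field, GREEDY, r"{{\\\1}}")
    looped_forever = False
except RuntimeError:
    looped_forever = True
print(looped_forever)
```

In this sketch the first call stops only because the replacement removes the backslash, and the second call never stops because the replacement restores it, which matches the two behaviours described above.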

I am not very good with regular expressions and would be happy for any comments.


Solution

  • My approach would use two phases. In the first phase I would process each macro with one argument and replace the \ in the result by a placeholder (here ##). In the second phase I simply replace ## by \.

    In BibTool this looks as follows:

    rewrite.rule {"\\\(\([a-zA-Z]+\|.\){[^{}]*}\)" "{##\1}"}
    rewrite.rule {"##" "\\"} 
    

    Note that in general the task described cannot be solved with regular expressions...
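
To experiment with the two-phase idea outside BibTool, here is a minimal Python sketch. It is a translation under assumptions, not BibTool's implementation: the pattern is rewritten in Python's regex syntax, the loop imitates a rule being reapplied until it stops matching, and protect_macros, PLACEHOLDER and limit are hypothetical names. It also assumes that ## does not occur anywhere else in the field.

```python
import re

# Phase 1 pattern: BibTool's "\\\(\([a-zA-Z]+\|.\){[^{}]*}\)" in Python syntax.
# The inner group is non-capturing here so that \1 is still the whole macro.
MACRO = re.compile(r"\\((?:[a-zA-Z]+|.)\{[^{}]*\})")
PLACEHOLDER = "##"  # assumed not to occur anywhere in the bib file


def protect_macros(value, limit=50):
    """Wrap each one-argument macro with a brace-free argument in braces."""
    # Phase 1: swap the backslash for the placeholder, so the rewritten text
    # cannot match MACRO again and the loop terminates.
    for _ in range(limit):
        value, count = MACRO.subn("{" + PLACEHOLDER + r"\1}", value, count=1)
        if count == 0:
            break
    else:
        raise RuntimeError("Rewrite limit exceeded")
    # Phase 2: restore the backslash.
    return value.replace(PLACEHOLDER, "\\")


field = r"{A title with just \textit{Test} structure and some chemistry \ce{CO2}}"
print(protect_macros(field))
# {A title with just {\textit{Test}} structure and some chemistry {\ce{CO2}}}

# Nested arguments show the limitation: only the innermost macro is wrapped.
print(protect_macros(r"\textbf{\ce{CO2}}"))
# \textbf{{\ce{CO2}}}
```

Because [^{}]* forbids braces inside the argument, each match ends at the macro's own closing brace instead of the last brace in the field. The price is that a macro whose argument itself contains braces is skipped, which is one concrete case of the general limitation mentioned above.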