I'm currently working on an Adobe InDesign script, part of which is a function that finds measurements and picks them apart. I have a set of regexes that are run first using InDesign's findGrep() (not really relevant here), and then using the basic JavaScript exec() (because I need to do things with capture groups).
Now, I know that there are differences between these two regex engines, so I've been working to the capabilities of the much more limited JS engine (I think InDesign's scripting language is based on ECMAScript 3), but I've recently hit a problem that I can't seem to figure out.
Here's the regex I'm currently testing (I've broken it across lines to make it a little easier to read):
((?:one|two|three|four|five|six|seven|eight|nine|ten|\d{4,}|\d{1,3}(?:,\d{3})*)(?:\.\d+)?)
(?=-|‑|\s|°|º|˚|∙|⁰)
(?:[-\s](thousand|million|billion|trillion))?
(?:[-\s](cubic|cu\.?|square|sq\.?))?
This is the sample text I was testing it on.
23 sq metres
45-square-metres
16-cubic metres
96 cu metres
409 cu. metres
12 sq metres
24 sq. metres
Now when I run the regex using InDesign's findGrep() it works as expected. When I run it using exec(), however, it does something odd. It will match the numbers and the multipliers just fine, but only "cubic" and "cu" get matched; the "square" and "sq" text is ignored.
To make things more baffling, if I reverse the order of these entries in the regex capture group (so it's (?:[-\s](square|sq\.?|cubic|cu\.?))? instead), then it only matches "square" and "sq" and not "cubic" and "cu".
Am I missing something really obvious here? I'm a JavaScript newbie, but I've been working with regular expressions in XSLT for years.
str = `23 sq metres
45-square-metres
16-cubic metres
96 cu metres
409 cu. metres
12 sq metres
24 sq. metres
`;
patt = /((?:one|two|three|four|five|six|seven|eight|nine|ten|\d{4,}|\d{1,3}(?:,\d{3})*)(?:\.\d+)?)(?=-|‑|\s|°|º|˚|∙|⁰)(?:[-\s](thousand|million|billion|trillion))?(?:[-\s](cubic|cu\.?|square|sq\.?))?/gm;
while (res = patt.exec(str)) console.log(res);
EDIT:
So, here's the code as I'm trying to run it right now.
str = `23 sq metres
45-square-metres
16-cubic metres
96 cu metres
409 cu. metres
12 sq metres
24 sq. metres
`;
var re = '(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+)(?:(\\s?(?:-|–)\\s?)(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+))?(?:[-\\s](thousand|million|billion|trillion))?(?:[-\\s](cubic|cu\\.?|square|sq\\.?))?';
patt = new RegExp(re);
while (res = patt.exec(str)) console.log(res);
If I try to run this on my machine, using the InDesign script, it fails to find anything with "square" or "sq", and when I run it in the code snippet view here it just freezes up. I'm guessing this has something to do with storing regexes as strings, yes?
I'm not sure I understand you correctly. If you want your second snippet to work in about the same way as your first one does, you probably just need to add "gm" to the RegExp constructor:
var patt = new RegExp(re, "gm");
str = `23 sq metres
45-square-metres
16-cubic metres
96 cu metres
409 cu. metres
12 sq metres
24 sq. metres
`;
var re = '(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+)(?:(\\s?(?:-|–)\\s?)(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+))?(?:[-\\s](thousand|million|billion|trillion))?(?:[-\\s](cubic|cu\\.?|square|sq\\.?))?';
var patt = new RegExp(re, "gm");
while (res = patt.exec(str)) console.log(res[5]);
It gives me this output:
sq
square
cubic
cu
cu.
sq
sq.
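As far as I can tell, the freeze in the snippet view doesn't come from storing the regex as a string. Without the "g" flag, exec() starts searching at index 0 on every call and never advances lastIndex, so the while loop keeps receiving the same first match. Here's a small sketch of that behaviour; the shortened pattern and the variable names are mine, just for illustration, and are not the full regex from the question:

```javascript
// Simplified stand-in pattern (not the full regex from the question).
var sample = "23 sq metres\n96 cu metres";
var source = "(\\d+)[-\\s](cu|sq)";

// Without "g": every exec() call restarts at index 0 and returns the same match.
var noFlag = new RegExp(source);
var first = noFlag.exec(sample);
var again = noFlag.exec(sample);
console.log(first[0], "|", again[0], "|", noFlag.lastIndex);

// With "g": lastIndex advances after each match and exec() eventually
// returns null, which is what lets a while loop terminate.
var withFlag = new RegExp(source, "g");
var found = [];
var m;
while ((m = withFlag.exec(sample)) !== null) {
  found.push(m[2]);
}
console.log(found);

// A pattern built from a string matches the literal form,
// as long as each backslash is doubled inside the string.
console.log(new RegExp("\\d+").source === /\d+/.source);
```

So the string form itself should be fine; it's the missing flag that makes the loop spin.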
I've replaced (cubic|cu\\.?|square|sq\\.?) with (cubic|cu\\.|cu|square|sq\\.|sq), and it seems to work in InDesign now:
str = "23 sq metres\n45-square-metres\n16-cubic metres\n96 cu metres\n409 cu. metres\n12 sq metres\n24 sq. metres";
var re = '(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+)(?:(\\s?(?:-|–)\\s?)(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+))?(?:[-\\s](thousand|million|billion|trillion))?(?:[-\\s](cubic|cu\\.|cu|square|sq\\.|sq))?';
var patt = new RegExp(re, "gm");
var msg = "";
while (res = patt.exec(str)) msg += res[0] + " : " + res[5] + "\n";
alert(msg);
Probably the ? quantifiers inside an alternation like (foo|bar) are too much for InDesign's scripting engine.
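If you want to check whether a particular engine is affected, one option is to run both forms of the unit group over the sample text and compare what ends up in the capture group. This is only a rough sketch using plain exec(); the collectUnits helper and the shortened number pattern are my own inventions and not part of the original script:

```javascript
var sample = "23 sq metres\n45-square-metres\n16-cubic metres\n96 cu metres\n409 cu. metres\n12 sq metres\n24 sq. metres";

// Hypothetical helper: returns the unit captured for each number found,
// or "" when the optional unit group did not take part in the match.
function collectUnits(unitGroup, text) {
  var patt = new RegExp("(\\d+)(?:[-\\s](" + unitGroup + "))?", "g");
  var units = [];
  var res;
  while ((res = patt.exec(text)) !== null) {
    units.push(res[2] || "");
  }
  return units;
}

// Form from the question: optional dots inside the alternation.
var withOptional = collectUnits("cubic|cu\\.?|square|sq\\.?", sample);
// Form from this answer: every variant spelled out.
var spelledOut = collectUnits("cubic|cu\\.|cu|square|sq\\.|sq", sample);

console.log(withOptional.join(", "));
console.log(spelledOut.join(", "));
// In a standards-compliant engine the two lists should be identical.
console.log(withOptional.join("|") === spelledOut.join("|"));
```

In a browser or Node both lists should come out the same. If the first list has empty entries where "square" or "sq" should be when run inside InDesign (with console.log swapped for alert), that would point at the engine rather than at your pattern.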