Regex that extracts a string whose length is encoded in the string itself

I have the following string to parse:

X4IitemX6Nabc123

that is structured as follows:

X... marker for 'field identifier'
4... length of item (name), will change according to length of item name
I... identifier for item name, must not be extracted, fixed
item... value that should be extracted as "name"
X... marker for 'field identifier'
6... length of item number, will change according to length of item number
N... identifier for item number, must not be extracted, fixed
abc123... value that should be extracted as "num"

Only these two values will be contained in the string, and the sequence is always the same (name, number).

What I have so far is

\AX(?<namelen>\d+)I(?<name>.+)X(?<numlen>\d+)N(?<num>.+)$

But that does not take into account that the length of the name is encoded in the string itself. Somehow the .+ in the name group should be replaced by .{4}, with the 4 taken from the namelen group. I tried {$1} and {${namelen}}, but neither yields the result I expect (on rubular.com or regex101.com).

Any ideas or further references?

Solution

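Doing this in a single pattern is only possible in regex flavors that allow code to be embedded in the pattern, because a quantifier such as {n} cannot take its count from a capture group.

Here is a Perl example. The (??{...}) construct runs code at match time and uses the result as a subpattern, and $^N holds the text of the most recently closed group (here, namelen):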

#!/usr/bin/perl
use warnings;
use strict;

my $text = "X4IitemX6Nabc123";
if ($text =~ m/^X(?<namelen>[0-9]+)I(?<name>(??{".{".$^N."}"}))X(?<numlen>[0-9]+)N(?<num>.+)$/) {
    print $text . ": PASS!\n";
} else {
    print $text . ": FAIL!\n";
}
# -> X4IitemX6Nabc123: PASS!

In other languages, use a two-step approach:

1. Extract the number after X.
2. Build a regex dynamically using the result of the first step.

See a JavaScript example:

const text = "X4IitemX6Nabc123";
const rx1 = /^X(\d+)/;
const m1 = rx1.exec(text);
if (m1) {
  const rx2 = new RegExp(`^X(?<namelen>\\d+)I(?<name>.{${m1[1]}})X(?<numlen>\\d+)N(?<num>.+)$`);
  if (rx2.test(text)) {
    console.log(text, '-> MATCH!')
  } else console.log(text, '-> FAIL!');
} else {
  console.log(text, '-> FAIL!')
}

See the Python demo:

import re

text = "X4IitemX6Nabc123"
rx1 = r'^X(\d+)'
m1 = re.search(rx1, text)
if m1:
    # {{ and }} are literal braces in an f-string, so this yields .{4}
    rx2 = fr'^X(?P<namelen>\d+)I(?P<name>.{{{m1.group(1)}}})X(?P<numlen>\d+)N(?P<num>.+)$'
    if re.search(rx2, text):
        print(text, '-> MATCH!')
    else:
        print(text, '-> FAIL!')
else:
    print(text, '-> FAIL!')

# => X4IitemX6Nabc123 -> MATCH!
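
The demos above only report whether the string matches, and they enforce only the first length prefix. If you also want the extracted values and a check on the second prefix, the same two-step idea might be extended along these lines. This is a minimal sketch, not a definitive implementation: the function name parse_item and the returned dict shape are my own choices, and I am assuming a length mismatch in either field should count as a failure.

```python
import re

def parse_item(text):
    """Return {'name': ..., 'num': ...}, or None if the text is malformed.

    parse_item is a hypothetical helper name, not part of any library.
    """
    # Step 1: read the declared name length after the leading X.
    m1 = re.match(r'X(\d+)I', text)
    if not m1:
        return None
    name_len = int(m1.group(1))

    # Step 2: build a pattern that takes exactly name_len characters
    # for the name ({{ and }} are literal braces in an f-string).
    rx2 = re.compile(
        rf'X\d+I(?P<name>.{{{name_len}}})X(?P<numlen>\d+)N(?P<num>.*)',
        re.DOTALL,
    )
    m2 = rx2.fullmatch(text)
    if not m2:
        return None

    # Step 3 (assumption): also enforce the second length prefix.
    if len(m2.group('num')) != int(m2.group('numlen')):
        return None
    return {'name': m2.group('name'), 'num': m2.group('num')}

print(parse_item("X4IitemX6Nabc123"))
# => {'name': 'item', 'num': 'abc123'}
```

Because the name is taken by length rather than by .+, a name that itself contains an X (for example X3IaXbX2Nzz) should still split correctly.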