regex: substitute character in captured group


EDIT

In a regex replacement, can a captured group be substituted back in with one of its characters replaced by another?

ORIGINAL QUESTION

I'm converting a list of products into a CSV text file. Every line in the list has: number name[ description] price in this format:

1 PRODUCT description:120
2 PRODUCT NAME TWO second description, maybe:80
3 THIRD PROD:18


The resulting format must also include a slug (with - instead of spaces) as the second field:

1 PRODUCT:product-1:description:120
2 PRODUCT NAME TWO:product-name-two-2:second description, maybe:80
3 THIRD PROD:third-prod-3::18

The regex I'm using is this:

(\d+) ([A-Z ]+?)[ ]?([a-z ,]*):([\d]+)

and the substitution string is:

\1 \2:\L$2-\1:\3:\4

This way my result is:

1 PRODUCT:product-1:description:120
2 PRODUCT NAME TWO:product name two-2:second description, maybe:80
3 THIRD PROD:third prod-3::18

What I'm missing is the hyphen separator in the second field: I need group \2 with '-' instead of ' '.
Is this possible with a single regex, or should I go for a second pass?

(For now I'm using the Sublime Text editor.)

Thanx.


Solution

  • I don't think doing this in a single pass is reasonable, and it may not even be possible. To replace the spaces with hyphens you need either multiple passes or continuous matching (anchoring each match to the end of the previous one with \G), and both lose the context of the capturing groups you need to rearrange your structure. So after your first replace, I would search for (?m)(?:^[^:\n]*:|\G(?!^))[^: \n]*\K[ ] (the final [ ] is a single literal space) and replace with -. I'm not sure whether Sublime enables the multiline modifier by default; if it does, you can drop the (?m).

    The answer might be different if you were using a programming language that supports callback functions for regex replace operations; there you could do the space-to-hyphen replacement inside the callback.
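
For illustration, here is a minimal sketch of the callback approach in Python, which is not what the question uses (that is Sublime Text). It reuses the regex from the question unchanged; the names `add_slug` and `convert` are made up for this example. The callback receives each match, so the slug can be built from group 2 with ordinary string methods while the other groups are still available.

```python
import re

# Regex from the question: number, upper-case name,
# optional lower-case description, then ":" and the price.
PATTERN = re.compile(r"(\d+) ([A-Z ]+?)[ ]?([a-z ,]*):(\d+)")


def add_slug(match: re.Match) -> str:
    """Callback for re.sub: rebuild one line with a slug as second field."""
    number, name, description, price = match.groups()
    # Lower-case the name and swap spaces for hyphens inside the callback,
    # which a plain substitution string cannot do.
    slug = name.lower().replace(" ", "-")
    return f"{number} {name}:{slug}-{number}:{description}:{price}"


def convert(text: str) -> str:
    """Apply the callback to every product line found in text."""
    return PATTERN.sub(add_slug, text)


if __name__ == "__main__":
    sample = (
        "1 PRODUCT description:120\n"
        "2 PRODUCT NAME TWO second description, maybe:80\n"
        "3 THIRD PROD:18"
    )
    print(convert(sample))
```

Run on the three sample lines from the question, this should produce the desired format in one pass, including the empty description field for the third product.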