Tags: regex, delphi, pattern-matching, filenames, pascalscript

RegEx pattern to limit dashes in these circumstances


Scenario

I'm using ReNamer, a third-party file renaming program written in Delphi that supports Pascal Script: http://www.den4b.com/?x=products&product=renamer

The application lets you rename files with regular expressions. This means that if what I need to do with a filename cannot be done with a single RegEx, I can combine several expressions, or use Pascal Script code, to reshape the filename until it is properly formatted for the needs of this question or anything else.

Problem

I need to reformat song filenames like the ones below. In these filenames the "...featuring artist" part is on the right side of the string; I need to match it and move it to the left part of the string.

  • Carbin & Sirmark - Sorry Feat. Sevener
  • Kristjan Cash Cash - Take Me Home Feat. Bebe Rexha (Revoke Remix)

To make this easier to understand, we can imagine tokenizing the filename like this:

[0]ARTIST   [1]DASH   [2]TRACK   [3]FEAT_ARTIST   [4]POSSIBLE_ADDITIONAL_INFO_INSIDE:()[]{}

What I need to do with a RegEx is reformat the filename so the tokens appear in this order:

[0]ARTIST   [3]FEAT_ARTIST   [1]DASH   [2]TRACK   [4]POSSIBLE_ADDITIONAL_INFO_INSIDE:()[]{}

I currently do that with this RegEx:

\A([^-]*?)\s-\s*(.*?)\s([(\[])?((ft[.\s]|feat[.\s]|featuring[.\s])[^(){}[\]]*)([)\]])?(.+)?\Z

Replacing with:

$1 $4 - $2$7

The problem starts here, because the [0]ARTIST and [2]TRACK tokens can contain dashes, as in this filename:

  • Dj E-nergy C-21 - My Super-hero track! feat Dj Ass-hole

Correct me if I'm wrong, but I think this is simply impossible to solve in general: a machine can't tell where one token ends and the next begins, or what is part of a name and what isn't, because I can't know how many dashes the filename contains.
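To make the ambiguity concrete: if every dash is a possible separator, the filename above yields one candidate (artist, track) split per dash, and the text alone does not say which one is right. A small Python sketch; the helper `candidate_splits` is hypothetical and treats every '-' character as a potential separator:

```python
def candidate_splits(name: str) -> list[tuple[str, str]]:
    """Return one (artist, track) pair for every '-' the name could be split at."""
    splits = []
    for i, ch in enumerate(name):
        if ch == "-":
            # Everything left of this dash is the artist, everything right is the track.
            splits.append((name[:i].strip(), name[i + 1:].strip()))
    return splits


for artist, track in candidate_splits("Dj E-nergy C-21 - My Super-hero track!"):
    print(f"ARTIST={artist!r}  TRACK={track!r}")
```

It prints four candidates. Only one of them, 'Dj E-nergy C-21' / 'My Super-hero track!', is the intended split.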

For that reason, instead of chasing a naive perfect solution that could produce bad filenames because of the number of dashes involved, I prefer an exclusion approach: limit the number of dashes the expression is allowed to match in the filename.

Question

Taking the RegEx shown above as a starting point, how could I extend it to exclude filenames whose [0]ARTIST or [2]TRACK token contains dashes?

...Or, in other words, how can I tell my RegEx not to modify a filename when it contains more than one dash before the "...featuring artist" part (but not after it)?

Basically, the RegEx should determine whether [1]DASH occurs more than once before [3]FEAT_ARTIST; if so, it should exclude that filename (leave it unmodified).

I know roughly how to limit the number of occurrences of a RegEx group, something like ([\-]){1} to match exactly one dash, but I'm not sure how to apply it to the expression I'm using.
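A minimal sketch of this exclusion idea, written with Python's re module rather than in ReNamer. The engine and the replacement syntax ($1 versus \g<1>) differ, so the pattern would still need checking in ReNamer itself. Instead of counting dashes with {1}, the artist and track groups are written as [^-]+?, so a dash in either one makes the whole match fail and the filename comes back untouched. Several choices here are my own assumptions and are not part of the question: the pattern itself, the helper name `move_featuring`, the case-insensitive flag, and the requirement that the marker be a whole word ft, feat or featuring that is not wrapped in brackets.

```python
import re

# Hypothetical pattern (not from the question or the answer). The artist and
# track groups use [^-]+?, so a dash anywhere before the featuring marker makes
# the match fail and the name is returned unchanged. Dashes after the marker
# (featured artist, bracketed info) are still allowed.
FEAT_PATTERN = re.compile(
    r"^([^-]+?)\s+-\s+"                            # 1: artist, no dashes
    r"([^-]+?)\s+"                                 # 2: track, no dashes
    r"((?:ft|feat|featuring)\b\.?[^(){}\[\]]*?)"   # 3: marker + featured artist
    r"(\s*[(\[{].*)?$",                            # 4: optional (...), [...] or {...} info
    re.IGNORECASE,
)


def move_featuring(name: str) -> str:
    """Move the featuring part in front of the dash, or return name unchanged."""
    # Python spells the backreferences \g<1> etc. where ReNamer uses $1.
    return FEAT_PATTERN.sub(r"\g<1> \g<3> - \g<2>\g<4>", name)


for original in (
    "Carbin & Sirmark - Sorry Feat. Sevener",
    "Flight Facilities - Heart Attack Feat. Owl Eyes (Snakehips Remix)",
    "Artist-Name - Track Name feat someone",
):
    print(f"{original!r} -> {move_featuring(original)!r}")
```

Checked against the expected results listed below, the first three examples are reordered and all the others are left unchanged. Dashes after the marker, whether in the featured artist or in the bracketed info, are still accepted.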


Expected Results

Just some random examples...

One dash only before [3]FEAT_ARTIST, so we know where to split the [0]ARTIST and [2]TRACK tokens.

  • From: 'Carbin & Sirmark - Sorry Feat. Sevener'
  • To: 'Carbin & Sirmark Feat. Sevener - Sorry'

One dash only before [3]FEAT_ARTIST, so we know where to split the [0]ARTIST and [2]TRACK tokens, plus a [4]POSSIBLE_ADDITIONAL_INFO_INSIDE:()[]{} token.

  • From: 'Flight Facilities - Heart Attack Feat. Owl Eyes (Snakehips Remix)'
  • To: 'Flight Facilities Feat. Owl Eyes - Heart Attack (Snakehips Remix)'

One dash only before [3]FEAT_ARTIST, so we know where to split the [0]ARTIST and [2]TRACK tokens, plus a [4]POSSIBLE_ADDITIONAL_INFO_INSIDE:()[]{} token that itself contains dashes.

  • From: 'Flight Facilities - Heart Attack Feat. Owl Eyes [Snake--hips Remix]'
  • To: 'Flight Facilities Feat. Owl Eyes - Heart Attack [Snake--hips Remix]'

One dash only between the [0]ARTIST and [2]TRACK tokens, but the filename has no [3]FEAT_ARTIST, so we don't touch it.

  • From: 'Fedde Le Grand - Cinematic'
  • To: 'Fedde Le Grand - Cinematic'

One dash only between the [0]ARTIST and [2]TRACK tokens, but [3]FEAT_ARTIST comes before [1]DASH, so we don't touch it.

  • From: 'Fedde Le Grand Feat. Denny White - Cinematic'
  • To: 'Fedde Le Grand Feat. Denny White - Cinematic'

[0]ARTIST has dashes, so we can't know where to split the [0]ARTIST and [2]TRACK tokens; the RegEx should exclude this filename and leave it unmodified.

  • From: 'Artist-Name - Track Name feat someone'
  • To: 'Artist-Name - Track Name feat someone'

[2]TRACK has dashes, so we can't know where to split the [0]ARTIST and [2]TRACK tokens; the RegEx should exclude this filename and leave it unmodified.

  • From: 'Artist Name - Track-Name feat someone'
  • To: 'Artist Name - Track-Name feat someone'

[0]ARTIST and [2]TRACK both contain dashes, so we can't know where to split them; the RegEx should exclude this filename and leave it unmodified.

  • From: 'Dj E-nergy C-21 - My Super-hero track! feat Dj Ass-hole'
  • To: 'Dj E-nergy C-21 - My Super-hero track! feat Dj Ass-hole'

[0]ARTIST and [2]TRACK both contain dashes and there is no [3]FEAT_ARTIST either; again, nothing to do here.

  • From: 'Dj E-nergy C-21 - My Super-hero track!'
  • To: 'Dj E-nergy C-21 - My Super-hero track!'

I hope this makes clear what I need.


Solution

  • Try with:

    ^(.+)\s+-\s+(.+?)\s+[fF](t|eat(uring)?)?\.?([^([\])\n]+)(.+)?$
    

    DEMO

    and replace with: $1 Feat.$5 - $2$6

    I tested it with ReNamer and Regex101. It also works if the artist name contains a space-dash-space sequence, as in 'artist - name', BUT it will fail if such a sequence appears in the title part.

    The ^(.+)\s+-\s+ part uses a greedy quantifier .+ before a space-dash-space sequence, which is treated as the delimiter between the artist name and the track title. It matches as much as it can, up to the last occurrence of " - ", so it "ignores" spaced dashes in the artist name but produces an invalid match if such a sequence occurs in the track title. So:

    • 'Artist - name - track title feat. someone' will be matched and modified properly,
    • 'Artist name - track - title feat. someone' will fail, as the text will be split on the last dash.

    Instead of (ft[.\s]|feat[.\s]|featuring[.\s]) I used [fF](t|eat(uring)?)?\.?, which matches similar text but should be faster (it should reduce backtracking a little).

    In my demo there is a literal space followed by + instead of \s+ (as used above), because \s+ would match across lines in the multi-line demonstration and show invalid results; in single-line cases like yours, \s+ should work fine.
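To check the answer's pattern outside ReNamer, here is a sketch that transcribes it to Python's re module. It makes three assumptions: Python's engine behaves like ReNamer's for this pattern, $n becomes \g<n> in the replacement, and escaping the literal [ inside the character class leaves its meaning unchanged. The helpers `apply_answer` and `split_artist_title`, and the stricter `STRICT_MARKER`, are hypothetical additions that are not part of the answer. The sketch shows three things:

- the replacement output;
- the split on the last space-dash-space;
- a side effect of making the whole (t|eat(uring)?) group optional: a lone f or F at the start of any word after whitespace is accepted as the featuring marker.

```python
import re

# The answer's pattern, transcribed to Python's re syntax. The literal "["
# inside the character class is escaped; the excluded characters are the
# same: ( [ ] ) and newline.
ANSWER_PATTERN = re.compile(
    r"^(.+)\s+-\s+(.+?)\s+[fF](t|eat(uring)?)?\.?([^(\[\])\n]+)(.+)?$"
)

# ReNamer's "$1 Feat.$5 - $2$6" written with Python's \g<n> references.
ANSWER_REPLACEMENT = r"\g<1> Feat.\g<5> - \g<2>\g<6>"

# Hypothetical stricter marker (not from the answer): whole word ft, feat or
# featuring in any letter case, followed by an optional dot.
STRICT_MARKER = re.compile(r"\s+(?:ft|feat|featuring)\b\.?", re.IGNORECASE)


def apply_answer(name: str) -> str:
    """Rename one filename the way the answer's rule would."""
    return ANSWER_PATTERN.sub(ANSWER_REPLACEMENT, name)


def split_artist_title(name: str):
    """Return the (artist, title) groups chosen by the answer's pattern, or None."""
    m = ANSWER_PATTERN.match(name)
    return (m.group(1), m.group(2)) if m else None


# 1. Replacement output on some of the question's examples.
for name in (
    "Carbin & Sirmark - Sorry Feat. Sevener",
    "Flight Facilities - Heart Attack Feat. Owl Eyes (Snakehips Remix)",
    "Fedde Le Grand Feat. Denny White - Cinematic",
):
    print(repr(apply_answer(name)))

# 2. Greedy (.+) backtracks only to the LAST " - ": right for the first name,
#    wrong for the second.
print(split_artist_title("Artist - name - track title feat. someone"))
print(split_artist_title("Artist name - track - title feat. someone"))

# 3. A lone "f" after whitespace is taken as the featuring marker; the
#    stricter marker does not match that title at all.
print(repr(apply_answer("Artist - Song for You")))
print(STRICT_MARKER.search("Song for You") is None)
```

Two results are worth noting:

- 'Flight Facilities - Heart Attack Feat. Owl Eyes (Snakehips Remix)' comes out as 'Flight Facilities Feat. Owl Eyes  - Heart Attack(Snakehips Remix)', with two spaces before the dash and none before the bracket. This happens because group 5 keeps the space in front of the bracket.
- 'Artist - Song for You' becomes 'Artist Feat.or You - Song'.

Requiring a whole word, as `STRICT_MARKER` does, would avoid the second problem. The spacing could be tidied with a further rule or by adjusting the groups.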