regex, powershell, file-rename

Remove all following characters after the second occurrence of a string in a filename with PowerShell


In my music library I have filenames like this:

  1. Artist - Song (feat. OtherArtist) (feat. OtherArtist).mp4
  2. Artist - Song (feat. OtherArtist) (Radio Edit) (feat. OtherArtist).mp4

What I want is to remove the duplicate feature mention at the end. This is what I came up with so far:

Get-ChildItem -Path "path" -Recurse -Filter *feat*feat* | ForEach-Object { $_ | Rename-Item -NewName $_.Name.SubString(0,$_.Name.Length -10) }

This gets all the files with duplicate features and then just removes the last 10 characters (unfortunately including the file extension), which clearly won't work if the song features multiple artists or an artist with a longer name.

I think I need regular expressions for this, but I'm still a beginner at best with PowerShell, so I would be really thankful for some help.


Solution

  • RegEx can indeed do what you want. You just need to do something very similar to what you have in your first filter. Here's the magic string:

    (.*feat.*?)(\s*\(?feat.*\)\s*)(\.[^\.]+)
    

    You can use it like this (skipping the ForEach loop entirely):

    Get-ChildItem -Path .\* -Recurse -Filter *feat*feat* | Rename-Item -NewName {$_.Name -replace '(.*feat.*?)(\s*\(?feat.*\)\s*)(\.[^\.]+)','$1$3'}
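
    Before renaming anything for real, it may be worth previewing the result. A minimal sketch, assuming the same pattern and folder layout as the command above: the $pattern variable is only introduced here for readability, and -WhatIf is the standard Rename-Item switch that reports what would happen without touching the files.

    ```powershell
    # Same pattern as in the command above: $1 keeps everything up to and
    # including the first "feat" part, $3 keeps the file extension.
    $pattern = '(.*feat.*?)(\s*\(?feat.*\)\s*)(\.[^\.]+)'

    # -WhatIf prints each planned rename instead of performing it.
    Get-ChildItem -Path .\* -Recurse -Filter *feat*feat* |
        Rename-Item -NewName { $_.Name -replace $pattern, '$1$3' } -WhatIf
    ```

    Once the preview looks right, dropping -WhatIf should perform the renames.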
    

    And here's how that string breaks down, and what all it does:

    (.*feat.*?)(\s*\(?feat.*\)\s*)(\.[^\.]+)

    1st Capturing Group (.*feat.*?)
    . matches any character (except for line terminators)
    * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    feat matches the characters feat literally (case sensitive)
    . matches any character (except for line terminators)
    *? matches the previous token between zero and unlimited times, as few times as possible, expanding as needed (lazy)

    2nd Capturing Group (\s*\(?feat.*\)\s*)
    \s matches any whitespace character (equivalent to [\r\n\t\f\v ])
    * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    \( matches the character ( literally (case sensitive)
    ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
    feat matches the characters feat literally (case sensitive)
    . matches any character (except for line terminators)
    * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    \) matches the character ) literally (case sensitive)
    \s matches any whitespace character (equivalent to [\r\n\t\f\v ])
    * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)

    3rd Capturing Group (\.[^\.]+)
    \. matches the character . literally (case sensitive)
    [^\.] matches a single character that is not a literal .
    + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)

    See it work here (this is where I got the breakdown from, and it is much better formatted there): https://regex101.com/r/FWlNLi/1
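
    To check the pattern without touching any files, it can be run against plain strings. A small sketch using the two sample names from the question; the $pattern, $samples and $results variables are just illustrative names.

    ```powershell
    $pattern = '(.*feat.*?)(\s*\(?feat.*\)\s*)(\.[^\.]+)'

    # The two example filenames from the question.
    $samples = @(
        'Artist - Song (feat. OtherArtist) (feat. OtherArtist).mp4'
        'Artist - Song (feat. OtherArtist) (Radio Edit) (feat. OtherArtist).mp4'
    )

    # Keep group 1 (name up to the duplicate) and group 3 (extension),
    # dropping group 2 (the trailing duplicate "feat" part).
    $results = foreach ($name in $samples) { $name -replace $pattern, '$1$3' }
    $results
    # Artist - Song (feat. OtherArtist).mp4
    # Artist - Song (feat. OtherArtist) (Radio Edit).mp4
    ```

    One thing to keep in mind: the breakdown above says "case sensitive" because that is how regex101 evaluates the pattern, but PowerShell's -replace ignores case by default, so "Feat." would match as well. If that matters, -creplace is the case-sensitive variant.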