
Case-normalize filenames based on wildcard patterns


How can I case-normalize filenames based on the literal components of matching wildcard patterns?

Consider the following filenames:

ABC_1232.txt
abC_4321.Txt
qwerty_1232.cSv
QwErTY_4321.CsV

They all match the following wildcard patterns:

QWERTY_*.csv
abc_*.TXT

Note how the literal components of the patterns (e.g., QWERTY_, .csv) differ in case from the corresponding parts of the matching filenames in the list above (e.g., QwErTY_, .CsV).

I want to rename matching files so that the literal parts of the pattern are used case-exactly in the filenames; therefore, the resulting names should be:

abc_1232.TXT
abc_4321.TXT
QWERTY_1232.csv
QWERTY_4321.csv

Tip of the hat to Vladimir Semashkin for inspiring this question.


Solution

  • Terminology note: since pattern can be used to refer to both wildcard expressions and regular expressions, the term glob is used as an unambiguous shorthand for wildcard expressions.


    Simple, but limited solution based on string splitting

    Specifically, the solution below is limited to wildcard patterns that contain one * as the only wildcard metacharacter.

    # Sample input objects that emulate file-info objects
    # as output by Get-ChildItem
    $files =
        @{ Name = 'ABC_1232.txt' },
        @{ Name = 'abC_4321.TxT' },
        @{ Name = 'qwerty_1232.cSv' },
        @{ Name = 'QwErTY_4321.CsV' },
        @{ Name = 'Unrelated.CsV' }
    
    # The wildcard patterns to match against.
    $globs = 'QWERTY_*.csv', 'abc_*.TXT'
    
    # Loop over all files.
    # IRL, use Get-ChildItem in lieu of $files.
    $files | ForEach-Object {    
      # Loop over all wildcard patterns
      foreach ($glob in $globs) {    
        if ($_.Name -like $glob) { # matching filename    
          # Split the glob into the prefix (the part before '*') and
          # the suffix, i.e., the extension (the part after '*').
          $prefix, $extension = $glob -split '\*'
    
          # Extract the specific middle part of the filename; the part that 
          # matched '*'
          $middle = $_.Name.Substring($prefix.Length, $_.Name.Length - $prefix.Length - $extension.Length)
    
          # This is where your Rename-Item call would go.
          #   $_ | Rename-Item -WhatIf -NewName ($prefix + $middle + $extension)
          # Note that if the filename already happens to be case-exact, 
          # Rename-Item is a quiet no-op.
          # For this demo, we simply output the new name.
          $prefix + $middle + $extension    
        }
      }
    }
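
Because the splitting approach assumes exactly one * and no other wildcard metacharacters, it may be worth checking each glob before using it. The following is only a minimal sketch of such a check; the function name Test-SingleStarGlob is hypothetical and not part of the original answer.

```powershell
# Hypothetical helper (an assumption, not from the original answer):
# returns $true only if the glob contains exactly one '*' and none of the
# other wildcard metacharacters ('?', '[', ']').
function Test-SingleStarGlob {
  param([Parameter(Mandatory)] [string] $Glob)
  # Literal chars, then one '*', then literal chars; nothing else allowed.
  $Glob -match '^[^*?\[\]]*\*[^*?\[\]]*$'
}

# Report which globs are safe to use with the string-splitting solution.
$candidates = 'QWERTY_*.csv', 'abc_*.TXT', 'a?c_*.txt', 'a*b*.txt'
foreach ($glob in $candidates) {
  '{0,-14} {1}' -f $glob, (Test-SingleStarGlob $glob)
}
```

Globs that fail this check are candidates for the regex-based approach instead.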
    

    Generalized, but more complex solution with regular expressions

    This solution is significantly more complex, but should work with all wildcard expressions, as long as escaping wildcard metacharacters with ` (backtick) needn't be supported.

    # Sample input objects that emulate file-info objects
    # as output by Get-ChildItem
    $files =
        @{ Name = 'ABC_1232.txt' },
        @{ Name = 'abC_4321.TxT' },
        @{ Name = 'qwerty_1232.cSv' },
        @{ Name = 'QwErTY_4321.CsV' },
        @{ Name = 'Unrelated.CsV' }
    
    # The globs (wildcard patterns) to match against.
    $globs = 'QWERTY_*.csv', 'abc_*.TXT'
    
    # Translate the globs into regexes, with the non-literal parts enclosed in
    # capture groups; note the addition of anchors ^ and $, given that globs
    # match the entire input string.
    # E.g., 'QWERTY_*.csv' -> '^QWERTY_(.*)\.csv$'
    $regexes = foreach($glob in $globs) {
      '^' +
        ([regex]::Escape($glob) -replace '\\\*', '(.*)' -replace  # *
                                         '\\\?', '(.)' -replace   # ?
                                         '\\(\[.+?\])', '($1)') + # [...]
      '$'
    }
    
    # Construct string templates from the globs that can be used with the -f
    # operator to fill in the variable parts from each filename match.
    # Each variable part is replaced with a {<n>} placeholder, starting with 0.
    # E.g., 'QWERTY_*.csv' -> 'QWERTY_{0}.csv'
    $templates = foreach($glob in $globs) {
      $iRef = [ref] 0
      [regex]::Replace(
        ($glob -replace '[{}]', '$&$&'), # escape literal '{' and '}' as '{{' and '}}' first
        '\*|\?|\[.+?\]', # wildcard metachars. / constructs
        { param($match) '{' + ($iRef.Value++) + '}' } # replace with {<n>} placeholders
      )
    }
    
    # Loop over all files.
    # IRL, use Get-ChildItem in lieu of $files.
    $files | ForEach-Object {
    
      # Loop over all wildcard patterns
      $i = -1
      foreach ($regex in $regexes) {
        ++$i
        # See if the filename matches
        if (($matchInfo = [regex]::Match($_.Name, $regex, 'IgnoreCase')).Success) {
          # Instantiate the template string with the capture-group values.
          # E.g., 'QWERTY_{0}.csv' -f '4321' 
          $newName = $templates[$i] -f ($matchInfo.Groups.Value | Select-Object -Skip 1)
    
          # This is where your Rename-Item call would go.
          #   $_ | Rename-Item -WhatIf -NewName $newName
          # Note that if the filename already happens to be case-exact, 
          # Rename-Item is a quiet no-op.
          # For this demo, we simply output the new name.
          $newName    
        }
      }
    }
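
To illustrate how the regex approach behaves with a glob that has several wildcard constructs, here is a sketch that wraps the same translation steps in functions and applies them to a single name. The function names (ConvertTo-GlobRegex, ConvertTo-GlobTemplate, Get-CaseNormalizedName) and the sample glob 'a?c_[xy]*.TXT' are made up for illustration; only the translation logic is taken from the code above.

```powershell
# Hypothetical wrapper: translate a glob into an anchored regex whose
# wildcard constructs become capture groups.
function ConvertTo-GlobRegex {
  param([Parameter(Mandatory)] [string] $Glob)
  $escaped = [regex]::Escape($Glob)
  $escaped = $escaped -replace '\\\*', '(.*)'          # *
  $escaped = $escaped -replace '\\\?', '(.)'           # ?
  $escaped = $escaped -replace '\\(\[.+?\])', '($1)'   # [...]
  '^' + $escaped + '$'
}

# Hypothetical wrapper: translate a glob into a -f template in which each
# wildcard construct becomes a {<n>} placeholder.
function ConvertTo-GlobTemplate {
  param([Parameter(Mandatory)] [string] $Glob)
  $iRef = [ref] 0
  [regex]::Replace(
    ($Glob -replace '[{}]', '$&$&'),  # escape literal '{' and '}' first
    '\*|\?|\[.+?\]',
    { param($match) '{' + ($iRef.Value++) + '}' }
  )
}

# Hypothetical wrapper: return the case-normalized name, or nothing if the
# name does not match the glob.
function Get-CaseNormalizedName {
  param([string] $Name, [string] $Glob)
  $m = [regex]::Match($Name, (ConvertTo-GlobRegex $Glob), 'IgnoreCase')
  if ($m.Success) {
    $values = @($m.Groups | Select-Object -Skip 1 | ForEach-Object Value)
    (ConvertTo-GlobTemplate $Glob) -f $values
  }
}

ConvertTo-GlobRegex    'a?c_[xy]*.TXT'   # -> '^a(.)c_([xy])(.*)\.TXT$'
ConvertTo-GlobTemplate 'a?c_[xy]*.TXT'   # -> 'a{0}c_{1}{2}.TXT'
Get-CaseNormalizedName 'AbC_x99.txt' 'a?c_[xy]*.TXT'   # -> 'abc_x99.TXT'
```

Note that only the literal parts of the glob are case-normalized; the characters matched by ?, [...] and * keep the case they have in the filename.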