
Get path based on regex in PowerShell


I queried the registry to get a file path I am looking for. However, I need to go one directory lower to retrieve some file info I need. The pattern I am trying to match against is Officexx or OFFICExx. I can't seem to get the path I need.

Found path from registry: C:\Program Files\Microsoft Office

What I need is: C:\Program Files\Microsoft Office\Officexx

Code:

$base_install_path = "C:\Program Files\Microsoft Office";
$full_install_path = $base_install_path+'\Office[\d+.*]'
Write-Output $full_install_path;  

This returns:

C:\Program Files\Microsoft Office\Office[\d+.*] 

Desired output:

C:\Program Files\Microsoft Office\Office15

Note: this could be any two-digit number ^^


Solution

  • Building on Santiago Squarzon's helpful comment:

    # Find all child directories matching the given wildcard pattern, if any.
    Get-ChildItem -Directory -Path "$base_install_path\Office[0-9][0-9]*"
    
    • Unlike POSIX-compatible shells such as bash, PowerShell does not support automatic globbing of unquoted strings (pattern matching against file names, known as filename expansion) and instead requires explicit use of the Get-ChildItem or Get-Item cmdlets; e.g., the equivalent of bash command pattern='*.txt'; echo $pattern in PowerShell is $pattern='*.txt'; Get-ChildItem -Path $pattern

      • Note that objects describing the matching files or directories are output by these cmdlets; use their properties as needed, e.g. (Get-ChildItem $pattern).Name or (Get-ChildItem $pattern).FullName (full path). Use Get-ChildItem $pattern | Get-Member -Type Properties to see all available properties.
    • The -Path parameter of these cmdlets expects a PowerShell wildcard expression to perform the desired matching, and the expression in the command at the top matches exactly two digits ([0-9][0-9]), followed by zero or more characters (*), whatever they may be (potentially including additional digits).

      • Note: Only PowerShell's wildcard language - as accepted by the -Path and -Include / -Exclude parameters (and in many other contexts) - supports character ranges (e.g. [0-9] to match any decimal digit) and sets (e.g. [._] to match either . or _). By contrast, Get-ChildItem's -Filter parameter uses the wildcard language of the file-system APIs (as cmd.exe does), which does not support them, and additionally exhibits legacy quirks - see this answer for more information.

      • While PowerShell's wildcard character ranges and sets fundamentally work the same as in regexes (regular expressions, see below), regex-specific escape sequences such as \d are not supported, and you generally cannot quantify them; that is, something like [0-9] only ever matches exactly one digit.
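The wildcard rules described above can be tried without touching the file system. The following is a minimal sketch that assumes the -like operator applies the same wildcard language as -Path; the sample names are hypothetical and not taken from a real Office installation.

```powershell
# Hypothetical directory names; only the -like operator is exercised here.
$names = 'Office15', 'OFFICE16', 'Office', 'Office1', 'Office2019', 'Office_old'

# [0-9] matches exactly one digit; * matches any run of characters (including none).
# Matching is case-insensitive by default, so 'OFFICE16' qualifies too.
$wildcardMatches = @($names | Where-Object { $_ -like 'Office[0-9][0-9]*' })
$wildcardMatches

# Regex escapes such as \d have no special meaning in a wildcard pattern,
# so this pattern looks for a literal backslash followed by 'd' and finds nothing.
$regexStyle = @($names | Where-Object { $_ -like 'Office\d\d*' })
$regexStyle.Count
```

Note that 'Office2019' passes the wildcard test because * also accepts further digits, which is the limitation that regex post-filtering addresses.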


    Given that wildcard patterns support only one, non-specific duplication construct, namely the aforementioned *, matching a specific range of digit counts (such as 1 or 2 at most) or a specific count (such as exactly two) requires post-filtering based on a regex (which is what you tried to use):

    # Find all child directories matching the given regex, if any.
    # Matches 'Office' at the start of the name (^),
    # followed by 1 or 2 ({1,2}) digits (\d),
    # followed by either a non-digit (\D) or the end of the name ($).
    Get-ChildItem -Directory -LiteralPath $base_install_path |
      Where-Object Name -match '^Office\d{1,2}(?:\D|$)'
    
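To see regex post-filtering in action without a real Office installation, here is a small sketch that builds a throwaway directory tree under the temp folder. The directory names and the $root variable are hypothetical, and the pattern ends with (?:\D|$) so that a third digit is rejected; treat it as one possible way to enforce the digit count, not the only one.

```powershell
# Create a unique scratch directory with a few hypothetical child directories.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ('OfficeDemo_' + [guid]::NewGuid().ToString('N'))
$null = New-Item -ItemType Directory -Path $root
'Office15', 'Office2019', 'Office7 (old)', 'Templates' | ForEach-Object {
    $null = New-Item -ItemType Directory -Path (Join-Path $root $_)
}

# Keep names that start with 'Office', continue with 1 or 2 digits,
# and then either hit a non-digit or the end of the name.
$found = @(Get-ChildItem -Directory -LiteralPath $root |
    Where-Object Name -match '^Office\d{1,2}(?:\D|$)' |
    Sort-Object Name)

$found.Name       # the matching directory names
$found.FullName   # the corresponding full paths

# Clean up the scratch directory.
Remove-Item -LiteralPath $root -Recurse -Force
```

'Office2019' is excluded here because the digit run is longer than two, whereas the wildcard pattern Office[0-9][0-9]* would have accepted it.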

    As for what you tried:

    • [\d+.*] is a regex, but you probably meant \d+.*, i.e. one or more (+) digits (\d) followed by zero or more (*) characters, whatever they may be (.)

    • Inside a character-range/set expression ([...]), +, . and * are used verbatim, i.e. they are not metacharacters and match literal +, . and * characters.
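The difference between the two regexes can be checked directly with the -match operator. This is an illustrative sketch on plain strings; the sample inputs and variable names are made up for the demonstration.

```powershell
# As a regex, [\d+.*] is a character class that matches exactly ONE character:
# a digit, or a literal '+', '.', or '*'.
$classMatches = @('7', '+', '.', '*', 'a' | Where-Object { $_ -match '^[\d+.*]$' })
$classMatches

# Anchored to the whole name, the class therefore allows only a single
# character after 'Office', so a two-digit suffix does not match.
$withClass = 'Office15' -match '^Office[\d+.*]$'

# Without the brackets, \d+ matches one or more digits and .* any remainder.
$intended = 'Office15' -match '^Office\d+.*$'

"character class: $withClass; intended pattern: $intended"
```

Either way, a regex only takes effect when it is passed to a regex-aware operator or cmdlet such as -match; concatenating it onto a path string, as in the question, just produces a longer string.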