
Extract information via PowerShell regex


I am new to PowerShell. I have a file that contains lines like the following. From this file, I want to extract only v1.0.2 using PowerShell.

2022-09-08T10:52:38.0802281Z Downloading git::ssh://git@ssh.dev.azure.com/v3/basf-terraform/Terraform_modules/azure_private_endpoint?ref=v1.0.2 for resuc_dls1_pep1...


Solution

  • I built a module that contains many regular expressions, including one that could help here.

    The module is called Irregular. You can get started by installing the module (Install-Module Irregular -Scope CurrentUser).

    In it, there are two regular expressions that can help you out:

    • ?<Code_BuildVersion>
    • ?<Code_SemanticVersion>

    To see the definition of either one, install and import the module, then run:

    ?<Code_BuildVersion>
    ?<Code_SemanticVersion>
    

    In your string, ?<Code_BuildVersion> would match both '38.0802281' (part of the timestamp) and '1.0.2'. ?<Code_SemanticVersion> only matches a version with at least three parts, so it will find only '1.0.2'.
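    To see that difference without installing the module, here is a sketch using two simplified stand-in patterns. They are assumptions that only approximate the Irregular expressions (the real definitions name their parts and may handle more cases), and $buildLike and $semverLike are hypothetical names: the first requires two numeric parts, the second requires three.

```powershell
$logLine = '2022-09-08T10:52:38.0802281Z Downloading git::ssh://git@ssh.dev.azure.com/v3/basf-terraform/Terraform_modules/azure_private_endpoint?ref=v1.0.2'

# Simplified stand-in (assumption): two required numeric parts, up to two optional ones.
$buildLike = '\d+\.\d+(?:\.\d+)?(?:\.\d+)?'

# Simplified stand-in (assumption): exactly three required numeric parts.
$semverLike = '\d+\.\d+\.\d+'

# [regex]::Matches returns every non-overlapping match; keep only the matched text.
$buildMatches  = [regex]::Matches($logLine, $buildLike)  | ForEach-Object { $_.Value }
$semverMatches = [regex]::Matches($logLine, $semverLike) | ForEach-Object { $_.Value }

$buildMatches   # '38.0802281' and '1.0.2'
$semverMatches  # '1.0.2'
```

    The two-part pattern also picks up the fractional seconds of the timestamp, while the three-part pattern cannot, because '38.0802281' is followed by 'Z' rather than a third '.number'.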

    To make this work, there are a few options:

    1. Use ?<Code_SemanticVersion> to match:
    $LogLine = '2022-09-08T10:52:38.0802281Z Downloading git::ssh://git@ssh.dev.azure.com/v3/basf-terraform/Terraform_modules/azure_private_endpoint?ref=v1.0.2'
    $logLine | ?<Code_SemanticVersion> | Select-Object -ExpandProperty Value
    
    2. Create a new regex based on ?<Code_BuildVersion>, using 'v' as its start:
    $findVersion = [Regex]::New('
    v
    (?<Code_BuildVersion>
    (?<Major>\d+)
    \.
    (?<Minor>\d+)
    (?:\.(?<Build>\d+))?
    (?:\.(?<Revision>\d+))?
    )
    ', 'IgnorePatternWhitespace')
    
    $findVersion.Matches('2022-09-08T10:52:38.0802281Z Downloading git::ssh://git@ssh.dev.azure.com/v3/basf-terraform/Terraform_modules/azure_private_endpoint?ref=v1.0.2') 
    
    3. Build your own quick regex to do this.

    Basically, you can "distill" the regex above into a shorter form that includes the v:

    $findVersion = 'v\d+\.\d+(?:\.\d+)?(?:\.\d+)?'
    $logLine = '2022-09-08T10:52:38.0802281Z Downloading git::ssh://git@ssh.dev.azure.com/v3/basf-terraform/Terraform_modules/azure_private_endpoint?ref=v1.0.2'
    # -match returns $true/$false and fills the automatic $Matches hashtable
    if ($logLine -match $findVersion) {
        $Matches[0]   # the whole match: v1.0.2
    }
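    Since the question is about a file rather than a single string, the option 3 pattern can also be applied to every line of the file with Select-String. This is a minimal sketch: the temporary log path and its second line are hypothetical, written out only so the example runs on its own; point -Path at your real file instead.

```powershell
# Hypothetical log file path; replace with the path to your own file.
$logPath = Join-Path ([System.IO.Path]::GetTempPath()) 'terraform-init.log'

# Write the sample line from the question (plus one unrelated line) so the sketch is self-contained.
Set-Content -Path $logPath -Value @(
    '2022-09-08T10:52:38.0802281Z Downloading git::ssh://git@ssh.dev.azure.com/v3/basf-terraform/Terraform_modules/azure_private_endpoint?ref=v1.0.2 for resuc_dls1_pep1...'
    '2022-09-08T10:52:39.0000000Z Some other line without a module reference'
)

# Same pattern as option 3: a 'v' followed by two to four dot-separated numbers.
$findVersion = 'v\d+\.\d+(?:\.\d+)?(?:\.\d+)?'

# Select-String scans each line; -AllMatches keeps every match on a line, not just the first.
$versions = Select-String -Path $logPath -Pattern $findVersion -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Value }

# Clean up the temporary sample file.
Remove-Item -Path $logPath

$versions   # v1.0.2
```

    Lines with no match (like the second sample line) simply produce no output, so $versions holds one entry per version found in the file.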
    

    I will also adjust the ?<Code_BuildVersion> regular expression so that it does not match when it is preceded by punctuation.

    After this issue is fixed, you should be able to use either regex to extract what you need.
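    Until then, a rough sketch of that kind of adjustment can be written as a plain .NET regex. This is an assumption about how such a change could look, not Irregular's actual updated definition: a negative lookbehind refuses to start a match right after punctuation (\p{P}, which covers ':', '-' and '.') and, additionally, right after another digit. The digit check matters because, with punctuation alone, a match can still begin at the '8' of '38' in the timestamp.

```powershell
$logLine = '2022-09-08T10:52:38.0802281Z Downloading git::ssh://git@ssh.dev.azure.com/v3/basf-terraform/Terraform_modules/azure_private_endpoint?ref=v1.0.2'

# Hypothetical adjustment (assumption): do not start a match right after punctuation or a digit.
$buildNoPunct = '(?<![\p{P}\d])\d+\.\d+(?:\.\d+)?(?:\.\d+)?'
$found = [regex]::Matches($logLine, $buildNoPunct) | ForEach-Object { $_.Value }

# For comparison: excluding only punctuation still lets a match start at the '8' of '38'.
$punctOnly = '(?<!\p{P})\d+\.\d+(?:\.\d+)?(?:\.\d+)?'
$punctOnlyFound = [regex]::Matches($logLine, $punctOnly) | ForEach-Object { $_.Value }

$found            # '1.0.2'
$punctOnlyFound   # '8.0802281' and '1.0.2'
```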