powershell, text-parsing

Get-Content: split only by the first '_' and keep leading zeros in PowerShell


I have some txt data like this:

0.0.0.1_03_1          
0.0.0.1_03            
0.0.0.1_02_2_1_3_4          
0.0.0.1_02_1          
0.0.0.1_02            
0.0.0.1_01_1          
0.0.0.1_01  

What I want to achieve is to split each line into two variables (0.0.0.1 and the rest). I want to split only at the first '_' and keep leading zeros (01, for example). I am doing this:

Get-Content $SourceTxtDbFile | 
  ConvertFrom-String -Delimiter "_" -PropertyNames DbVersion, ScriptNumber

However, the result loses the leading zeros, and the lines are not split the way I want.


Solution

  • TessellatingHeckler's helpful answer shows how to use the .Split() method to perform separator-based splitting that limits the number of tokens returned; in his solution, it splits only at the 1st _ instance, returning a total of 2 tokens.

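    As an aside: you can also use PowerShell's own -split operator, which has advantages of its own; for instance, it interprets the separator as a regular expression, and it can split every element of an array in a single operation: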

    $_ -split '_', 2 # in this case, same as: $_.split('_', 2) 
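
    Applied to your original goal of two values per line, a minimal sketch might look like this. It assumes you want objects with the DbVersion and ScriptNumber property names from your ConvertFrom-String attempt, and the 3-element $lines array is only a stand-in for Get-Content $SourceTxtDbFile. Since the tokens stay strings, leading zeros such as 03 survive:

    ```powershell
    # Stand-in for: $lines = Get-Content $SourceTxtDbFile
    $lines = '0.0.0.1_03_1', '0.0.0.1_03', '0.0.0.1_02_2_1_3_4'

    $objects = $lines | ForEach-Object {
      # Split at the 1st "_" only, yielding at most 2 tokens.
      $dbVersion, $scriptNumber = $_ -split '_', 2
      # The tokens remain strings, so '03' keeps its leading zero.
      [pscustomobject]@{ DbVersion = $dbVersion; ScriptNumber = $scriptNumber }
    }

    $objects | Format-Table
    ```

    The ScriptNumber values come out as '03_1', '03', and '02_2_1_3_4', i.e., everything after the 1st _.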
    

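    That said, your later comments suggest that you may simply be looking to remove everything from the 2nd _ instance onward from your input strings. To do that, split by the first two _ instances, i.e., into at most 3 tokens, and keep only the first 2: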

    $dbVersion, $scriptNumber, $null = $_ -split '_', 3 # -> e.g., '0.0.0.1', '03', '1'
    

    Note how specifying $null as the variable to receive the 3rd token effectively discards that token, given that we're not interested in it.
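
    To see what each variable receives, here is a quick check on two of the sample lines, with the 3rd token captured in a regular variable ($rest, a name chosen only for this illustration) instead of $null. A line with a single _ yields just 2 tokens, so $rest ends up $null; a line with more _ instances puts everything after the 2nd _ into $rest:

    ```powershell
    $report = foreach ($line in '0.0.0.1_03', '0.0.0.1_02_2_1_3_4') {
      # Same 3-token split, but keep the 3rd token in $rest to inspect it.
      $dbVersion, $scriptNumber, $rest = $line -split '_', 3
      [pscustomobject]@{ Line = $line; DbVersion = $dbVersion; ScriptNumber = $scriptNumber; Rest = $rest }
    }

    $report | Format-Table
    ```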

    To re-join the resulting 2 tokens with _, it's simplest to use the -join operator:

    $dbVersion, $scriptNumber -join '_'
    

    To put it all together:

    # Sample array of input lines.
    $lines=@'
    0.0.0.1_03_1
    0.0.0.1_03
    0.0.0.1_02_2_1_3_4
    0.0.0.1_02_1
    0.0.0.1_02
    0.0.0.1_01_1
    0.0.0.1_01
    '@ -split '\r?\n'
    
    # Use Get-Content $SourceTxtDbFile instead of $lines in the real world.
    $lines | ForEach-Object {
      # Split by the first two "_" and save the first two tokens.      
      $dbVersion, $scriptNumber, $null = $_ -split '_', 3
      # Re-join the first two tokens with '_' and output the result.
      $dbVersion, $scriptNumber -join '_'
    }
    

    With your sample input, this yields:

    0.0.0.1_03
    0.0.0.1_03
    0.0.0.1_02
    0.0.0.1_02
    0.0.0.1_02
    0.0.0.1_01
    0.0.0.1_01
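
    If you only need the trimmed lines rather than separate variables, a regex-based -replace operation is another option; note that this swaps in a different technique than the split-and-join approach above. A sketch, assuming the same line format as your sample data:

    ```powershell
    $lines = '0.0.0.1_03_1', '0.0.0.1_03', '0.0.0.1_02_2_1_3_4', '0.0.0.1_01'

    # Keep everything before the 2nd "_"; -replace applies to each array element.
    # Single quotes keep '$1' away from PowerShell's own variable expansion,
    # so -replace sees it as a reference to capture group 1.
    $trimmed = $lines -replace '^([^_]*_[^_]*)_.*', '$1'
    $trimmed
    ```

    Lines with only one _ (such as 0.0.0.1_01) don't match the pattern and pass through unchanged, which gives the same result as the -split / -join solution for your sample data.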