
replace string that includes a wildcard


Would someone be able to advise me on this, please?

Each line of the source file (sourcetrans.txt) starts with a file path and a row number, e.g.:

C:\temp\TRANSFile1.txt:1001:DTRANS       1.111111111                   12345667889       debit  product1           
C:\temp\TRANSFile1.txt:20002:DTRANS       2.222222222                  23143453456       credit product2                
C:\temp\TRANSFile1.txt:300:DTRANS       3.333333333                    23443655678       debit  product3                                                            

I'm trying to extract only the debit rows and redirect the output to another file. I also want to remove the file path and row number from the start of each row, i.e. remove strings such as:

C:\temp\TRANSFile1.txt:1001:
C:\temp\TRANSFile1.txt:300:

so that the desired output is:

DTRANS       1.111111111                   12345667889       debit  product1                         
DTRANS       3.333333333                   23443655678       debit  product3  

It was going quite well with the command below:

(
  Get-Content "C:\temp\sourcetrans.txt" |
  where {$_ -like "C:\temp\TRANSFile1.txt:*debit*"}
).Replace('C:\temp\TRANSFile1.txt:','') |
  Out-File -FilePath C:\temp\SSFinal.txt -append 

and the output file SSfinal.txt:

1001:DTRANS      1.111111111                   12345667889       debit  product1                          
300:DTRANS       3.333333333                    23443655678       debit  product3

However, the output still contains the row number. I thought it would be a simple case of using the * wildcard to filter out the row number, e.g.:

(
  Get-Content "C:\temp\sourcetrans.txt" | 
  where {$_ -like "C:\temp\TRANSFile1.txt:*debit*"}
).Replace('C:\temp\TRANSFile1.txt:*:','') |
  Out-File -FilePath C:\temp\SSFinal.txt -append 

However, this doesn't work: with the * included, the output is the full string again, including the file path and row number. Any advice would be greatly appreciated.

Solution

  • The .Replace() .NET method only supports literal replacements (it doesn't support wildcard expressions or regexes).

    By contrast, PowerShell's -replace operator is regex-based, so using it instead of .Replace() is one option. The example below uses a single input line, stored in $line; a complete pipeline follows further down:

    # Remove everything up to and including 
    # the ":" after the number following the path.
    $line -replace '^.:.+?:.+?:'
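
    To make the difference concrete, here is a minimal sketch that runs both approaches on one sample line. The line is modeled on the question's data with the whitespace shortened, and the variable names $literal and $stripped are only illustrative:

    ```powershell
    # Sample line modeled on the question's data (whitespace shortened).
    $line = 'C:\temp\TRANSFile1.txt:1001:DTRANS  1.111111111  12345667889  debit  product1'

    # .Replace() searches for the literal text, including a literal "*".
    # That text does not occur in the line, so nothing is replaced.
    $literal = $line.Replace('C:\temp\TRANSFile1.txt:*:', '')

    # -replace treats the pattern as a regex; omitting the replacement
    # operand removes whatever the pattern matched.
    $stripped = $line -replace '^.:.+?:.+?:'

    $literal -eq $line   # True: the line is unchanged
    $stripped            # DTRANS  1.111111111  12345667889  debit  product1
    ```

    This matches the behavior described in the question: with the * included, .Replace() finds nothing to replace and returns the whole line.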
    

    Another option is to use the -split operator:

    # Split the line into at most 4 ":"-separated tokens and extract the last one.
    ($line -split ':', 4)[-1]
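
    As a rough illustration of what the split produces, the sketch below shows the individual tokens for one sample line (again modeled on the question's data with shortened whitespace; $tokens is just an illustrative name):

    ```powershell
    # Sample line modeled on the question's data (whitespace shortened).
    $line = 'C:\temp\TRANSFile1.txt:20002:DTRANS  2.222222222  23143453456  credit product2'

    # With a limit of 4 the tokens are: drive letter, rest of the path,
    # row number, and the remainder of the line.
    $tokens = $line -split ':', 4

    $tokens.Count   # 4
    $tokens[2]      # 20002
    $tokens[-1]     # DTRANS  2.222222222  23143453456  credit product2
    ```

    The limit matters if the record itself could ever contain a ":", because everything after the third separator stays together in the last token.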
    

    However, you can also do both steps with the regex-based -match operator in your where (Where-Object) call: it matches only the lines of interest, and a capture group ((…)) captures just the relevant part of each matching line. The captured text can then be read from the automatic $Matches variable in a subsequent ForEach-Object call:

    Get-Content C:\temp\sourcetrans.txt | 
      Where-Object { $_ -match '^.:.+?:.+?:(.*debit.*)$' } |
      ForEach-Object { $Matches[1] } |
      Out-File -FilePath C:\temp\SSFinal.txt -Append
    

    Note: -Append is only needed if you need to append content to a preexisting file.

    For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
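
    For comparison, the -replace variant can be put into a pipeline as well. The following is only a sketch under assumptions: the scratch file names in the temp directory are hypothetical, the sample rows are modeled on the question with shortened whitespace, and filtering on the whole word "debit" is a choice made here, not part of the original answer:

    ```powershell
    # Hypothetical scratch files; substitute the real paths from the question.
    $source = Join-Path ([System.IO.Path]::GetTempPath()) 'sourcetrans-sample.txt'
    $target = Join-Path ([System.IO.Path]::GetTempPath()) 'SSFinal-sample.txt'

    # Three sample rows modeled on the question (whitespace shortened).
    @(
      'C:\temp\TRANSFile1.txt:1001:DTRANS  1.111111111  12345667889  debit  product1'
      'C:\temp\TRANSFile1.txt:20002:DTRANS  2.222222222  23143453456  credit product2'
      'C:\temp\TRANSFile1.txt:300:DTRANS  3.333333333  23443655678  debit  product3'
    ) | Set-Content -Path $source

    # Keep only the debit rows, then strip the "path:rownumber:" prefix.
    Get-Content -Path $source |
      Where-Object { $_ -match '\bdebit\b' } |
      ForEach-Object { $_ -replace '^.:.+?:.+?:' } |
      Set-Content -Path $target

    # Read the result back; @() guarantees an array even for a single line.
    $result = @(Get-Content -Path $target)
    $result
    ```

    Set-Content overwrites the target file on each run; Out-File, as used in the answer, works equally well for string output.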