Would someone be able to advise me on this, please?
The source file (sourcetrans.txt) contains lines that each begin with a file path and row number, e.g.:
C:\temp\TRANSFile1.txt:1001:DTRANS 1.111111111 12345667889 debit product1
C:\temp\TRANSFile1.txt:20002:DTRANS 2.222222222 23143453456 credit product2
C:\temp\TRANSFile1.txt:300:DTRANS 3.333333333 23443655678 debit product3
I'm trying to extract the debit rows only and redirect the output to another file, but I also want to remove the file path and row number from the start of each row, i.e. remove strings such as:
C:\temp\TRANSFile1.txt:1001:
C:\temp\TRANSFile1.txt:300:
so that the desired output is:
DTRANS 1.111111111 12345667889 debit product1
DTRANS 3.333333333 23443655678 debit product3
It was going quite well with the command below:
(
Get-Content "C:\temp\sourcetrans.txt" |
where {$_ -like "C:\temp\TRANSFile1.txt:*debit*"}
).Replace('C:\temp\TRANSFile1.txt:','') |
Out-File -FilePath C:\temp\SSFinal.txt -append
and the output file SSFinal.txt contains:
1001:DTRANS 1.111111111 12345667889 debit product1
300:DTRANS 3.333333333 23443655678 debit product3
However, the output still contains the row number.
I thought it would be a simple case of using the * wildcard to filter out the row number, e.g.:
(
Get-Content "C:\temp\sourcetrans.txt" |
where {$_ -like "C:\temp\TRANSFile1.txt:*debit*"}
).Replace('C:\temp\TRANSFile1.txt:*:','') |
Out-File -FilePath C:\temp\SSFinal.txt -append
However, this doesn't work: with the * included, the output contains the full string again, including the file path and row number. Any advice would be greatly appreciated.
Expected:
DTRANS 1.111111111 12345667889 debit product1
DTRANS 3.333333333 23443655678 debit product3
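The .Replace() .NET method only supports literal replacements (it doesn't support wildcard expressions or regexes). Because the literal text 'C:\temp\TRANSFile1.txt:*:' (with an actual * character) never occurs in your lines, nothing is replaced and each line is returned unchanged.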
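To make the literal-matching behavior concrete, here is a minimal sketch using one of the sample lines from the question. The variable names ($line, $unchanged, $partial) are only illustrative.

```powershell
$line = 'C:\temp\TRANSFile1.txt:1001:DTRANS 1.111111111 12345667889 debit product1'

# The search string contains a literal "*", which never occurs in $line,
# so .Replace() finds nothing and returns the line unchanged.
$unchanged = $line.Replace('C:\temp\TRANSFile1.txt:*:', '')

# Without the "*", the literal prefix is found and removed,
# which leaves the row number ("1001:") in place.
$partial = $line.Replace('C:\temp\TRANSFile1.txt:', '')

$unchanged
$partial
```

This reproduces both results described in the question: the full line with the *, and the line still prefixed with the row number without it.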
By contrast, PowerShell's -replace operator is regex-based, so using it instead of .Replace() is one option (I'm using a single input line as an example; see the next section for a complete solution):
# Remove everything up to and including
# the ":" after the number following the path.
$line -replace '^.:.+?:.+?:'
Another option is to use the -split operator:
# Split the line into at most 4 ":"-separated tokens and extract the last one.
($line -split ':', 4)[-1]
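As a rough sketch of how either operator could be applied to every line rather than to a single $line, the following uses the three sample lines from the question as an in-memory array (standing in for Get-Content output) and a simple -like filter for the debit rows. The variable names are assumptions made for illustration.

```powershell
# Sample data from the question, in place of Get-Content C:\temp\sourcetrans.txt
$lines = @(
  'C:\temp\TRANSFile1.txt:1001:DTRANS 1.111111111 12345667889 debit product1'
  'C:\temp\TRANSFile1.txt:20002:DTRANS 2.222222222 23143453456 credit product2'
  'C:\temp\TRANSFile1.txt:300:DTRANS 3.333333333 23443655678 debit product3'
)

# Keep only the debit rows, then strip the path and row number with -replace.
$viaReplace = $lines |
  Where-Object { $_ -like '*debit*' } |
  ForEach-Object { $_ -replace '^.:.+?:.+?:' }

# Same filter, but keep only the last of at most 4 ":"-separated tokens.
$viaSplit = $lines |
  Where-Object { $_ -like '*debit*' } |
  ForEach-Object { ($_ -split ':', 4)[-1] }

$viaReplace
$viaSplit
```

Both approaches rely on the remainder of each line containing no further colons that matter: -replace stops after the third colon, and -split leaves everything after the third colon in the last token.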
However, you can also use the regex-based -match operator in your where (Where-Object) call to both match only the lines of interest and use a capture group ((…)) to capture only the relevant part of each such line, which can then be accessed via the automatic $Matches variable in a subsequent ForEach-Object call:
Get-Content C:\temp\sourcetrans.txt |
Where-Object { $_ -match '^.:.+?:.+?:(.*debit.*)$' } |
ForEach-Object { $Matches[1] } |
Out-File -FilePath C:\temp\SSFinal.txt -Append
Note: -Append is only needed if you need to append content to a preexisting file.
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
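For quick reference, the regex can also be read piece by piece. The sketch below annotates it in comments and tries it against one debit line and one credit line from the question; the variable names are illustrative only.

```powershell
# ^            start of the line
# .:           the drive letter and its colon, e.g. "C:"
# .+?:         shortest run of characters up to the next colon (rest of the path)
# .+?:         shortest run of characters up to the next colon (the row number)
# (.*debit.*)  capture group 1: the remainder, which must contain "debit"
# $            end of the line
$regex = '^.:.+?:.+?:(.*debit.*)$'

$debitLine  = 'C:\temp\TRANSFile1.txt:300:DTRANS 3.333333333 23443655678 debit product3'
$creditLine = 'C:\temp\TRANSFile1.txt:20002:DTRANS 2.222222222 23143453456 credit product2'

# A successful match populates the automatic $Matches variable.
$isDebit  = $debitLine -match $regex
$captured = $Matches[1]

# A failed match returns $false and leaves $Matches as it was,
# which is why $Matches should only be read after a successful match.
$isCredit = $creditLine -match $regex

$isDebit
$captured
$isCredit
```

Note that -match is case-insensitive by default, so "DEBIT" would match as well; -cmatch is the case-sensitive variant if that distinction matters for your data.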