I have the following case I'm trying to script in PowerShell. I have done this exercise using sed in a Bash terminal, but I'm having trouble writing it in PowerShell. Any help would be greatly appreciated.
(sed -r -e '/^N/h;/^[N-]/d;G;s/(.*)\n(.*)/\2 \1/' <file>, with a file format without the < and > chars. surrounding the first letter on each line)
The start pattern always starts with <N> (only 1 instance per block), the lines in between start with <J>, and the end pattern is always --:
--------------
<N>ABC123
<J>SomethingHere1
<J>SomethingHere2
<J>SomethingHere3
-------------- <-- end of section
I'm trying to take the first line in each section (<N>) and copy it AFTER each <J> line in the same section. For example:
<J>SomethingHere1 <N>ABC123
<J>SomethingHere2 <N>ABC123
<J>SomethingHere3 <N>ABC123
The number of <J> lines per section can vary (0-N). In a case with no <J> lines, nothing needs to be done.
PowerShell version: 5.1.16299.611
The following pipeline-based solution isn't fast, but it is conceptually straightforward:
Get-Content file.txt | ForEach-Object {
  if ($_ -match '^-+$') { $newSect = $true }
  elseif ($newSect) { $firstSectionLine = $_; $newSect = $false }
  else { "{0}`t{1}" -f $_, $firstSectionLine }
}
It reads and processes lines one by one (with the line at hand reflected in the automatic variable $_).
It uses a regex (^-+$) with the -match operator to identify section dividers; if one is found, the flag $newSect is set to signal that the next line is the section's first data line.
When the first data line is hit, it is cached in variable $firstSectionLine, and the $newSect flag is reset.
All other lines are by definition the lines to which the first data line is to be appended, which is done via the -f string-formatting operator, using a tab char. (`t) as the separator.
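As a minimal sketch of the flag-based logic described above, the following runs the same loop over an in-memory array instead of file.txt. The sample lines and the variable names $sampleLines and $result are made up for illustration, and a single space is used as the separator (to match the desired output shown in the question) rather than a tab.

```powershell
# Hypothetical sample data standing in for the contents of file.txt.
$sampleLines = @(
  '--------------'
  '<N>ABC123'
  '<J>SomethingHere1'
  '<J>SomethingHere2'
  '--------------'
  '<N>XYZ789'            # a section with no <J> lines produces no output
  '--------------'
  '<N>DEF456'
  '<J>Other1'
)

# Initialize the state explicitly so the loop does not depend on leftover variables.
$newSect = $false
$firstSectionLine = $null

$result = $sampleLines | ForEach-Object {
  if ($_ -match '^-+$') { $newSect = $true }                        # divider line
  elseif ($newSect) { $firstSectionLine = $_; $newSect = $false }   # section's <N> line
  else { '{0} {1}' -f $_, $firstSectionLine }                       # <J> line plus cached <N> line
}

$result
```

With this sample, the section that contains only an <N> line contributes nothing to $result, which is the behavior the question asks for.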
Here's a faster PSv4+ solution; it is more complex, however, and it reads the entire input file into memory up front:
((Get-Content -Raw file.txt) -split '(?m)^-+(?:\r?\n)?' -ne '').ForEach({
  $firstLine, $otherLines = $_ -split '\r?\n' -ne ''
  foreach ($otherLine in $otherLines) { "{0}`t{1}" -f $otherLine, $firstLine }
})
Get-Content -Raw reads in the input file in full, as a single string.
It uses the -split operator to split the input file into sections, and then processes each section.
Regex '(?m)^-+(?:\r?\n)?' matches a section divider line, optionally followed by a newline.
(?m) is the multiline option, which makes ^ and $ match the start and end of each line, respectively.
\r?\n matches a newline, either in CRLF (\r\n) or LF-only (\n) form.
(?:...) is a non-capturing group; making it non-capturing prevents what it matches from being included in the elements returned by -split.
-ne '' filters out resulting empty elements.
-split '\r?\n' splits each section into individual lines.
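To make the two splitting steps easier to follow, here is a small sketch that applies the same regexes to an in-memory string rather than a file. The sample text and the variable names $text, $sections, and $output are assumptions for illustration only.

```powershell
# Hypothetical sample text, joined with LF newlines; CRLF would work as well.
$text = @(
  '--------------'
  '<N>ABC123'
  '<J>SomethingHere1'
  '<J>SomethingHere2'
  '--------------'
  '<N>XYZ789'
  '--------------'
) -join "`n"

# Step 1: split into sections at the divider lines and drop empty elements.
$sections = $text -split '(?m)^-+(?:\r?\n)?' -ne ''

# Step 2: split each section into lines; the first line goes into $firstLine,
# any remaining lines go into $otherLines ($null if there are none).
$output = foreach ($section in $sections) {
  $firstLine, $otherLines = $section -split '\r?\n' -ne ''
  foreach ($otherLine in $otherLines) { "{0}`t{1}" -f $otherLine, $firstLine }
}

"Sections found: $($sections.Count)"
$output
```

The multi-variable assignment is what separates the <N> line from the <J> lines; for the section that holds only <N>XYZ789, $otherLines is $null and the inner loop is simply skipped.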
If performance is still a concern, you could speed up reading the file with [IO.File]::ReadAllText("$PWD/file.txt").
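A rough sketch of that variant follows, assuming a throwaway demo file in the temp directory (the file name sections-demo.txt and the variable names are hypothetical). .NET methods do not follow PowerShell's current location, which is why a full path is passed; Convert-Path is used here as one way to obtain it. Note also that ReadAllText assumes UTF-8 when the file has no BOM, whereas Get-Content in Windows PowerShell 5.1 assumes the system's ANSI code page, so results can differ for non-ASCII content.

```powershell
# Create a hypothetical demo file so the sketch is self-contained.
$path = Join-Path ([IO.Path]::GetTempPath()) 'sections-demo.txt'
[IO.File]::WriteAllText($path, "-----`r`n<N>ABC123`r`n<J>One`r`n<J>Two`r`n-----`r`n")

# Resolve to a full file-system path before handing it to .NET.
$fullPath = Convert-Path -LiteralPath $path
$raw = [IO.File]::ReadAllText($fullPath)

# Same processing as the Get-Content -Raw solution, applied to $raw.
$lines = ($raw -split '(?m)^-+(?:\r?\n)?' -ne '').ForEach({
  $firstLine, $otherLines = $_ -split '\r?\n' -ne ''
  foreach ($otherLine in $otherLines) { "{0}`t{1}" -f $otherLine, $firstLine }
})
$lines

# Clean up the demo file.
Remove-Item -LiteralPath $path
```

Whether this is measurably faster than Get-Content -Raw depends on the file size and environment, so it is worth timing both with Measure-Command on real input.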