I have a single text file that contains 60K+ lines. Those lines are actually around 50 programs written in Natural, and I need to break them apart into individual programs. I have a script that works perfectly except for one flaw: the naming of the output files.
Every program starts with "Module Name=", followed by the actual name of the program. I need to split the programs and save them using the actual program names.
Using the example below, I would like to create two files, Program1.txt and Program2.txt, each containing the lines that belong to that program. I have a script, also below, that separates the files correctly, but I can't work out how to capture the program name and use it as the name of the output file.
Example:
Module Name=Program1
....
....
....
END
Module Name=Program2
....
....
....
END
Code:
$InputFile = "C:\Natural.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)
$a = 1
While (($Line = $Reader.ReadLine()) -ne $null) {
    If ($Line -match "Module Name=") {
        $OutputFile = "MySplittedFileNumber$a.txt"
        $a++
    }
    Add-Content $OutputFile $Line
}
Combine a switch statement, which can efficiently read a file line by line with -File and match each line against one or more regexes with -Regex, with a System.IO.StreamWriter instance that writes the output files efficiently:
$outStream = $null
switch -Regex -File C:\Natural.txt {
    '\bModule Name=(\w+)' { # a module start line
        if ($outStream) { $outStream.Close() }
        $programName = $Matches[1] # Extract the program name.
        # Create a new output file.
        # Important: use a *full* path, because .NET resolves relative paths against
        # the process's working directory, which may differ from PowerShell's current location.
        $outStream = [System.IO.StreamWriter] "C:\$programName.txt"
        # Write the line at hand.
        $outStream.WriteLine($_)
    }
    default { # all other lines
        # Write the line at hand to the current output file.
        $outStream.WriteLine($_)
    }
}
if ($outStream) { $outStream.Close() }
Note:
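The code assumes that the very first line in the input file is a Module Name=... line; otherwise $outStream is still $null when the default branch runs, and calling WriteLine() on it fails.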
The regex matching is case-insensitive by default, as PowerShell generally is; add -CaseSensitive if needed.
The automatic $Matches variable is used to extract the program name from the matching result: $Matches[1] holds the text captured by the first parenthesized group, (\w+).
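For comparison, the same $Matches technique can be plugged into the question's original StreamReader loop. This is a hedged sketch, not the answer's method: the function name Split-NaturalModules and its -InputFile/-OutputDir parameters are hypothetical, it assumes PowerShell 5.0 or later (for ::new()), and, unlike the switch version above, it silently skips any lines that precede the first Module Name= line instead of failing.

```powershell
# Hypothetical helper (name and parameters are illustrative, not from the answer).
# Requires PowerShell 5.0+ for the ::new() constructor syntax.
function Split-NaturalModules {
    param(
        [string] $InputFile,  # full path of the combined file
        [string] $OutputDir   # full path of an existing output folder
    )
    $reader = [System.IO.StreamReader]::new($InputFile)
    $writer = $null
    try {
        while ($null -ne ($line = $reader.ReadLine())) {
            if ($line -match '\bModule Name=(\w+)') {
                # A new program starts: close the previous file, if any.
                if ($writer) { $writer.Close() }
                # $Matches[1] is the program name captured by (\w+).
                $writer = [System.IO.StreamWriter]::new((Join-Path $OutputDir "$($Matches[1]).txt"))
            }
            # Lines before the first header have no target file and are skipped.
            if ($writer) { $writer.WriteLine($line) }
        }
    }
    finally {
        # Always release the file handles, even if an error occurs mid-way.
        if ($writer) { $writer.Close() }
        $reader.Close()
    }
}
```

Example call (placeholder paths): Split-NaturalModules -InputFile C:\Natural.txt -OutputDir C:\Programs. The try/finally block ensures that both files are closed even if reading or writing throws partway through.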