I have a tab delimited data set as such (some columns removed but not relevant here):
XYZ 12345671 18884 18912 222.00 222.00 Standard 2 CMD.1
ABC 12345672 18875 397.00 Standard 1 CMD.2
ABC 12345673 18889 18919 18950 146.66 146.66 146.68 Standard 3 CMD.2
XYZ 12345674 18840 18871 18901 193.00 193.00 192.00 Standard 3 CMD.1
Unfortunately in the data set there are two multi-value fields (with 1 or more sub-values) using the same delimiter as every other column. All of the 5 digit numbers starting with 18 are one field, then all of the decimal numbers are a second field. The number of sub-values will always match between these fields (that is, if the first has 2 values the second will have 2 values). The lonae number by itself between "Standard" and "CMD.X" is the number of sub-values in each of those fields. Basically, what I'm trying to create is:
XYZ 12345671 18884;18912 222.00;222.00 Standard 2 CMD.1
ABC 12345672 18875 397.00 Standard 1 CMD.2
ABC 12345673 18889;18919;18950 146.66;146.66;146.68 Standard 3 CMD.2
XYZ 12345674 18840;18871;18901 193.00;193.00;192.00 Standard 3 CMD.1
Due to the filesize, I am trying to use this with System.IO.File (as many as 100,000k records):
$File = [System.IO.File]::ReadAllText('in_file.txt')
ForEach($RecordSet In $File){
$RecordSet = $RecordSet -CReplace '(?m)(/d/d/d/d/d)`t(/d/d/d/d/d)','(/d/d/d/d/d);(/d/d/d/d/d)'
$RecordSet = $RecordSet -CReplace '(?m)([0-9]*\.[0-9]+|[0-9]+)`t([0-9]*\.[0-9]+|[0-9]+)','([0-9]*\.[0-9]+|[0-9]+);([0-9]*\.[0-9]+|[0-9]+)'
Add-Content ('out_file.txt") $RecordSet
}
Yes, I realize the in_file.txt is a stupid format -- it's not changeable unfortunately.
Maybe,
(?<=\s\d{5}\b)\s+(?=\b\d{5}\b)|(?<=\s\d{3}\.\d{2}\b)\s+(?=\b\d{3}\.\d{2}\b)
and a replacement of ;
might be OK to look into.
or you can likely capture those with an expression similar to,
(\d+)\s{3}(\d+)\s{3}(?:(\d+)\s{3})?(\d+\.\d+)\s{2}(\d+\.\d+)\s{2}(?:(\d+\.\d+)\s{2})?Standard
then, replace with $1;$2;$3;$4;$5;$6;
or $1;$2;$4;$5;
.
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.