Consider the file tbl.txt (1.5 million lines), built like:
Num1 ; Num2 ; 'Value' ; 'Attribute'
So tbl.txt looks like:
63 ; 193 ; 'Green' ; 'Color'
152 ; 162 ; 'Tall' ; 'Size'
230 ; 164 ; '130lbs' ; 'Weight'
249 ; 175 ; 'Green' ; 'Color' *duplicate on 'Value' and 'Attribute'*
420 ; 178 ; '8' ; 'Shoesize'
438 ; 172 ; 'Tall' ; 'Size' *duplicate on 'Value' and 'Attribute'*
How can I keep the first line for each unique combination of 'Value' and 'Attribute', and delete the following lines that duplicate it on 'Value' and 'Attribute'?
The result should look like:
63 ; 193 ; 'Green' ; 'Color'
152 ; 162 ; 'Tall' ; 'Size'
230 ; 164 ; '130lbs' ; 'Weight'
420 ; 178 ; '8' ; 'Shoesize'
Any help is much appreciated.
Loop over the text file via Get-Content, separate the columns 'Value' ; 'Attribute' through string operations, and then use a hash table to check whether you have already processed a line with the same key; if not, output the line once. In code:
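$map = @{}
Get-Content tbl.txt | ForEach-Object {
    # Key = everything after the second ';', i.e. the 'Value' ; 'Attribute' part
    $key = $_.Substring($_.IndexOf(';', $_.IndexOf(';') + 1) + 1)
    # Emit the line only the first time its key is seen
    if (-not $map.ContainsKey($key)) { $_; $map[$key] = 1 }
}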
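As a variation on the same idea, here is a minimal sketch that builds the key by splitting on ';' and trimming each column, so differences in spacing around the separators do not produce distinct keys. It assumes neither 'Value' nor 'Attribute' contains a ';' itself, and it uses an in-memory sample ($lines, a name chosen only for this illustration) instead of the real file. HashSet.Add returns $true only when the key was not present yet, which makes it usable directly as a filter:

```powershell
# Sample rows taken from the question; the real input would come from Get-Content tbl.txt
$lines = @(
    "63 ; 193 ; 'Green' ; 'Color'",
    "152 ; 162 ; 'Tall' ; 'Size'",
    "230 ; 164 ; '130lbs' ; 'Weight'",
    "249 ; 175 ; 'Green' ; 'Color'",
    "420 ; 178 ; '8' ; 'Shoesize'",
    "438 ; 172 ; 'Tall' ; 'Size'"
)

# Set of (Value, Attribute) keys that have already been emitted
$seen = New-Object 'System.Collections.Generic.HashSet[string]'

$result = $lines | Where-Object {
    # Assumption: exactly four ';'-separated columns per line
    $cols = $_ -split ';'
    $key  = $cols[2].Trim() + ';' + $cols[3].Trim()
    # Add returns $true for a new key and $false for a duplicate
    $seen.Add($key)
}

$result
```

For the real file, the same Where-Object block could be fed from Get-Content tbl.txt and piped to Set-Content with an output file name of your choice.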
Alternatively, as mentioned in the comments, you can use group (an alias for Group-Object), apply the same substring as the grouping criterion, and finally take the first element of each group:
Get-Content tbl.txt |
    group { $_.Substring($_.IndexOf(';', $_.IndexOf(';') + 1) + 1) } |
    % { $_.Group[0] }