
Further Specifying Select -unique in PowerShell


I'm using PowerShell to reformat a large database. The raw data consists of device tag names and data points. The original file repeats the tags for each data point. Additionally, there are many similar (not identical) tags with identical data points. Here is a sample of the original:

40028 "ALBENI F1" "1 "     0.00   -14.00    12.10
40028 "ALBENI F1" "1 "     2.70   -13.90    11.80
40028 "ALBENI F1" "1 "     4.50   -13.80    11.60
40028 "ALBENI F1" "1 "     7.30   -13.60    11.10
40028 "ALBENI F1" "1 "    10.00   -12.70    10.40
40028 "ALBENI F1" "1 "    12.30   -11.80     9.70
40028 "ALBENI F1" "1 "    12.70   -11.30     9.50
40028 "ALBENI F1" "1 "    14.50    -9.40     8.70
40028 "ALBENI F1" "1 "    16.30    -7.40     7.80
40028 "ALBENI F1" "1 "    16.80    -6.80     6.90
40028 "ALBENI F1" "1 "    17.20    -5.50     5.30
40028 "ALBENI F1" "1 "    17.80    -3.50     3.50
40028 "ALBENI F1" "1 "    18.20     0.00     0.00
40030 "ALBENI F2" "2 "     0.00   -14.00    12.10

In PowerShell, I've managed to get it looking like this:

40028 "ALBENI F1 " "1 " "YES"
 , 0.00, -14.00, 12.10,
 , 2.70, -13.90, 11.80,
 , 4.50, -13.80, 11.60,
 , 7.30, -13.60, 11.10,
 , 10.00, -12.70, 10.40,
 , 12.30, -11.80, 9.70,
 , 12.70, -11.30, 9.50,
 , 14.50, -9.40, 8.70,
 , 16.30, -7.40, 7.80,
 , 16.80, -6.80, 6.90,
 , 17.20, -5.50, 5.30,
 , 17.80, -3.50, 3.50,
 , 18.20, 0.00, 0.00,
  40063 "CGS " "1 " "YES"
 , 0.00, -620.00, 680.00,

This is the format I want. As you can see, I turned each tag into a single header and dropped ALBENI F2, since its data repeats F1. My problem is that I got there with Select-Object -Unique. It removed the duplicate headers and repeated data sets, as needed, but it also deleted data points on other, completely different tags whenever they matched a row seen earlier. That is unacceptable, so I need a way to remove duplicate tags and data sets while keeping every data point on each unique tag, even if those points are identical to ones from other tags.
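
A minimal illustration of that behavior, using made-up lines in the reformatted layout (the shared data row under the second tag is hypothetical, not taken from the real file). Select-Object -Unique keeps only the first occurrence of each string in the whole stream, so a row repeated under a later tag is dropped:

```powershell
# Illustrative lines: two different tags that happen to share a data row.
$lines = @(
    '40028 "ALBENI F1 " "1 "'
    ', 0.00, -14.00, 12.10,'
    '40063 "CGS " "1 "'
    ', 0.00, -14.00, 12.10,'   # same values, but belongs to a different tag
)

# -Unique drops every later repeat in the stream, regardless of which tag precedes it.
$result = @($lines | Select-Object -Unique)
$result
```

Here the second tag's header survives, but its only data row does not, which is the loss described above.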

If it helps, here is my code:

Get-Content (inputfile)|select -skip 2| select-string '}' -NotMatch |
    %{$_ -replace '"\s+(\-?\d+\.\d+)\s+(\-?\d+\.\d+)\s+(\-?\d+\.\d+)','"" ,, $1, $2, $3,'}| 
    %{$_ -split '"\s+,'} |
    select -unique |
    %{$_ -replace '"\s+("\w+")', ' " $1 "YES"'}| 
    %{$_ -replace '"\s+("\w+\s+")', ' " $1 "YES"'} |(outputfile)

There's a lot in there, and it all works like it should. I just need another method of sorting out duplicates. Ideas?


Solution

  • OK, so I took a completely different approach: I first split the file into headers and data. I then filtered the headers with Get-Unique, leaving the data alone. Then I split the data into sets and inserted the appropriate header in front of each set. This got rid of all the extra headers and put the whole file into the format I needed. My entire code is below.

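    [CmdletBinding()]
    Param(
        [Parameter(Mandatory=$true, Position=1)]
        [string]$InputFilePath,
        [Parameter(Mandatory=$true, Position=2)]
        [string]$OutputFilePath
    )

    # Reformat the raw file (same pipeline as in the question, minus select -unique).
    Get-Content $InputFilePath | select -skip 2 | select-string '}' -NotMatch |
        %{$_ -replace '"\s+(\-?\d+\.\d+)\s+(\-?\d+\.\d+)\s+(\-?\d+\.\d+)','"" ,, $1, $2, $3,'} |
        %{$_ -split '"\s+,'} |
        %{$_ -replace '"\s+("\w+")', ' " $1 "YES"'} |
        %{$_ -replace '"\s+("\w+\s+")', ' " $1 "YES"'} |
        out-file $OutputFilePath
    # Data rows are the lines that contain a comma.
    $data = Get-Content $OutputFilePath | select-string ','
    # Mark the start of each data set (the row whose first value is 0.00) with ':',
    # then split on it so an empty placeholder string precedes every set.
    $data = $data | %{$_ -replace '(,\s+0.00,\s+\-?\d+\.\d+\,\s+\d+\.\d+)',':$1'} | %{$_ -split ':'}
    # Header rows are the lines that contain a quote; Get-Unique drops adjacent repeats.
    $headers = Get-Content $OutputFilePath | select-string '"' | Get-Unique
    $counter = 0
    # Replace each empty placeholder with the next header; pass data rows through.
    $data | %{ if ($_.length -eq 0) { $_ -replace '', ($headers | Select-Object -index $counter); $counter = $counter + 1 } else { $_ } } |
        out-file $OutputFilePath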
    

    This is my full code, but the important part for this problem is where I start assigning variables. Thanks for all the help guys!
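
The same idea (one header per tag, data rows left alone) can also be sketched as a single pass over the raw rows. This is not the code above but a simplified variant under stated assumptions: the sample rows, the $pattern regex, and the $lastTag variable are illustrative, and each tag's rows are assumed to be contiguous, as they appear to be in the sample.

```powershell
# Sample rows in the original layout: id, "name", "unit", then three numbers.
$raw = @(
    '40028 "ALBENI F1" "1 "     0.00   -14.00    12.10'
    '40028 "ALBENI F1" "1 "     2.70   -13.90    11.80'
    '40063 "CGS" "1 "     0.00   -14.00    12.10'
)

# Everything up to the last quote is the tag; the three numbers follow it.
$pattern = '^(?<tag>.*")\s+(?<x>-?\d+\.\d+)\s+(?<y>-?\d+\.\d+)\s+(?<z>-?\d+\.\d+)\s*$'
$lastTag = $null

$out = foreach ($line in $raw) {
    if ($line -match $pattern) {
        # Emit the header only when the tag changes; data rows are never deduplicated.
        if ($Matches.tag -cne $lastTag) {
            $lastTag = $Matches.tag
            $lastTag
        }
        ', {0}, {1}, {2},' -f $Matches.x, $Matches.y, $Matches.z
    }
}
$out
```

Like Get-Unique, this only collapses adjacent repeats of a tag, and it makes no attempt to drop a tag whose whole data set duplicates another one (such as ALBENI F2); that would need an extra comparison step.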