Search code examples
powershellarraylistdeduplication

Powershell: Deduping an array


I have a pipe delimited flat file from which I need to deduplicate the entries based on an object, to be specific a part of file is:

"001A"|"1"|"*"||"A"|"504367667"|"1"|"2005-06-10-16.57.23.000000"|
"001A"|"1"|"*"||"A"|"504367667"|"1"|"2005-10-24-16.52.29.000000"|
"001A"|"1"|"*"||"A"|"504367667"|"1"|"2007-12-13-15.48.47.000000"|
"001A"|"1"|"*"||"A"|"504367667"|"1"|"2008-12-09-17.10.39.000000"|
"001B"|"1"|"*"||"B"|"800026800"|"1"|"2005-08-08-10.48.16.000000"|
"001C"|"1"|"*"||"C"|"490349139"|"1"|"2006-01-19-12.03.08.000000"|
"001C"|"1"|"*"||"C"|"490349139"|"1"|"2009-03-12-15.08.11.000000"|

The first field is ID and last field is a timestamp, I want to deduplicate the entries such that only the latest timestamp entry is kept for each ID. So, The output that I need should be:

"001A"|"1"|"*"||"A"|"504367667"|"1"|"2008-12-09-17.10.39.000000"|
"001B"|"1"|"*"||"B"|"800026800"|"1"|"2005-08-08-10.48.16.000000"|
"001C"|"1"|"*"||"C"|"490349139"|"1"|"2009-03-12-15.08.11.000000"|

I read the file and stored the entries in an array with distinct object names, then I tried

$inputdeduped = $inputfilearray | Sort-Object Date
$inputdeduped = $inputdeduped | Select-Object ID -Unique

hoping that once the date is sorted, get-unique cmdlet used as -unique here would either pick either the first or last of the duplicated entry in the sorted array so depending on that I would sort the date in either desc or asc order, however it doesn't and randomly picks one entry.

Please help me out guys or help me understand how the get-unique cmdlet works.


Solution

  • you can try this:

      $newInputdeduped =  $inputfilearray | sort id, date -ascending | group -Property id |
         select @{n="GroupedList"; e={ $_.group | select -first 1 }} |
         select -expa list 
    

    This is what I do with your example data after saving it as a txt file:

    > $a = Import-Csv -Header "id","n1","n2","v1","n3","n4","n5","date"  -Path .\c.txt -delimiter '|'
    
    > $a | ft -AutoSize
    
    id   n1 n2 v1 n3 n4        n5 date
    --   -- -- -- -- --        -- ----
    001A 1  *     A  504367667 1  2005-06-10-16.57.23.000000
    001A 1  *     A  504367667 1  2005-10-24-16.52.29.000000
    001A 1  *     A  504367667 1  2007-12-13-15.48.47.000000
    001A 1  *     A  504367667 1  2008-12-09-17.10.39.000000
    001B 1  *     B  800026800 1  2005-08-08-10.48.16.000000
    001C 1  *     C  490349139 1  2006-01-19-12.03.08.000000
    001C 1  *     C  490349139 1  2009-03-12-15.08.11.000000
    
    > $b = $a | sort id, date -ascending | group -Property id | select @{n="list";e={ $_.group | select -first 1  }} | select -expa list
    
        > $b | ft -AutoSize
    
    id   n1 n2 v1 n3 n4        n5 date
    --   -- -- -- -- --        -- ----
    001C 1  *     C  490349139 1  2009-03-12-15.08.11.000000
    001B 1  *     B  800026800 1  2005-08-08-10.48.16.000000
    001A 1  *     A  504367667 1  2008-12-09-17.10.39.000000