bash, shell, sorting, awk, unique

Sum the first column, keeping only rows with a unique 5th column


My file content is as follows. I want to sum the values in the first column, but the 5th column (the time) contains duplicates, and not every second is present. Missing seconds are not a problem; what matters is that each timestamp is counted only once. It must be UNIQUE.

 18 /traffic-2.log00980-####<Aug 7, 2016 11:37:34 PM EEST
 13 /traffic-2.log00980-####<Aug 7, 2016 11:37:37 PM EEST
 10 /traffic-2.log00980-####<Aug 7, 2016 11:37:38 PM EEST
 11 /traffic-2.log00980-####<Aug 7, 2016 11:37:39 PM EEST
 18 /traffic-2.log00980-####<Aug 7, 2016 11:37:40 PM EEST
 12 /traffic-2.log00980-####<Aug 7, 2016 11:37:41 PM EEST
 10 /traffic-2.log00980-####<Aug 7, 2016 11:37:42 PM EEST
 18 /traffic-2.log00980-####<Aug 7, 2016 11:37:43 PM EEST
 11 /traffic-2.log00980-####<Aug 7, 2016 11:37:44 PM EEST
 13 /traffic-2.log00980-####<Aug 7, 2016 11:37:45 PM EEST
 18 /traffic-2.log00980-####<Aug 7, 2016 11:37:43 PM EEST
 11 /traffic-2.log00980-####<Aug 7, 2016 11:37:44 PM EEST
 13 /traffic-2.log00980-####<Aug 7, 2016 11:37:45 PM EEST
 12 /traffic-2.log00980-####<Aug 7, 2016 11:37:46 PM EEST
 13 /traffic-2.log00980-####<Aug 7, 2016 11:37:47 PM EEST
 11 /traffic-2.log00980-####<Aug 7, 2016 11:37:48 PM EEST
 17 /traffic-2.log00980-####<Aug 7, 2016 11:37:49 PM EEST
 12 /traffic-2.log00980-####<Aug 7, 2016 11:37:50 PM EEST
 13 /traffic-2.log00980-####<Aug 7, 2016 11:37:51 PM EEST
  9 /traffic-2.log00980-####<Aug 7, 2016 11:37:54 PM EEST
  9 /traffic-2.log00980-####<Aug 7, 2016 11:37:55 PM EEST
 13 /traffic-2.log00980-####<Aug 7, 2016 11:37:56 PM EEST
 12 /traffic-2.log00980-####<Aug 7, 2016 11:37:57 PM EEST
 11 /traffic-2.log00980-####<Aug 7, 2016 11:37:58 PM EEST
  7 /traffic-2.log00980-####<Aug 7, 2016 11:37:59 PM EEST
 10 /traffic-2.log00980-####<Aug 7, 2016 11:38:00 PM EEST
 10 /traffic-2.log00980-####<Aug 7, 2016 11:38:01 PM EEST
  9 /traffic-2.log00980-####<Aug 7, 2016 11:37:55 PM EEST
 13 /traffic-2.log00980-####<Aug 7, 2016 11:37:56 PM EEST
 12 /traffic-2.log00980-####<Aug 7, 2016 11:37:57 PM EEST
 11 /traffic-2.log00980-####<Aug 7, 2016 11:37:58 PM EEST
  7 /traffic-2.log00980-####<Aug 7, 2016 11:37:59 PM EEST
 10 /traffic-2.log00980-####<Aug 7, 2016 11:38:00 PM EEST
 10 /traffic-2.log00980-####<Aug 7, 2016 11:38:01 PM EEST
 10 /traffic-2.log00980-####<Aug 7, 2016 11:38:02 PM EEST
 15 /traffic-2.log00980-####<Aug 7, 2016 11:38:03 PM EEST
 13 /traffic-2.log00980-####<Aug 7, 2016 11:38:04 PM EEST
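With awk's default whitespace splitting, the HH:MM:SS time above is the 5th field of each line; a quick check (yourFile is a placeholder name for the data):

    awk '{ print $1, $5 }' yourFile | head -n 3
    # 18 11:37:34
    # 13 11:37:37
    # 10 11:37:38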

Solution

  • Using awk:

    awk '!seen[$5]++' yourFile | awk '{ sum += $1 } END { print sum }'
    

    -the first awk prints a line only the first time its 5th field is seen, dropping duplicate timestamps;

    -the second awk sums the first field of the remaining lines and prints the total.
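
    The two commands can also be folded into a single awk pass; a minimal equivalent sketch (yourFile as above):

    awk '!seen[$5]++ { sum += $1 } END { print sum }' yourFile

    The pattern !seen[$5]++ is true only on the first occurrence of each 5th-field value, so $1 is added to sum exactly once per unique timestamp.

    Since the question is also tagged sorting, a sort-based variant should give the same total on this data; note that if duplicated timestamps ever carried different first-column counts, which duplicate sort -u keeps is unspecified:

    sort -u -k5,5 yourFile | awk '{ sum += $1 } END { print sum }'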