Search code examples
bashcsvawksumcut

bash summing and grouping a column


hey guys so i have this dummy data :

 id  | game_title     | demography    | store|sold(in k)| game_id|
+----+----------------+---------------+------+----------+--------+
|  3 | final fantasy  | 3             | QLD  | 5        | 115    |
|  4 | final fantasy  | 3             | NSW  | 3        | 115    |
|  5 | tekken         | 2             | QLD  | 2        | 101    |
|  6 | tekken         | 2             | QLD  | 1.5      | 101    |
|  7 | sfv            | 1             | NSW  | 1        | 100    |
|  8 | final fantasy  | 3             | QLD  | 2        | 115    |
|  9 | fifa           | 1             | QLD  | 12       | 102    |
+----+----------------+---------------+------+----------+--------+

The desired result shown below:

id   | game_title     | sold total in k|
+----+----------------+----------------|
|  3 | final fantasy  | 10             |   
|  4 | tekken         | 3.5            |    
|  5 | sfv            | 1              |     
|  6 | fifa           | 12             |      
+----+----------------+----------------+

The game_id will be used as a group since some title does have a same title, how would i do this using bash ?

i can select the game_title, sold and game_id by using

awk -F "," '{print $2 $5 $6}' data.csv 

however i'm not too sure how to sum then grouping it.


Solution

  • Could you please try following.

    Solution 1st: This will give you output in same order in which your Input_file's 2nd field is present.

    awk '
    BEGIN{
       FS="|"
       OFS="| "
    }
    FNR==1{
       print
       next
    }
    /--/ && !val{
       val=$0
       next
    }
    !c[$3]++{
       d[++count]=$3
    }
    {
       a[$3]+=$6
       if(!b[$3]){
         b[$3]=$2 OFS $3
       }
    }
    END{
       print val
       for(i=1;i<=count;i++){
          if(a[d[i]] && d[i]){
            print OFS b[d[i]],a[d[i]]
          }
       }
       print val
    }'  Input_file
    

    Output will be as follows.

     id  | game_title     | demography    | store|sold(in k)| game_id|
    +----+----------------+---------------+------+----------+--------+
    |   3 |  final fantasy  | 10
    |   5 |  tekken         | 3.5
    |   7 |  sfv            | 1
    |   9 |  fifa           | 12
    +----+----------------+---------------+------+----------+--------+
    

    Solution 2nd: In case you are NOT worried about order of 2nd column in output then following may help you.

    awk '
    BEGIN{
       FS="|"
       OFS="| "
    }
    FNR==1{
       print
       next
    }
    /--/ && !val{
       val=$0
       next
    }
    {
       a[$3]+=$6
       if(!b[$3]){
         b[$3]=$2 OFS $3
       }
    }
    END{
       print val
       for(i in a){
         if(a[i] && b[i]){
           print OFS b[i],a[i]
         }
       }
       print val
    }'   Input_file
    

    Output will be as follows.

     id  | game_title     | demography    | store|sold(in k)| game_id|
    +----+----------------+---------------+------+----------+--------+
    |   7 |  sfv            | 1
    |   3 |  final fantasy  | 10
    |   5 |  tekken         | 3.5
    |   9 |  fifa           | 12
    +----+----------------+---------------+------+----------+--------+