bash summing and grouping a column

hey guys so i have this dummy data :

 id  | game_title     | demography    | store|sold(in k)| game_id|
+----+----------------+---------------+------+----------+--------+
|  3 | final fantasy  | 3             | QLD  | 5        | 115    |
|  4 | final fantasy  | 3             | NSW  | 3        | 115    |
|  5 | tekken         | 2             | QLD  | 2        | 101    |
|  6 | tekken         | 2             | QLD  | 1.5      | 101    |
|  7 | sfv            | 1             | NSW  | 1        | 100    |
|  8 | final fantasy  | 3             | QLD  | 2        | 115    |
|  9 | fifa           | 1             | QLD  | 12       | 102    |
+----+----------------+---------------+------+----------+--------+

The desired result shown below:

id   | game_title     | sold total in k|
+----+----------------+----------------|
|  3 | final fantasy  | 10             |   
|  4 | tekken         | 3.5            |    
|  5 | sfv            | 1              |     
|  6 | fifa           | 12             |      
+----+----------------+----------------+

The game_id will be used as a group since some title does have a same title, how would i do this using bash ?

i can select the game_title, sold and game_id by using

awk -F "," '{print $2 $5 $6}' data.csv

however i'm not too sure how to sum then grouping it.

Solution

Could you please try following.

Solution 1st: This will give you output in same order in which your Input_file's 2nd field is present.

awk '
BEGIN{
   FS="|"
   OFS="| "
}
FNR==1{
   print
   next
}
/--/ && !val{
   val=$0
   next
}
!c[$3]++{
   d[++count]=$3
}
{
   a[$3]+=$6
   if(!b[$3]){
     b[$3]=$2 OFS $3
   }
}
END{
   print val
   for(i=1;i<=count;i++){
      if(a[d[i]] && d[i]){
        print OFS b[d[i]],a[d[i]]
      }
   }
   print val
}'  Input_file

Output will be as follows.

 id  | game_title     | demography    | store|sold(in k)| game_id|
+----+----------------+---------------+------+----------+--------+
|   3 |  final fantasy  | 10
|   5 |  tekken         | 3.5
|   7 |  sfv            | 1
|   9 |  fifa           | 12
+----+----------------+---------------+------+----------+--------+

Solution 2nd: In case you are NOT worried about order of 2nd column in output then following may help you.

awk '
BEGIN{
   FS="|"
   OFS="| "
}
FNR==1{
   print
   next
}
/--/ && !val{
   val=$0
   next
}
{
   a[$3]+=$6
   if(!b[$3]){
     b[$3]=$2 OFS $3
   }
}
END{
   print val
   for(i in a){
     if(a[i] && b[i]){
       print OFS b[i],a[i]
     }
   }
   print val
}'   Input_file

Output will be as follows.

 id  | game_title     | demography    | store|sold(in k)| game_id|
+----+----------------+---------------+------+----------+--------+
|   7 |  sfv            | 1
|   3 |  final fantasy  | 10
|   5 |  tekken         | 3.5
|   9 |  fifa           | 12
+----+----------------+---------------+------+----------+--------+