I have data in a CSV file as below...
Emailid Storeid
a@gmail.com 2000
b@gmail.com 2001
c@gmail.com 2000
d@gmail.com 2000
e@gmail.com 2001
I am expecting below output, basically finding out how many email ids are there for each store.
StoreID Emailcount
2000 3
2001 2
So far i tried to solve my issue
IFS=","
while read f1 f2
do
awk -F, '{ A[$1]+=$2 } END { OFS=","; for (x in A) print x,A[x]; }' > /home/ec2-user/storewiseemials.csv
done < temp4.csv
With the above shell script i am not getting desired output, Can you guys please help me?
Using miller (https://github.com/johnkerl/miller) and starting from this (I have used a CSV, because I do not know if you use a tab or a white space as separator)
Emailid,Storeid
a@gmail.com,2000
b@gmail.com,2001
c@gmail.com,2000
d@gmail.com,2000
e@gmail.com,2001
and running
mlr --csv count-distinct -f Storeid -o Emailcount input >output
you will have
+---------+------------+
| Storeid | Emailcount |
+---------+------------+
| 2000 | 3 |
| 2001 | 2 |
+---------+------------+