
Perl equivalent of an awk script that counts field 1 and prints the totals in END{}


I have the following awk script that counts occurrences of each value in field 1 and, once it has read the entire file, prints each value with the number of times it appears.

awk '{a[$1]++} END{ for(i in a){print i"-->"a[i]} }' file

I'm very new to Perl and I don't know what the equivalent would be. What I have so far is below, but its syntax is incorrect. Thanks in advance.

perl -lane '$a{$F[1]}++ END{foreach $a {print $a} }' file

____________________________________UPDATE ______________________________________

Hi, thanks to both of you for your answers. The real input file has 34 million lines, and on it awk finishes about 3 times faster than Perl (timings below). Is awk faster than Perl?

awk '{a[$1]++}END{for(i in a){print i"-->"a[i]}}' file # --> approx. 2:45
perl -lane '$a{$F[0]}++;END{foreach my $k (keys %a){ print "$k --> $a{$k}" } }' file # --> approx. 7 min
perl -lanE'$a{$F[0]}++; END { say "$_ => $a{$_}" for keys %a }' file # --> approx. 9 min

Solution

  • Okay, Ger, one more time :-) I upgraded my Perl to the latest version available to me and made a file like what you described (34.5 million lines each having a 16 digit integer in the 1st and only column):

    schumack@linux2 52> wc -l listbig
    34521909 listbig
    
    schumack@linux2 53> head -3 listbig
    1111111111111111
    3333333333333333
    4444444444444444
    

    I then ran a specialized Perl one-liner. It works for this file but is not equivalent to the awk line, because it counts whole lines instead of splitting out field 1. As before, I timed the runs using /usr/bin/time:

    schumack@linux2 54> /usr/bin/time -f '%E %P' /usr/local/bin/perl -lne 'chomp; $a{$_}++; END{foreach $i (keys %a){print "$i-->$a{$i}"}}' listbig
    5555555555555555-->4547796
    1111111111111111-->9715747
    9999999999999999-->826872
    3333333333333333-->9922465
    1212121212121212-->826872
    4444444444444444-->5374669
    2222222222222222-->1653744
    8888888888888888-->826872
    7777777777777777-->826872
    0:12.20 99%
    
    schumack@linux2 55> /usr/bin/time -f '%E %P' awk '{a[$1]++} END{ for(i in a){print i"-->"a[i]} }' listbig
    1111111111111111-->9715747
    2222222222222222-->1653744
    3333333333333333-->9922465
    4444444444444444-->5374669
    5555555555555555-->4547796
    1212121212121212-->826872
    7777777777777777-->826872
    8888888888888888-->826872
    9999999999999999-->826872
    0:12.61 99%
    

    Both perl and awk ran very fast on the 34.5 million line file and finished within half a second of each other. I am curious what type of machine, OS, and Perl version you are currently using. I tested on an ASUS laptop that is about 4 years old and has an Intel i7, running Ubuntu 16.04 and Perl v5.26.1.

    Anyways, thanks for the reason to play around with Perl!

    Have fun, Ken