Switch on length of line in Perl script


I want to use a switch/case construct in Perl. I have a file that contains a sequence of words, and I want to treat each line differently according to the number of words it contains.

An example file:

w1 w2 w2
w1 w3

So the script will look something like this, but how do I calculate the number of words in each line?

given ($number_of_word_in_line) {
   when ($_ > 2) {
       ...
   }
   when ($_ > 3) {
       ...
   }
   default {
       ...
   }
}

Solution

  • Please be careful with the switch statement, which is highly experimental. From perlsyn:

    As previously mentioned, the "switch" feature is considered highly experimental; it is subject to change with little notice. In particular, when has tricky behaviours that are expected to change to become less tricky in the future. Do not rely upon its current (mis)implementation. Before Perl 5.18, given also had tricky behaviours that you should still beware of if your code must run on older versions of Perl.

    These are tricky and will change.

    Having said that, one way to count words in a string is to split it first:

    use warnings;
    use strict;
    use feature 'switch';
    
    my $file = '...';
    open my $fh, '<', $file  or die "Can't open $file: $!";
    
    while (my $line = <$fh>)
    {
        chomp $line;
        my @words = split ' ', $line;
        my $num_words = @words;
        
        given ($num_words) {
            when ($num_words > 2) { 
                # ...
            }
        }
    }
    close $fh;
    

    which uses the fact that a scalar ($num_words), when assigned an array (@words), receives the number of elements of the array. See Context in perldata:

    Assignment is a little bit special in that it uses its left argument to determine the context for the right argument. Assignment to a scalar evaluates the right-hand side in scalar context, [...]

    and an array evaluated in scalar context yields the number of its elements.
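
A minimal sketch of the context rule quoted above; the variable names are made up for illustration. The same array gives its element count when assigned to a scalar and its first element when assigned to a one-element list.

```perl
use strict;
use warnings;

my @words = ('w1', 'w2', 'w2');   # sample data, as in the question's first line

my $count    = @words;            # scalar assignment: number of elements
my ($first)  = @words;            # list assignment: first element
my $explicit = scalar @words;     # scalar() forces scalar context explicitly

print "$count $first $explicit\n";   # prints "3 w1 3"
```

The parentheses around `$first` are what switch the assignment to list context.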

    Here we can skip the array altogether

    my $num_words = split ' ', $line;
    

    So to get the count without creating an array variable, we need to assign directly to a scalar. That isn't always going to yield the length of the list, though: putting the right-hand side in scalar context (by assigning it to a scalar) may change how the expression operates and what it returns.

    There are workarounds though. For example

    my $num_words = () = $line =~ /\w+/g;
    

    where the "operator" = () = is a play on context, or

    my $num_words = @{ [ $line =~ /\w+/g ] };
    

    where [] evaluates the match in list context and stores the results in an anonymous array, which @{ } then dereferences. That dereferenced array, assigned to a scalar, yields its number of elements.§
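
To compare these approaches side by side, here is a small sketch. The sample string is an assumption chosen to show that counting /\w+/ matches and counting whitespace-separated fields can disagree, since the apostrophe is not a word character.

```perl
use strict;
use warnings;

my $line = "don't stop";   # sample input, chosen for illustration

my $truth   = $line =~ /\w+/;              # scalar context: success flag, not a count
my $countof = () = $line =~ /\w+/g;        # list assignment evaluated in scalar context
my $via_ref = @{ [ $line =~ /\w+/g ] };    # anonymous array, then dereference
my $fields  = split ' ', $line;            # whitespace-separated fields

print "$truth $countof $via_ref $fields\n";   # prints "1 3 3 2"
```

The regex sees "don", "t", and "stop", while split sees two fields, so which count is right depends on what should be considered a word.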

    See this page for a wealth of information about lists, arrays, scalars, and context.


    This can be done more compactly as

    while (<$fh>) {
        chomp;
        my $num_words = split;
        # ...
    }
    

    Reading with while (<$fh>) assigns each line to $_, and both chomp and split operate on $_ by default. The split also needs a pattern, and its default is ' ', so the above is the same as split ' ', $_. The pattern ' ' is special for split: it splits on any run of whitespace and ignores leading whitespace, and trailing whitespace produces no extra fields either.
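
A short sketch of how the special ' ' pattern differs from an ordinary whitespace regex. The sample string is an assumption, padded with leading, repeated, and trailing whitespace.

```perl
use strict;
use warnings;

my $line = "  w1   w2\tw3  ";   # sample input with messy whitespace

my @special = split ' ',   $line;   # ('w1', 'w2', 'w3')
my @regex   = split /\s+/, $line;   # ('', 'w1', 'w2', 'w3'): empty leading field

my $implicit;
for ($line) { $implicit = split; }  # bare split: same as split ' ', $_

printf "%d %d %d\n", scalar @special, scalar @regex, $implicit;   # prints "3 4 3"
```

With /\s+/ the leading whitespace yields an empty first field, which would inflate the word count by one.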

    Note that once we assign to a variable of our own inside the while condition (like $line in the main text), the deal with $_ is off: the line is no longer stored in $_, which keeps whatever value it had before (often undef). So it is either our variable or $_. A reasonable rule of thumb is that if you end up using $_ more than once or twice then there should be a proper variable. And if ever in doubt, introduce a nice variable.
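
The point about $_ can be checked with a small sketch. It reads from an in-memory filehandle (a reference to a string) instead of a real file, and the sample text and the 'untouched' marker are assumptions for the demonstration.

```perl
use strict;
use warnings;

my $text = "w1 w2 w2\nw1 w3\n";   # stands in for the file's contents
open my $fh, '<', \$text or die "Can't open in-memory file: $!";

$_ = 'untouched';                 # marker to see whether the loop sets $_
my @counts;
while (my $line = <$fh>) {
    my $num_words = split ' ', $line;   # ' ' also swallows the trailing newline
    push @counts, $num_words;
}
close $fh;

print "@counts / $_\n";   # prints "3 2 / untouched"
```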

    The regex match operator returns the actual matches in list context but only true/false in scalar context. (And in scalar context /g does not produce a count either: each evaluation finds just the next match.)
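
A brief sketch of the match operator in both contexts, using the first line of the question's sample file. It also shows that a scalar-context /g match can still be used for counting, but only by looping.

```perl
use strict;
use warnings;

my $line = 'w1 w2 w2';

my @all     = $line =~ /\w+/g;   # list context: ('w1', 'w2', 'w2')
my $matched = $line =~ /\w+/;    # scalar context: 1 on success

my $count = 0;
$count++ while $line =~ /\w+/g;  # scalar /g: each pass resumes after the last match

print scalar(@all), " $matched $count\n";   # prints "3 1 3"
```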

    § Another example is split, which returns the size of the list in scalar context.
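
Given the warning about the experimental switch feature, one alternative is a plain if/elsif chain. This is a swapped-in technique, not what the question asked for; classify_line and its labels are hypothetical names. The larger threshold is tested first because a word count above 3 is also above 2, so in the opposite order the second branch could never run.

```perl
use strict;
use warnings;

# Hypothetical helper: pick a treatment for a line based on its word count.
sub classify_line {
    my ($line) = @_;
    my $num_words = split ' ', $line;   # split in scalar context gives the count

    if    ($num_words > 3) { return 'long' }
    elsif ($num_words > 2) { return 'medium' }
    else                   { return 'short' }
}

# The two sample lines from the question.
print classify_line($_), "\n" for 'w1 w2 w2', 'w1 w3';   # prints "medium", then "short"
```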