
Split() on newline AND space characters?


I want to split() a string on both newlines and space characters:

#!/usr/bin/perl
use warnings;
use strict;

my $str = "aa bb cc\ndd ee ff";
my @arr = split(/\s\n/, $str);     # Split on ' ' and '\n'
print join("\n", @arr);            # Print array, one element per line

Output is this:

aa bb cc
dd ee ff

But, what I want is this:

aa
bb
cc
dd
ee
ff

So my code is splitting on the newline (good) but not on the spaces. According to perldoc, \s matches any whitespace character, and I assumed that would include the space. Am I missing something?


Solution

  • my code is splitting on the newline (good)

    Your code is not splitting on the newline; it only looks that way because of how you print the result. Your array contains one element, not two. That element has a newline in the middle of it, so join() has nothing to join and you are simply printing the original string aa bb cc\ndd ee ff unchanged.

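    The pattern \s\n means "one whitespace character immediately followed by a newline", and \s itself also matches \n. In your string the \n is preceded by c, not by whitespace, so split() never finds a separator.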
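A quick way to confirm this is to count the elements split() returns. The second string below is a made-up input (not from the question) that does contain a space directly before the newline, so the two-character pattern matches there:

```perl
#!/usr/bin/perl
use warnings;
use strict;

my $str = "aa bb cc\ndd ee ff";

# /\s\n/ needs TWO characters: a whitespace char immediately followed by "\n".
# In $str the "\n" is preceded by "c", so the pattern never matches.
my @arr = split(/\s\n/, $str);
print scalar(@arr), "\n";    # one element: the whole, unsplit string

# Hypothetical input with "cc \n": the " \n" pair matches and is consumed
# as the separator, giving two elements.
my @arr2 = split(/\s\n/, "aa bb cc \ndd ee ff");
print scalar(@arr2), "\n";
```

Only whitespace sitting right before a newline triggers a split; ordinary spaces between words never do.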

    Change:

    my @arr = split(/\s\n/, $str);
    

    to:

    my @arr = split(/\s/, $str);
    

    Using Data::Dumper makes it clear that the array now has 6 elements:

    use warnings;
    use strict;
    use Data::Dumper; 
    
    my $str = "aa bb cc\ndd ee ff";
    my @arr = split(/\s/, $str);
    print Dumper(\@arr);
    

    Prints:

    $VAR1 = [
              'aa',
              'bb',
              'cc',
              'dd',
              'ee',
              'ff'
            ];
    

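    The above code works for the input string you provided, where the words are separated by exactly one whitespace character each. If the input can contain runs of whitespace (for example, two spaces in a row), it is common to split on one or more consecutive whitespace characters instead: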

    my @arr = split(/\s+/, $str);
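To see why the choice of pattern matters, here is a small comparison on a hypothetical input (not from the question) with leading and doubled whitespace. It also shows split's documented special case of a single-space string pattern ' ', which behaves like /\s+/ but additionally discards leading whitespace:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical input: leading spaces, a double space, and a blank line.
my $messy = "  aa  bb\n\ncc";

my @single = split(/\s/,  $messy);   # one separator per whitespace character
my @runs   = split(/\s+/, $messy);   # a run of whitespace counts as one separator
my @awk    = split(' ',   $messy);   # special case: like /\s+/, leading whitespace dropped

# Print each result with its elements bracketed so empty strings are visible.
for my $pair (['/\s/', \@single], ['/\s+/', \@runs], ["' '", \@awk]) {
    my ($label, $list) = @$pair;
    printf "%-6s %d: %s\n", $label, scalar(@$list), join(' ', map { "[$_]" } @$list);
}
```

With this input it should print:

    /\s/   7: [] [] [aa] [] [bb] [] [cc]
    /\s+/  4: [] [aa] [bb] [cc]
    ' '    3: [aa] [bb] [cc]

/\s/ yields an empty field between every pair of adjacent whitespace characters, and /\s+/ still yields one empty leading field when the string starts with whitespace. split ' ' avoids both.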