
Perl regex - remove all characters except alphanumeric characters and comma


I have the following code:

my $str = 'Uploaded 07-02▒05:14, Size 212.14▒MiB, ULed by someone';
print "Pre:".$str."\n";
my $str =~ s/^[a-zA-z0-9,]//g;
print "Post:".$str."\n";

My aim was to remove those special characters and spaces so that I could split the string for further processing.

With the regex above, I was trying to remove all characters except alphanumeric characters and comma. Unfortunately I am getting a blank line. I'm a beginner to regex and would like to know what is wrong with my expression.


Solution

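  • You have three errors conspiring to break your program. If you had use strict and use warnings at the top of your code, as you should have, then Perl would have printed messages to alert you: a warning that the second my $str masks the earlier declaration, and another about using an uninitialized value in the substitution.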

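    • You have declared a second $str with my on the substitution line. That new variable hides the first one and is undef, so the substitution has nothing to work on and the final print shows an empty string.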

    • You have the caret outside the character class, so it is acting as a start-of-string anchor instead of negating the class

    • You have [a-zA-z0-9] as your character class. The range A-z covers everything between A and z in ASCII, so it includes the characters [, \, ], ^, _, and ` as well as the upper and lower case alphabet. You need [a-zA-Z0-9] instead.
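
To see the second and third errors in isolation, here is a minimal sketch. The sample strings and the variable names ($anchored, $copy, $punct, @wide, @narrow) are made up for illustration and are not part of your program.

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen only for illustration.
my $anchored = 'abc, def';

# With the caret outside the class it is a start-of-string anchor,
# so even with /g only the first character can ever match.
(my $copy = $anchored) =~ s/^[a-zA-Z0-9,]//g;
print "anchored: $copy\n";    # anchored: bc, def

# The six ASCII characters that sit between 'Z' and 'a'.
my $punct = '[\\]^_`';

my @wide   = $punct =~ /[A-z]/g;       # over-wide range: matches all six
my @narrow = $punct =~ /[A-Za-z]/g;    # intended range: matches none

printf "A-z matched %d, A-Za-z matched %d\n", scalar @wide, scalar @narrow;
# A-z matched 6, A-Za-z matched 0
```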

    Here is some working code. Your text string contains the Unicode character U+2592 MEDIUM SHADE, so I've added use utf8 to tell Perl that the source code is encoded in UTF-8, and use open to make STDOUT encode its output as UTF-8.

    use utf8;
    use strict;
    use warnings;
    
    use open qw/ :std :encoding(utf-8) /;
    
    my $str = 'Uploaded 07-02▒05:14, Size 212.14▒MiB, ULed by someone';
    
    print "Pre: $str\n";
    
    $str =~ s/[^a-zA-Z0-9,]//g;
    
    print "Post: $str\n";
    

    Output

    Pre: Uploaded 07-02▒05:14, Size 212.14▒MiB, ULed by someone
    Post: Uploaded07020514,Size21214MiB,ULedbysomeone
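
Since you said the goal is to split the string afterwards, here is one possible follow-up sketch. It swaps the substitution for tr/// with the /c (complement) and /d (delete) modifiers, which is a common alternative for deleting everything outside a fixed set of characters, and then splits on the comma. The names $clean and @fields are my own, and splitting on a bare comma is an assumption about what you want to do next.

```perl
use utf8;
use strict;
use warnings;

use open qw/ :std :encoding(utf-8) /;

my $str = 'Uploaded 07-02▒05:14, Size 212.14▒MiB, ULed by someone';

# Copy the string, then delete every character that is NOT
# a letter, a digit, or a comma (/c complements the list, /d deletes).
(my $clean = $str) =~ tr/a-zA-Z0-9,//cd;

# Split the cleaned string into its comma-separated fields.
my @fields = split /,/, $clean;

print "$_\n" for @fields;
# Uploaded07020514
# Size21214MiB
# ULedbysomeone
```

Working on a copy keeps the original $str available if you need the unmodified text later.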