regex perl pcre

Allow PosixPrint characters except , % \ / # ? : and except whitespace at the beginning and end of the string


The first condition, allowing PosixPrint characters except , % \ / # ? :, works

with this regex pattern: m/^[^\P{PosixPrint}\/\#\%\?\:\,\\]+$/x

But for the second condition,

rejecting whitespace at the beginning and end while still allowing it in the middle,

this pattern, m/^\b[^\P{PosixPrint}\/\#\%\?\:\,\\]+\b$/x, only partly works (see the output).

It does not match the string if the first or last character is anything other than [0-9a-zA-Z].

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my $vars = [
    q#1#,
    q#1~`!l#,
    q#11#,
    q#111#,
    q#1 1#,
    q# 11#,
    q#11 #,
    q# 11 #,
    q# 1 1 #,
    q#1`~!@$^&*()-_=+|]}[{;'".><1#,
    q#1`~!@$^&*()-_=1#,
    q#1~`!@$^&*()-_=+|]}[{;'".><#,
    q#~`!@$^&*()-_=+|]}[{;'".><1#,
    q#~`!@$^&*()-_=+|]}[{;'".><#,
];

foreach my $var (@$vars){
    if ( $var =~ m/^\b[^\P{PosixPrint}\/\#\%\?\:\,\\]+\b$/x) {
        print "match:\t\t#$var#\n";
    }
    else{
        print "no match:\t#$var#\n";
    }
}

OUTPUT:

    match:      #1#
    match:      #1~`!l#
    match:      #11#
    match:      #111#
    match:      #1 1#
    no match:   # 11#
    no match:   #11 #
    no match:   # 11 #
    no match:   # 1 1 #
    match:      #1`~!@$^&*()-_=+|]}[{;'".><1#
    match:      #1`~!@$^&*()-_=1#
    no match:   #1~`!@$^&*()-_=+|]}[{;'".><#
    no match:   #~`!@$^&*()-_=+|]}[{;'".><1#
    no match:   #~`!@$^&*()-_=+|]}[{;'".><#

Expected OUTPUT:

    match:      #1#
    match:      #1~`!l#
    match:      #11#
    match:      #111#
    match:      #1 1#
    no match:   # 11#
    no match:   #11 #
    no match:   # 11 #
    no match:   # 1 1 #
    match:      #1`~!@$^&*()-_=+|]}[{;'".><1#
    match:      #1`~!@$^&*()-_=1#
    match:      #1~`!@$^&*()-_=+|]}[{;'".><#
    match:      #~`!@$^&*()-_=+|]}[{;'".><1#
    match:      #~`!@$^&*()-_=+|]}[{;'".><#

Information:

Perl Version: v5.26.2
Platform: Ubuntu 18.10

Solution

  • \b is a word boundary: a position between a word character (\w) and a non-word character.

    The beginning and the end of the string count as non-word characters, so \b at the start (or end) of the pattern "matches" only if the first (or last) character of the string is a word character.
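
    To see that rule in isolation, here is a minimal sketch. The helper name boundary_at_edges is made up for this illustration and is not part of the question's code; it only reports whether \b succeeds at the first and last position of a string.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: returns (start, end) flags, 1 if \b matches there, else 0.
    sub boundary_at_edges {
        my ($s) = @_;
        my $start = $s =~ /^\b/ ? 1 : 0;   # needs a word character right after the start
        my $end   = $s =~ /\b$/ ? 1 : 0;   # needs a word character right before the end
        return ($start, $end);
    }

    for my $s ('1', '~1', '1~', '~') {
        my ($start, $end) = boundary_at_edges($s);
        printf "%-5s start: %d  end: %d\n", "#$s#", $start, $end;
    }
    ```

    A string such as ~1 fails the leading \b and 1~ fails the trailing one, which is why the punctuation-edged test strings in the question are rejected.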

    As far as I understand, you want to reject strings that begin and/or end with a space. Use:

    my $vars = [
        q#1#,
        q#1~`!l#,
        q#11#,
        q#111#,
        q#1 1#,
        q# 11#,
        q#11 #,
        q# 11 #,
        q# 1 1 #,
        q#1`~!@$^&*()-_=+|]}[{;'".><1#,
        q#1`~!@$^&*()-_=1#,
        q#1~`!@$^&*()-_=+|]}[{;'".><#,
        q#~`!@$^&*()-_=+|]}[{;'".><1#,
        q#~`!@$^&*()-_=+|]}[{;'".><#,
    ];
    
    foreach my $var (@$vars){
        if ( $var =~ m/^(?!\h)[^\P{PosixPrint}\/\#\%\?\:\,\\]+(?<!\h)$/x) {
        #               ^^^^^^                                ^^^^^^^
            print "match:\t\t#$var#\n";
        }
        else{
            print "no match:\t#$var#\n";
        }
    }
    

    Where

    • (?!\h) is a negative lookahead that makes sure the first character is not horizontal whitespace
    • (?<!\h) is a negative lookbehind that makes sure the last character is not horizontal whitespace

    Output:

    match:      #1#
    match:      #1~`!l#
    match:      #11#
    match:      #111#
    match:      #1 1#
    no match:   # 11#
    no match:   #11 #
    no match:   # 11 #
    no match:   # 1 1 #
    match:      #1`~!@$^&*()-_=+|]}[{;'".><1#
    match:      #1`~!@$^&*()-_=1#
    match:      #1~`!@$^&*()-_=+|]}[{;'".><#
    match:      #~`!@$^&*()-_=+|]}[{;'".><1#
    match:      #~`!@$^&*()-_=+|]}[{;'".><#
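
    If the check is needed in several places, the pattern could be wrapped in a small sub. The following is only a sketch under a couple of assumptions: the name is_valid_name is hypothetical, and the anchors are changed from ^ and $ to \A and \z, because $ also matches just before a single trailing newline, so "11\n" would otherwise be accepted. Keep ^ and $ if that behaviour is what you want.

    ```perl
    use strict;
    use warnings;

    # Hypothetical wrapper around the pattern above; returns 1 on match, 0 otherwise.
    # \A and \z (an assumption, not in the original pattern) anchor to the true
    # start and end of the string, so a trailing newline is rejected as well.
    sub is_valid_name {
        my ($s) = @_;
        return $s =~ m/\A(?!\h)[^\P{PosixPrint}\/\#\%\?\:\,\\]+(?<!\h)\z/x ? 1 : 0;
    }

    for my $s ('1 1', ' 11', '11 ', "11\n", '~1~', 'a/b') {
        (my $shown = $s) =~ s/\n/\\n/g;    # make the newline visible when printing
        printf "%-10s #%s#\n", (is_valid_name($s) ? 'match:' : 'no match:'), $shown;
    }
    ```

    Of these six strings, only '1 1' and '~1~' should be reported as matches.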