I am using Perl v5.10 on CentOS 6.8.
My program reads a list of host names into the Perl array @aVmList. I am trying to extract only the machine name from each of them.
Some of the host names are fully qualified, some are not. Some contain dashes or underscores.
I have no control over the contents of the array.
Here is an example of the data I'm working with.
my @aVmList = qw(
vmserver1.domain.com
vmserver2
vm-server-three.otherdomain.com
server_four.domain.com
server5
server6
some-silly-vm-name
another_server.maybewithadomain.com
);
I would like to extract only the machine name from each element, ending up with the following.
vmserver1
vmserver2
vm-server-three
server_four
server5
server6
some-silly-vm-name
another_server
I found the regex /(.*?)\./ which almost works, but only when all of the names are fully qualified.
foreach ( @aVmList ) {
$_ =~ /(.*?)\./;
my $sVmName = $1;
print $sVmName;
}
I thought I needed to use a look-behind for the dots. I came up with the following
$_ =~ /([A-Za-z0-9-_]+)(?!=\.)/;
which seemed to work in the regex tester, but when I ran my Perl script it still matched the whole string.
I don't like the path I'm headed down with the regex pattern above, because now I'm assuming that the host names will only contain "word" characters or a hyphen.
I know I shouldn't have to account for special characters in host names, but I'm trying to base the regex pattern on matching anything before the first dot in a domain name suffix.something.com.
I also found "Regular expression to extract hostname from fully qualified domain name", which sounded like what I wanted, but neither of the suggestions from there seemed to work.
I tried:
$_ =~ (.+?)(?=\.)
and
$_ =~ ^([^.]+)\..*$
The negated character class [^...] matches any character except those listed. Then
my ($name) = $_ =~ /([^.]+)/;
matches all characters up to the first . and stops at it, so there is no reason to explicitly match the dot (nor the rest of the line). The match is captured and assigned to $name.
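As a minimal sketch of the above, here is the match applied to the sample list from the question. The helper name short_name and the array name @vm_list are only illustrative and are not part of the original code.

```perl
use strict;
use warnings;

# Sample data copied from the question.
my @vm_list = qw(
    vmserver1.domain.com
    vmserver2
    vm-server-three.otherdomain.com
    server_four.domain.com
    server5
    server6
    some-silly-vm-name
    another_server.maybewithadomain.com
);

# short_name is a hypothetical helper: it returns the first run of
# non-dot characters, or undef when the string has no such characters.
sub short_name {
    my ($fqdn) = @_;
    my ($name) = $fqdn =~ /([^.]+)/;
    return $name;
}

print short_name($_), "\n" for @vm_list;
```

For this data it prints the eight machine names listed in the question, one per line.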
When the match operator is used in list context it returns the captured groups, and with the /g modifier it returns the list of all matches
my @matches = $var =~ m/$pattern/g;
Even if there is only one match we need the list context so that the match is returned, hence the parentheses in my ($name) = ..., which impose the list context on the match operator. In the above example this is done by assigning to an array. Otherwise we'd have the scalar context, in which case the match operator only returns true or false to indicate whether it matched. See this in perlop and see perlretut.
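The difference between the contexts can be seen in a small sketch like the following. The variable names ($host, $ok, @parts) are made up for illustration.

```perl
use strict;
use warnings;

my $host = 'server_four.domain.com';

my ($name) = $host =~ /([^.]+)/;    # list context: $name gets the capture
my $ok     = $host =~ /([^.]+)/;    # scalar context: $ok is only true/false
my @parts  = $host =~ /([^.]+)/g;   # list context with /g: every match

print "$name\n";     # server_four
print "$ok\n";       # 1
print "@parts\n";    # server_four domain com
```

Dropping the parentheses around $name is a common slip: the variable then holds 1 rather than the host name.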
The m above may be omitted, and most often is, as long as the delimiters are slashes. With any other delimiters the m is required. I suggest a good read through perlretut.
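For instance, the same pattern can be written with slashes (no m needed) or with braces (m required). This is only a sketch and the variable names are illustrative.

```perl
use strict;
use warnings;

my $host = 'vm-server-three.otherdomain.com';

my ($with_slashes) = $host =~ /([^.]+)/;     # m omitted, allowed with / /
my ($with_braces)  = $host =~ m{([^.]+)};    # m required with { }

print "$with_slashes\n";    # vm-server-three
print "$with_braces\n";     # vm-server-three
```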
The default input and pattern-searching space ($_) in your loop holds the currently processed element. A regex works with $_ by default, so $_ need not be specified. See General Variables in perlvar, and see a regex-related comment in the perlop link. So you can do
foreach (@vm_list) {
/([^.]+)/; # OK but better assign directly from the match
my $host_name = $1;
}
However, it is clearer to assign directly from the match, as in the answer.
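A sketch of the loop with direct assignment might look like this. The empty string in the sample data is an assumption added here to show what happens when nothing matches, and @host_names is an illustrative name.

```perl
use strict;
use warnings;

# Two names from the question plus an empty string (an added
# assumption, to demonstrate a failed match).
my @vm_list = ( 'vmserver1.domain.com', 'server5', '' );

my @host_names;
for (@vm_list) {
    # The match works on $_ implicitly. The list assignment is false
    # when nothing matched, so a stale $1 is never picked up.
    if ( my ($host_name) = /([^.]+)/ ) {
        push @host_names, $host_name;
    }
}

print "$_\n" for @host_names;
```

Testing the assignment this way avoids relying on $1, which keeps its value from the last successful match if the current one fails.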