Perl regex to extract machine name from hostname


I am using Perl v5.10 on CentOS 6.8

My program reads a list of host names into Perl array @aVmList. I am trying to extract only the machine name from each of them.

Some of the host names are fully qualified, some are not. Some contain dashes or underscores.

I have no control over the contents of the array.

Here is an example of the data I'm working with.

my @aVmList = qw(
    vmserver1.domain.com
    vmserver2
    vm-server-three.otherdomain.com
    server_four.domain.com
    server5
    server6
    some-silly-vm-name
    another_server.maybewithadomain.com
);

I would like to extract only the machine name from each element, ending up with the following.

vmserver1 
vmserver2
vm-server-three 
server_four 
server5
server6
some-silly-vm-name
another_server

I found the regex /(.*?)\./ which almost works, but only when all of the names are fully qualified.

foreach ( @aVmList ) {

    $_ =~ /(.*?)\./;

    my $sVmName = $1;

    print $sVmName;
}

I thought I needed to use a look-behind for the dots. I came up with the following

$_ =~ /([A-Za-z0-9-_]+)(?!=\.)/;

which seemed to work in the regex tester, but when I ran my Perl script it still matched the whole string.

I don't like the path I'm headed down with the regex pattern above, because now I'm assuming that the host names will only contain "word" characters or a hyphen.

I know I shouldn't have to account for special characters in host names, but I'm trying to base the regex pattern on matching anything before the first dot in a domain name suffix.something.com.

I also found Regular expression to extract hostname from fully qualified domain name which sounded like what I wanted, but neither of the suggestions from there seemed to work.

I tried:

$_ =~ (.+?)(?=\.)

and

$_ =~ ^([^.]+)\..*$

Solution

  • The negated character class [^...] matches any character except those listed. Then

    my ($name) = $_ =~ /([^.]+)/;
    

    matches all characters up to the first . and stops at it, thus there is no reason to explicitly match the dot (nor the rest of the line). The match is captured and assigned to $name.


    When the match operator is used in list context it returns a list of what it matched: the captured groups, or with the /g modifier all matches in the string

    my @matches = $var =~ m/$pattern/g;
    

    Even if there is only one match we need list context for the match itself to be returned, hence the parentheses in my ($name) = ..., which impose list context on the match operator. In the @matches example this is done by assigning to an array. Otherwise we'd have scalar context, in which the match operator only reports whether the match succeeded. See this in perlop and see perlretut.

    The m above may be omitted and most often is. But note that it is required when delimiters other than / are used. I suggest a good read through perlretut.

    The default input and pattern-searching space ($_) in your loop holds the currently processed element. Regex by default works with $_ so $_ need not be specified. See General Variables in perlvar, and see a regex-related comment in the perlop link. So you can do

    foreach (@aVmList) {
        /([^.]+)/ or next;   # OK but better assign directly from the match
        my $host_name = $1;  # $1 is only reliable after a successful match
    }
    

    However, it is clearer to assign directly from the match, as in the my ($name) = ... statement at the top of this answer.
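
To tie the points above together, here is a minimal sketch that applies the same pattern to the sample list from the question. The helper name machine_name and the grep { defined } filter are my own additions for illustration, not something the question requires; the pattern itself is unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: returns everything before the first dot,
# or undef when the pattern does not match (e.g. an empty string).
sub machine_name {
    my ($host) = @_;
    # The parentheses around $name impose list context on the match,
    # so the capture group itself is returned.
    my ($name) = $host =~ /([^.]+)/;
    return $name;
}

my @aVmList = qw(
    vmserver1.domain.com
    vmserver2
    vm-server-three.otherdomain.com
    server_four.domain.com
    server5
    server6
    some-silly-vm-name
    another_server.maybewithadomain.com
);

# Drop entries that did not match instead of carrying undef along.
my @names = grep { defined } map { machine_name($_) } @aVmList;

print "$_\n" for @names;

# For contrast: in scalar context the same match yields a true value (1),
# not the captured text.
my $matched = 'vmserver1.domain.com' =~ /([^.]+)/;
```

Run as written, this should print the eight machine names listed in the question, one per line. Names without a dot come through unchanged because [^.]+ simply runs to the end of the string when no dot is present.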