Comparing 2 columns of 2 different .tsv files and writing the output to a new .tsv file


I am kind of stuck on a quick script that I wanted to adapt, so I thought asking might help me move forward :)

My first tsv file is a list of servers / hostnames that looks like this (groups.tsv):

Groupname | Servername

group1      server1     
group2      server2     
group3      server3 server4 server5
group4      server6     
group5      server7     
group6      server7

Second one is a list of IPs / hostnames (addresses.tsv):

Hostname | IP | Purpose

server1    192.168.1.29 255.255.255.255    application1
server2    172.16.9.38 255.255.255.255  
server3    10.50.110.28 255.255.255.255 
server4    10.0.0.1 255.255.255.255        application2
server5    192.168.1.1 255.255.255.255  
server6    172.16.32.32 255.255.255.255 
server7    10.0.0.3 255.255.255.255 
server8    1.1.1.1 255.255.255.255         application3
server9    127.0.0.1 255.255.255.255    
server10   10.28.28.28 255.255.255.255  

With a Perl script found elsewhere, I am trying to produce a new tsv file like this:

Hostname | IP | Purpose | Groupname

server1    192.168.1.29 255.255.255.255    application1     group1
...

The perl script is the following:

#!/usr/bin/perl
use strict;
use warnings;
my %programs;    #Hash to save groupname => hostname for each program record
open my $fh1, '<', 'groups.tsv' or die "Failed to open groups.tsv: $!";
foreach (<$fh1>) {
    chomp;       #Remove newline character from end of line
    my ( $groupname, $hostname ) = split(/\t/);
    $programs{$groupname} = $hostname;
}
close $fh1;
open my $fh2, '<', 'addresses.tsv' or die "Failed to open addresses.tsv: $!";
open my $fh3, '>', 'result.tsv'    or die "Failed to open result.tsv: $!";      #Output
foreach (<$fh2>) {
    chomp;    #Remove newline character from end of line
    my $groupname = ( split(/\t/) )[1];    #Groupname
    if ( exists $programs{$groupname} ) {
        print $fh3 "$_\t$programs{$groupname}\n";
    } else {
        print $fh3 "$_\t*NA*\n";
    }
}

At some point I partly retrieved the group names, but only because I had the keys and values inverted. I believe I need to check the values of the hash instead of the keys, but I have had no success so far.

For now I am getting this result.tsv:

server1 192.168.1.29 255.255.255.255    application1    *NA*
server2 172.16.9.38 255.255.255.255     *NA*
server3 10.50.110.28 255.255.255.255    *NA*
....

In the same spirit, I will then have a policy.tsv:

Groupname | Service   
group1  ALL
group2  HTTP
group3  HTTP HTTPS TCP_3389
group3 group4   SSH

So my final goal would be to have a final tsv file like this:

Hostname | IP | Purpose | Groupname | Service

server1    192.168.1.29 255.255.255.255    application1     group1      ALL
...

http://www46.zippyshare.com/v/30272792/file.html

But I assume it might be easier to adapt as soon as the first part is OK.

Could anybody comfortable with Perl hashes point me in the right direction?

Thanks in advance for the help!


Solution

  • Your hash is backwards: you want to look up the group name by host name. Declare the hash as my %groupname; instead of my %programs; (required under use strict) and change

    $programs{$groupname} = $hostname;
    

    to

    $groupname{$_} = $groupname for split ' ', $hostname;
    

    Then the bottom part becomes

    my $hostname = ( split /\t/ )[0];
    print $fh3 $_ . "\t" . ( $groupname{$hostname} || '*NA*' ) . "\n";
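
Putting the pieces above together, a complete version of this first step might look like the sketch below. It rests on a few assumptions that are not confirmed by the question: both files are truly tab-separated with no header row, the file names are passed on the command line, and the sub names load_groups and annotate are made up for illustration. When a host is listed under several groups (server7 in the sample data), the last group read wins.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hostname => groupname lookup from a groups.tsv-style file.
# Assumed line format: "groupname<TAB>host1 host2 ..." (hypothetical helper).
sub load_groups {
    my ($path) = @_;
    open my $fh, '<', $path or die "Failed to open $path: $!";
    my %group_for;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $groupname, $hostnames ) = split /\t/, $line;
        next unless defined $hostnames;    # skip blank or malformed lines
        # One group may list several hosts separated by whitespace.
        # A host seen in several groups keeps the last group read.
        $group_for{$_} = $groupname for split ' ', $hostnames;
    }
    close $fh;
    return \%group_for;
}

# Copy each line of the addresses file to the output file,
# appending the group name, or *NA* when the host is unknown.
sub annotate {
    my ( $in_path, $out_path, $group_for ) = @_;
    open my $in,  '<', $in_path  or die "Failed to open $in_path: $!";
    open my $out, '>', $out_path or die "Failed to open $out_path: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        my ($hostname) = split /\t/, $line;    # first column is the host name
        my $group =
            ( defined $hostname && exists $group_for->{$hostname} )
            ? $group_for->{$hostname}
            : '*NA*';
        print {$out} "$line\t$group\n";
    }
    close $in;
    close $out or die "Failed to close $out_path: $!";
    return;
}

# Assumed usage: perl join_groups.pl groups.tsv addresses.tsv result.tsv
if ( @ARGV == 3 ) {
    my ( $groups, $addresses, $result ) = @ARGV;
    annotate( $addresses, $result, load_groups($groups) );
}
```

The script name join_groups.pl is arbitrary. The policy.tsv step could probably follow the same pattern, with a second hash keyed by group name and consulted after the group lookup.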