
How do I print a computed score for all input in a file?


Here is some Perl code that takes two files as input. The files contain TCP packets. It trains itself on normal traffic using the packets in the first file and then prints the anomalous packets found in the second file.

while (<>) {
    if (($time, $to, $port, $from, $duration, $flags, $length, $text) = /(.{19}) (.{15}):(\d+) (.{15}):\d+ \+(\d+) (\S+) (\d+) (.*)/) {
        $text =~ s/\^M//g;
        $text =~ s/\^ /\n/g;
        if (($port == 25 || $port == 80) && $text =~ /\n\n/) {$text = "$`\n";}
        $text =~ s/^\^@//;
        if ($time =~ /(\d\d)\/(\d\d)\/\d\d\d\d (\d\d):(\d\d):(\d\d)/) {
            $now = ((($1 * 31 + $2) * 24 + $3) * 60 + $4) * 60 + $5;
        }
        foreach ($text =~ /.*\n/g) {
            if (($k, $v) = /(\S*)(.*)/) {
                $k = substr($k, 0, 30);
                $v = substr($v, 0, 100);
                $score   = 0;
                $comment = "";
                &alarm($port,       $k);
                &alarm($to,         $flags);
                &alarm("To",        "$to:$port");
                &alarm($to,         $from);
                &alarm("$to:$port", $from);
                if ($score > 30000) {
                    $score = log($score) / (10 * log(10));
                    printf("    #   0 $time $to %8.6f \#%s\n", $score, substr($comment, 0, 300));
                }
            }
        }
    }
}

sub alarm {
    local ($key, $val, $sc) = @_;
    if ($now < 10300000) {
        ++$n{$key};
        if (++$v{$key . $val} == 1) {
            ++$r{$key};
            $t{$key} = $now;
        }
    } elsif ($n{$key} > 0 && !$v{$key . $val}) {
        $score += ($now - $t{$key}) * $n{$key} / $r{$key};
        $comment .= " $key=$val";
        $t{$key} = $now;
    }
}

exit;

I am new to Perl, and as a small part of my project I need the anomaly score to be printed for every packet in the second file. Can anybody tell me how to modify the code?


Solution

  • From what I can see, the code as written treats packets before some cutoff time as training data, and records whether or not it has seen certain conditions in the %n and %v hashes.

    Why not give your alarm function an extra flag called $training? If it is true, just account for the packet values; otherwise, calculate a score for this anomaly (if it is one) and return that value. If there is no anomaly, or if you're in training mode, just return zero:

        sub alarm {
            my ($key, $val, $training) = @_;
            my $score = 0;
            if ( $training ) {
                # ...do your accounting...
            } else {
                # ...do your comparisons & set score accordingly...
            }
            return $score;
        }
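
To make the skeleton above concrete, here is a minimal sketch of how the two branches might be filled in, reusing the bookkeeping and score formula from the question. The hash names, the routine name check_packet_values, and the extra $now argument are illustrative assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical names for the question's %n, %v, %r and %t hashes.
my (%n_seen, %value_seen, %distinct_values, %last_new_time);

# Returns 0 while training or when the key/value pair is already known;
# otherwise returns the anomaly score for this pair.
sub check_packet_values {
    my ($key, $val, $now, $training) = @_;

    if ($training) {
        ++$n_seen{$key};
        if (++$value_seen{$key . $val} == 1) {
            ++$distinct_values{$key};        # first time this value appears
            $last_new_time{$key} = $now;
        }
        return 0;
    }

    return 0 unless $n_seen{$key};           # key never seen in training
    return 0 if $value_seen{$key . $val};    # known value, no anomaly

    # Same formula as the question: time since the last novelty,
    # weighted by how rarely this key produces new values.
    my $score = ($now - $last_new_time{$key})
              * $n_seen{$key} / $distinct_values{$key};
    $last_new_time{$key} = $now;
    return $score;
}

# Usage with made-up values: three training observations, two checks.
check_packet_values('80', 'GET',  100, 1);
check_packet_values('80', 'GET',  110, 1);
check_packet_values('80', 'POST', 120, 1);
my $known   = check_packet_values('80', 'GET', 500, 0);
my $unknown = check_packet_values('80', 'PUT', 500, 0);
print "known=$known unknown=$unknown\n";
```

Because the routine returns the score instead of adding to a global, the caller decides what to do with it: sum it, print it for every packet, or compare it against a threshold.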
    

    Move your big while loop into a subroutine, and have that subroutine take a filename and a flag saying whether it is in training mode or not.

         sub examine {
             my ($file, $training) = @_;
             if ( open my $fh, '<', $file ) {
                 while (<$fh>) {
                     # ...this is your big while loop...
                     # ...pass $training along to your alarm() calls...
                 }
             } else {
                 die "Failed to open $file: $!\n";
             }
         }
    

    Your main program is now:

         use constant TRAINING => 1;
    
         examine('file1',  TRAINING);
         examine('file2', !TRAINING);
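
As a rough end-to-end sketch of this layout, the following prints a score for every record of the second input, which is what the question asks for. It assumes a simplified "to from" record format and a stand-in scoring routine (1 for an unseen pair, 0 otherwise) purely for illustration; the real packet parsing and scoring from the question would take their place. In-memory strings are used instead of files so the sketch runs as-is.

```perl
use strict;
use warnings;

use constant TRAINING => 1;

my %seen;    # "key\0value" pairs recorded during training

# Simplified stand-in for the real scoring routine.
sub check_packet_values {
    my ($key, $val, $training) = @_;
    if ($training) {
        $seen{"$key\0$val"} = 1;
        return 0;
    }
    return $seen{"$key\0$val"} ? 0 : 1;
}

# Reads "to from" records (a hypothetical format) from $file.
# In training mode it only learns; otherwise it returns one
# report line per record, with no threshold applied.
sub examine {
    my ($file, $training) = @_;
    open my $fh, '<', $file or die "Failed to open $file: $!\n";
    my @report;
    while (my $line = <$fh>) {
        chomp $line;
        my ($to, $from) = split ' ', $line;
        next unless defined $from;           # skip malformed lines
        my $score = check_packet_values($to, $from, $training);
        push @report, sprintf("%s %s %.6f", $to, $from, $score)
            unless $training;
    }
    close $fh;
    return @report;
}

# Passing a scalar reference to open() reads from the string.
my $train_data = "10.0.0.1 10.0.0.9\n10.0.0.2 10.0.0.9\n";
my $test_data  = "10.0.0.1 10.0.0.9\n10.0.0.1 10.0.0.7\n";

examine(\$train_data, TRAINING);
my @report = examine(\$test_data, !TRAINING);
print "$_\n" for @report;
```

The point of the design is that the threshold test from the original loop is gone from the reporting path, so every record in the second input yields a line, including those that score zero.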
    

    More notes:

    • Use my() instead of local. It doesn't materially affect this program, but it's a good habit to get into.
    • Don't reuse a well-known function name like alarm (it is a Perl built-in) when your routine isn't doing anything of the kind. Instead, name it something like check_packet_values -- or something that makes sense to you and your team.
    • Stop using magic numbers:

      use constant {
          CUTOFF_TIME       => 10300000,
          ANOMALY_THRESHOLD =>    30000,
      };
      
    • Use a real date/time parser so that your values have some meaning. str2time from Date::Parse would give you your time in epoch seconds (seconds since Jan 1, 1970).

    • Use variable names that mean something. %n and %v are hard to understand in this code, but %n_seen and %value_seen (as well as %first_seen_time instead of %t) would tell the reader what they hold. Remember, your code doesn't run faster if you use shorter variable names.
    • Avoid global variables where feasible. The counters can be global, but your comment should be built only in the routine that initializes and prints it. So, instead of doing what you are doing, how about:

      $to_score = check_packet_values($to, $flags)
          and push @comments, "$to=$flags";
      ...
      $score = $to_score + $from_score + ...
      if ( !$training && $score > ANOMALY_THRESHOLD ) {
          print "blah blah blah @comments\n";
      }
      
    • Also, never, ever use $` -- on Perl versions before 5.20 it slows down every regex match in your entire script (even if the statement that uses it is never executed). Instead of:

      if ( $text =~ /\n\n/ ) { $text = $` }
      

    Use:

        # /s lets . match newlines; the non-greedy .*? stops at the first blank line
        if ( $text =~ /\A(.*?)\n\n/s ) {
            $text = $1;
        }
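
Returning to the date/time note above: str2time lives in Date::Parse, which is not a core module. As a sketch that needs only core modules, Time::Piece's strptime can do the same job. The MM/DD/YYYY HH:MM:SS layout is an assumption inferred from the regex in the question, and the constant name, the packet_epoch helper and the sample timestamps are made up for illustration.

```perl
use strict;
use warnings;
use Time::Piece;

# Assumed timestamp layout, inferred from the question's regex.
use constant TIME_FORMAT => '%m/%d/%Y %H:%M:%S';

# Returns epoch seconds, treating the timestamp as UTC.
sub packet_epoch {
    my ($stamp) = @_;
    return Time::Piece->strptime($stamp, TIME_FORMAT)->epoch;
}

my $first  = packet_epoch('03/01/1999 08:00:00');
my $second = packet_epoch('03/01/1999 09:30:00');
print "elapsed seconds: ", $second - $first, "\n";
```

With real epoch seconds, differences between timestamps are true elapsed times, unlike the hand-rolled formula in the question, which treats every month as 31 days. The training cutoff would then have to be expressed in epoch seconds as well.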
    

    (Edit: added warning about $`)