Search code examples
awkbackticksevaluate

Evaluating command with Awk


The problem is that: I have different txt files in which is registered a timestamp and an ip address for every malware packet that arrives to a server. What I want to do is create another txt file that shows, for every ip, the first time a malware packet arrives.

In general I want to do something like this :

for every  line in file.txt
 if (ip is not present in list.txt)
 copy timestamp and ip in list.txt

I'm using awk for doing it. The main problem is the "if ip is not present in list.txt". I'm doing this:

 {    a=$( grep -w "$3" list.txt | wc -c );
    if ( a == 0 )
   {
     #copy timestamp and ip in list.txt
   }

( i'm using $3 because the ip address is in the third column of the source file )

I don't know how to make awk evaluate the grep function. I've tried with backticks also but it didn't work. Someone could give me some hint?

I'm testing my script on test file like this:

10  192.168.1.1
11  192.168.1.2
12  192.165.2.4
13  122.11.22.11    
13  192.168.1.1
13  192.168.1.2
13  122.11.22.11
14  122.11.22.11
15  122.11.22.11
15  122.11.22.144
15  122.11.2.11
15  122.11.22.111

What should I obtain is:

10  192.168.1.1
11  192.168.1.2
12  192.165.2.4
13  122.11.22.11    
15  122.11.22.144
15  122.11.2.11
15  122.11.22.111

Thanks to your help I've succeded in creating the script that fits my needs :

awk '
FILENAME == ARGV[1] {
    ip[$2] = 1
    next
}
! ($2 in ip) {
    print $1, $2 >> ARGV[1]
    ip[$2] = 1
}
' list.txt file.txt 

Solution

  • But really what you want to do is get awk to read the list.txt file first, then process the other file with the list.txt data in memory. This will allow you to avoid calling system() for each line.

    I assume the ip is in the 1st column of list.txt.

    When you say copy timestamp and ip in list.txt, I assume you want to append some info from the current line of file.txt to the list.txt file.

    awk '
        FILENAME == ARGV[1] {
            ip[$1] = 1
            next
        }
        ! ($3 in ip) {
            print $3, $(whatevever_column_holds_timestamp) >> ARGV[1]
        }
    ' list.txt file.txt
    

    Given the sample file and simplified requirements of your question update:

    awk '! seen[$2]++' filename
    

    will produce the results you've seen. That awk program will print the line if the IP has not yet been seen.