
How do the Antivirus programs detect the EICAR Test Virus?


The EICAR test virus is used to check that antivirus programs are working. To detect it as a virus,

does the antivirus program need a virus definition for the test virus,

OR

do its heuristics flag it as a suspicious pattern and report it as a virus?

(I have seen an AV program delete the file during download without identifying it as the EICAR test virus, only as a suspicious object. If it had the definition, it should report the virus name, details, etc., shouldn't it?)


Solution

  • IMHO, the point of the test virus is to have something that is both known to be harmless, and accepted as a virus so that end users can verify that the AV software is turned on, and can see the effect of a virus identification. Think fire drill, for AV software.

    I would imagine that most have a signature for it, and directly recognize it as such.

    I wouldn't be surprised if the bit pattern of the actual EICAR test happened to include bit patterns that smelled like opcodes for suspicious activity, but I don't know if that is the case. If it is, then it might be a valid test of a simple heuristic virus recognizer. However, since the EICAR test has been around for a long time, I would also imagine that any heuristic that catches it isn't good enough to catch anything now in the wild.

    I wouldn't expect that recognizing EICAR is proof of any claim stronger than "the AV is installed and scanning what it was expected to scan", and if developing an AV system, I wouldn't attempt to make any stronger claim about it.

    Update:

    The actual EICAR test virus is the following string:

    X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
    

    which was carefully crafted (according to the Wikipedia article) to have a couple of interesting properties.

    First, it consists of only printable ASCII characters. It will often include whitespace and/or a newline at the end, but that has no effect on its recognition, or on its function.
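
    As a rough illustration of those two points, here is a small Python sketch. The function names and the set of tolerated trailing bytes are my own assumptions, and a real scanner's signature matching is certainly more involved than this:

```python
# The 68-byte test string, assembled from three pieces (code, message, tail).
CODE = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$"
MESSAGE = b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$"
TAIL = b"H+H*"
EICAR = CODE + MESSAGE + TAIL

# Assumption: the trailing bytes a scanner tolerates are plain whitespace.
TRAILING = b" \t\r\n"


def is_printable_ascii(data: bytes) -> bool:
    """True if every byte is a visible ASCII character (0x21..0x7E)."""
    return all(0x21 <= b <= 0x7E for b in data)


def looks_like_eicar(data: bytes) -> bool:
    """Toy signature match: the exact string, plus optional trailing whitespace."""
    return data.rstrip(TRAILING) == EICAR


print(len(EICAR), is_printable_ascii(EICAR))
print(looks_like_eicar(EICAR + b"\r\n"))
```

    Building the string from three pieces also keeps the full signature out of the source file in one contiguous run, which some scanners might otherwise object to.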

    Which raises the second property: it is in fact an executable program for an 8086 CPU. It can be saved (via Notepad, for example) in a file with the extension .COM, and it can be run on MSDOS, most clones of MSDOS, and even in the MSDOS compatibility mode of the Windows command prompt (including on Vista, but not on any 64-bit Windows, since Microsoft decided that compatibility with 16-bit real mode was no longer a priority there).

    When run, it produces as output the string "EICAR-STANDARD-ANTIVIRUS-TEST-FILE!" and then exits.

    Why did they go to this effort? Apparently the researchers wanted a program that was known to be safe to run, in part so that live scanners could be tested without needing to capture a real virus and risk a real infection. They also wanted it to be easy to distribute by both conventional and unconventional means. Since it turns out that there is a useful subset of the x86 real-mode instruction set where every byte meets the restriction that it also be a printable ASCII character, they achieved both goals.

    The wiki article has a link to a blow-by-blow explanation of how the program actually works which is also an interesting read. Adding to the complexity is the fact that the only way to either print to the console or exit a program in DOS real mode is to issue a software interrupt instruction, whose opcode (0xCD) is not a printable 7-bit ASCII character. Furthermore, the two interrupts each require a one byte immediate parameter, one of which would need to be a space character. Since the self-imposed rule was to not allow spaces, all four of the last bytes of the program ("H+H*" in the string) are modified in place before the instruction pointer gets there to execute them.
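
    To make the byte restriction concrete, here is a small Python sketch (the helper names are mine, not from any article). It checks which of the four bytes the program wants at the end are off limits, and how far the stored "H+H*" bytes are from them when read as little-endian 16-bit words:

```python
def is_printable(byte: int) -> bool:
    """Visible ASCII only; space (0x20) is deliberately excluded."""
    return 0x21 <= byte <= 0x7E


def words(data: bytes):
    """Split a byte string into little-endian 16-bit words."""
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


wanted = bytes([0xCD, 0x21, 0xCD, 0x20])  # INT 21h followed by INT 20h
stored = b"H+H*"                          # what the file actually contains

# Bytes of the wanted code that break the "printable, no spaces" rule.
blocked = [hex(b) for b in wanted if not is_printable(b)]

# Difference between each stored word and the word the program needs.
deltas = [s - w for s, w in zip(words(stored), words(wanted))]

print(blocked)
print([hex(d) for d in deltas])
```

    Both words are off by the same amount, which is why a single value subtracted twice is enough to turn the printable tail into the two interrupt instructions.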

    Disassembling and dumping EICAR.COM with the DEBUG command at a command prompt on my XP box, I see:

    0C32:0100 58            POP     AX
    0C32:0101 354F21        XOR     AX,214F
    0C32:0104 50            PUSH    AX
    0C32:0105 254041        AND     AX,4140
    0C32:0108 50            PUSH    AX
    0C32:0109 5B            POP     BX
    0C32:010A 345C          XOR     AL,5C
    0C32:010C 50            PUSH    AX
    0C32:010D 5A            POP     DX
    0C32:010E 58            POP     AX
    0C32:010F 353428        XOR     AX,2834
    0C32:0112 50            PUSH    AX
    0C32:0113 5E            POP     SI
    0C32:0114 2937          SUB     [BX],SI
    0C32:0116 43            INC     BX
    0C32:0117 43            INC     BX
    0C32:0118 2937          SUB     [BX],SI
    0C32:011A 7D24          JGE     0140
    
    0C32:0110                                      45 49 43 41               EICA
    0C32:0120  52 2D 53 54 41 4E 44 41-52 44 2D 41 4E 54 49 56   R-STANDARD-ANTIV
    0C32:0130  49 52 55 53 2D 54 45 53-54 2D 46 49 4C 45 21 24   IRUS-TEST-FILE!$
    
    0C32:0140 48            DEC     AX
    0C32:0141 2B482A        SUB     CX,[BX+SI+2A]
    

    After executing instructions up to JGE 0140, the last two instructions have been modified to be:

    0C32:0140 CD21          INT     21
    0C32:0142 CD20          INT     20
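
    The register arithmetic can be followed step by step with a small Python sketch. This is a hand-written trace of the listing above, not a real emulator, and it assumes that DOS starts a .COM program with a zero word on top of the stack:

```python
MASK = 0xFFFF


def trace_eicar_prologue():
    # Assumption: DOS leaves a zero word on top of the stack at program start.
    stack = [0x0000]
    # Only the four bytes at 0x0140..0x0143 matter here ("H+H*").
    mem = {0x0140: 0x48, 0x0141: 0x2B, 0x0142: 0x48, 0x0143: 0x2A}

    def sub_word(addr, value):
        """SUB [addr],value on a little-endian 16-bit word."""
        word = mem[addr] | (mem[addr + 1] << 8)
        word = (word - value) & MASK
        mem[addr] = word & 0xFF
        mem[addr + 1] = word >> 8

    ax = stack.pop()      # POP AX
    ax ^= 0x214F          # XOR AX,214F
    stack.append(ax)      # PUSH AX
    ax &= 0x4140          # AND AX,4140
    bx = ax               # PUSH AX / POP BX
    ax ^= 0x5C            # XOR AL,5C (only the low byte changes)
    dx = ax               # PUSH AX / POP DX
    ax = stack.pop()      # POP AX
    ax ^= 0x2834          # XOR AX,2834
    si = ax               # PUSH AX / POP SI
    sub_word(bx, si)      # SUB [BX],SI
    bx += 2               # INC BX, INC BX
    sub_word(bx, si)      # SUB [BX],SI

    patched = bytes(mem[a] for a in range(0x0140, 0x0144))
    return {"AX": ax, "BX": bx, "DX": dx, "SI": si, "patched": patched}


result = trace_eicar_prologue()
print({k: (v.hex() if isinstance(v, bytes) else hex(v)) for k, v in result.items()})
```

    The trace ends with AH holding 0x09, DX holding 0x011C, and the bytes CD 21 CD 20 at 0x0140, which matches the modified instructions shown above.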
    

    Most DOS system calls were dispatched through INT 21 with the value of the AH or AX register specifying the function to execute. In this case, AH is 0x09, which is the print string function. It prints the string that DX points to, here starting at offset 0x011C, up to the terminating dollar sign. (You had to print a dollar sign with a different trick in pure DOS.) The INT 20 call terminates the process before any extra bytes past that point can be executed.
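
    A minimal Python sketch of what that print call does with the file contents follows. The function name is hypothetical, and the only DOS behavior it models is "read from DX up to the first dollar sign":

```python
LOAD_OFFSET = 0x0100  # .COM images are loaded at offset 0x100 in their segment


def dos_print_string(image: bytes, dx: int) -> str:
    """Return the text that INT 21h with AH=09h would print for a given DX."""
    start = dx - LOAD_OFFSET
    end = image.index(b"$", start)  # the '$' terminator itself is not printed
    return image[start:end].decode("ascii")


# The file contents, split across literals for readability.
image = (rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$"
         b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$"
         b"H+H*")

text = dos_print_string(image, 0x011C)
print(text)
```

    Note that the "$" at offset 0x011B, just before the message, is part of the JGE instruction and is never seen by the print call because DX points one byte past it.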

    Self-modifying code was an early virus trick, but here it is used to preserve the restriction on byte values that can be used in the string. In a modern system, it is possible that the Data Execution Prevention (DEP) feature would catch this, if it is enforced at all in the MSDOS compatibility mode when running a COM file.