How would I write a malware detection software?

What are the resources I need to go through to fully understand how this works? I've looked up online but all I got are software solutions rather than how the software actually detects them.

I want to be able to detect a malware that's in my computer. Let's say there's a trojan horse in my Computer. How would I write a program to detect it?

I'm a beginner at Information Security.

Thanks in advance.

Solution

Well most endpoint security products have: - an on-demand scanning component. - a real-time scanning component. - hooks into other areas of the OS to inspect data before "released". E.g. Network layer for network born threats. - a detection engine - includes file extractors - detection data that can be updated. - Run time scanning elements.

There are many layers and components that all need to work together to increase protection.

Here are a few scenarios that would need to be covered by a security product.

Malicious file on disk - static scanning - on-demand. Maybe the example you have. A command line/on-demand scanner would enumerate each directory and file based on what was requested to be scanned. The process would read files and pass streams of data to a detection engine. Depending on the scanning settings configured and exclusions, files would be checked. The engine can understand/unpack the files to check types. Most have a file type detection component. It could just look at the first few bytes of a file as per this - https://linux.die.net/man/5/magic. Quite basic but it gives you an idea on how you can classify a file type before carrying out more classifications. It could be as simple as a few check-sums of a file at different offsets.

In the case of your example of a trojan file. Assuming you are your own virus lab, maybe you have seen the file before, analyzed it an written detection for it as you know it is malicious. Maybe you can just checksum part of the file and publish this data to you product. So you have virusdata.dat, in it you might have a checksum and a name for it. E.g. 123456789, Troj-1 You then have a scanning process, that loads your virus data file at startup and opens the file for scanning. You scanner checksums the file as per the lab scenario and you get a match with the data file. You display the name as it was labelled. This is the most basic example really and not that practical bu hopefully it serves some purpose. Of course you will see the problem with false positives.

Other aspects of a product include:

A process writing a malicious file to disk - real-time. In order to "see" file access in real-time and get in that stack you would want a file system filter driver. A file system mini filter for example on Windows: https://msdn.microsoft.com/en-us/windows/hardware/drivers/ifs/file-system-minifilter-drivers. This will guarantee that you get access to the file before it's read/written. You can then scan the file before it's written or read by the process to give you a chance to deny access and alert. Note in this scenario you are blocking until you come to a decision whether to allow or block the access. It is for this reason that on-access security products can slow down file I/O. They typically have a number of scanning threads that a driver can pass work to. If all threads are busy scanning then you have a bit of an issue. You need to handle things like zip bombs, etc and bail out before tying up a scanning engine/CPU/Memory etc...
A browser downloading a malicious file.
You could reply on the on-access scanner preventing a file hitting the disk by the browser process but then the browsers can render scripts before hitting the file system. As a result you might want to create a component to intercept traffic before the web browser. There are a few possibilities here. Do you target specific browsers with plugins or do you go down a level and intercept the traffic with a local proxy process. Options include hooking the network layer with a Layered Service Provider (LSP) or WFP (https://msdn.microsoft.com/en-us/windows/hardware/drivers/network/windows-filtering-platform-callout-drivers2). Here you can redirect traffic to an in or out of process proxy to examine the traffic. SSL traffic poses an issue here unless you're going to crack open the pipe again more work.
Then there is run-time protection, where you don't detect a file with a signature but you apply rules to check behavior. For example a process that creates a start-up registry location for itself might be treated as suspicious. Maybe not enough to block the file on it's own but what if the file didn't have a valid signature, the location was the user's temp location. It's created by AutoIt and doesn't have a file version. All of these properties can give weight to the decision of if it should be run and these form the proprietary data of the security vendor and are constantly being refined. Maybe you start detecting applications as suspicious and give the user a warning so they can authorize them.

This is a massively huge topic that touches so many layers. Hopefully this is the sort of thing you had in mind.