I am currently working on a project that depends on detecting which OS an executable belongs to. I am only working with the ELF executable format, so I tried using the e_ident[EI_OSABI] value, but it doesn't give reliable results.
Also, the PT_INTERP segment is not a workable solution, because it provides no information for shared libraries (.so) and the name of the dynamic linker doesn't always include the name of the kernel. For example, /lib/ld-musl-x86_64.so.1 doesn't contain the kernel's name the way /lib64/ld-linux-x86-64.so.2 does.
I thought a third way could be to find system calls that exist on only one kernel and are always present in its executables.
To give an example:
foo: this system call is always used by every Linux executable and Linux shared library
bar: this system call is always used by every FreeBSD executable and FreeBSD shared library
baz: this system call is always used by every OpenBSD executable and OpenBSD shared library
If I find one of the foo, bar, or baz system calls in an executable, I can detect its OS.
My question is: are there system calls like this (e.g., foo always exists in the .dsym section)?

Unfortunately, syscalls are not named; they're numbered. For example, on Linux x86-64, to call read(2) you make syscall 0, and to call write(2) you make syscall 1. However, on FreeBSD, those are syscalls 3 and 4, respectively. There is usually a header, sys/syscall.h, that provides the values.
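To make the ambiguity concrete, here is a minimal sketch in Python. The table is a tiny hand-picked subset of syscall numbers (the read and write entries are the ones mentioned above), and the names SYSCALL_TABLES and candidates are made up for this illustration.

```python
# Illustrative subset only; real tables live in each system's sys/syscall.h.
SYSCALL_TABLES = {
    "linux-x86_64": {0: "read", 1: "write", 3: "close"},
    "freebsd": {1: "exit", 3: "read", 4: "write"},
}


def candidates(number):
    """Return every (os, syscall name) pair that a raw number could mean."""
    return [
        (os_name, table[number])
        for os_name, table in SYSCALL_TABLES.items()
        if number in table
    ]


# The same number names different calls depending on the OS.
print(candidates(3))
```

A bare syscall number found in a binary therefore says nothing until you already know the OS, which is the thing you were trying to find out.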
In addition, you have the joy that syscalls can vary among architectures, at least on Linux, and some architectures have syscalls that others do not. For example, there are 32-bit and 64-bit versions of stat on 32-bit x86, but since the 32-bit version (which can only handle files up to 2^31-1 bytes) is obsolete and not useful, x86-64 didn't bother to implement such a thing, and the stat system call is always 64-bit on x86-64.
Furthermore, syscalls usually live in libc: a program calls a C function, and libc knows which syscall number corresponds to it. Some operating systems, like Linux, allow users to make direct syscalls from their binary; however, others, like OpenBSD, do not, and the kernel will murder your process if you try. Thus, in most cases, a dynamically linked executable itself does not contain any actual syscalls.
If your goal is to identify the OS a binary was built for, you're going to need a multi-pronged approach, since no single approach is going to suffice. First, when the OS ABI in the ELF header is not SysV, it's usually correct. That's a good way to detect FreeBSD, for example. You should also use PT_INTERP, which will provide suitable context as well. It's true that musl's ld.so doesn't contain linux in its name, but you know that if it's musl, it's Linux.
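As a rough sketch of those first two checks, the following Python reads e_ident[EI_OSABI] and the PT_INTERP path straight from the file bytes using only the standard library. The function names and the small loader-name heuristic are assumptions made for this example, and the OSABI table is only a subset of the values defined by the ELF specification.

```python
import struct

# Subset of e_ident[EI_OSABI] values from the ELF specification.
OSABI_NAMES = {0: "SysV", 2: "NetBSD", 3: "Linux", 6: "Solaris",
               9: "FreeBSD", 12: "OpenBSD"}
PT_INTERP = 3


def read_osabi_and_interp(data):
    """Return (osabi_name, interpreter_path or None) for an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                     # EI_CLASS: 2 means ELF64
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 means little-endian
    osabi = OSABI_NAMES.get(data[7], "unknown (%d)" % data[7])

    # Locate the program header table; offsets differ between ELF32 and ELF64.
    if is_64:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x2A)

    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, off)
        if p_type != PT_INTERP:
            continue
        if is_64:
            (p_offset,) = struct.unpack_from(endian + "Q", data, off + 8)
            (p_filesz,) = struct.unpack_from(endian + "Q", data, off + 32)
        else:
            (p_offset,) = struct.unpack_from(endian + "I", data, off + 4)
            (p_filesz,) = struct.unpack_from(endian + "I", data, off + 16)
        raw = data[p_offset:p_offset + p_filesz]
        return osabi, raw.rstrip(b"\0").decode("ascii", "replace")
    return osabi, None                       # static binary or shared library


def guess_os_from_interp(path):
    """Hypothetical heuristic: map a few well-known loader names to an OS."""
    if path is None:
        return None
    if "ld-linux" in path or "ld-musl" in path:
        return "Linux"
    return None   # e.g. /libexec/ld-elf.so.1 is too generic to decide alone
```

You would call it as read_osabi_and_interp(open(path, "rb").read()) and treat a SysV result as "no information" rather than as an answer.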
You may also want to look at which libc it links against if it's an executable. Sometimes PT_INTERP may be generic (e.g., on FreeBSD, it's the ever so helpful /libexec/ld-elf.so.1), but you may be able to distinguish OSes from their libc version. Some systems have versioned symbols, so looking at the version symbols can be helpful. Many systems also have a special note section (e.g., MirBSD has .note.miros.ident).
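For the note sections, a minimal sketch of a parser might look like the following. It takes the raw bytes of one note section (or PT_NOTE segment) and yields each note's owner name, type, and descriptor. It assumes the common 4-byte padding of names and descriptors, and the names iter_notes and looks_like_linux are invented for this example. The Linux check relies on the GNU ABI tag note (owner "GNU", type 1), whose first descriptor word is 0 for Linux.

```python
import struct


def iter_notes(blob, endian="<"):
    """Yield (owner, note_type, desc) from the raw bytes of a note section."""
    pos = 0
    while pos + 12 <= len(blob):
        namesz, descsz, ntype = struct.unpack_from(endian + "III", blob, pos)
        pos += 12
        owner = blob[pos:pos + namesz].rstrip(b"\0").decode("ascii", "replace")
        pos += (namesz + 3) & ~3      # name is padded to a 4-byte boundary
        desc = blob[pos:pos + descsz]
        pos += (descsz + 3) & ~3      # descriptor is padded the same way
        yield owner, ntype, desc


def looks_like_linux(blob, endian="<"):
    """True if the notes contain a GNU ABI tag whose OS word is 0 (Linux)."""
    for owner, ntype, desc in iter_notes(blob, endian):
        if owner == "GNU" and ntype == 1 and len(desc) >= 4:
            (os_word,) = struct.unpack_from(endian + "I", desc, 0)
            if os_word == 0:
                return True
    return False
```

Owner strings such as "FreeBSD", "NetBSD", or "OpenBSD" are usually a strong hint on their own, so simply listing the owners is often enough.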
If the binary is static, you're not going to have libc or PT_INTERP, so you may need to look some more. Static Go binaries have a .go.buildinfo section that contains a GOOS= value (e.g., GOOS=linux) that you can use.
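A crude way to pick up that value without parsing the section at all is to scan the file bytes for the GOOS= marker. This is only a sketch: it assumes the Go toolchain embedded its build settings in the binary, it does not verify that the match lies inside .go.buildinfo, and find_goos is a name made up for this example.

```python
import re


def find_goos(data):
    """Return the value after the first 'GOOS=' marker in data, or None."""
    match = re.search(rb"GOOS=([a-z0-9]+)", data)
    return match.group(1).decode("ascii") if match else None
```

Because any file could contain the string GOOS= by coincidence, it is safer to treat the result as one more hint to combine with the other checks.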
Fortunately, in most cases, file can provide this information for you just by running it on the binary. That doesn't work in all cases, though (MirBSD is a good example), so you're really going to have to fall back to some more complicated spelunking if you want to handle all ELF binaries.