Using BPF-based tracing tools in ECS


Fairly recently I started learning BPF tools and have used them quite a bit on my own workstation as a debugging aid. However, I would like to start using them in production for better visibility into production issues. Our workloads run in AWS ECS. It seems like using the tools is not possible on Fargate:

https://github.com/aws/containers-roadmap/issues/1027

What would be required to get the tools working properly in ECS when running your own EC2 cluster? Can I use e.g. an Alpine Linux image, or would I need to base the image on the precise kernel build used on the EC2 hosts? Does anyone have experience to share?


Solution

  • Disclaimer: I don't have personal experience with eBPF on AWS ECS. However, I do have some experience with eBPF requirements, since I maintain a loader library.

    In general to use eBPF you need:

    • A Linux kernel compiled with BPF support and with the BPF features you want to use.
    • The CAP_SYS_ADMIN capability on kernel versions lower than 5.8, or the CAP_BPF capability on kernel versions 5.8 and above (CAP_SYS_ADMIN still works there, but it grants much more than just BPF access).
    • Depending on which tools you want to use, extra capabilities such as CAP_PERFMON for perf features (uprobe, kprobe, tracepoint), or CAP_SYS_ADMIN on kernel versions below 5.8.
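
To check these requirements from inside a running container, something like the following rough sketch could help. It reads the kernel release and the effective capability mask of the current process and applies the rules listed above. The helper names are made up for illustration, the capability bit numbers (21, 38, 39) come from the Linux capability headers, and the script does not check whether the kernel was actually built with the BPF features a given tool needs.

```python
import platform
import re

# Capability bit numbers as defined in linux/capability.h.
CAP_SYS_ADMIN = 21
CAP_PERFMON = 38   # added in Linux 5.8
CAP_BPF = 39       # added in Linux 5.8


def parse_kernel_version(release):
    """Extract (major, minor) from a release string such as '5.10.184-amzn2'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_cap(cap_eff, cap):
    """Return True if bit `cap` is set in the effective capability mask."""
    return bool((cap_eff >> cap) & 1)


def read_effective_caps(status_path="/proc/self/status"):
    """Read the CapEff mask of the current process (Linux only)."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    raise RuntimeError("CapEff not found in " + status_path)


def can_load_bpf(kernel, cap_eff):
    """CAP_SYS_ADMIN always suffices; CAP_BPF only counts on 5.8 and above."""
    if has_cap(cap_eff, CAP_SYS_ADMIN):
        return True
    return kernel >= (5, 8) and has_cap(cap_eff, CAP_BPF)


def can_trace(kernel, cap_eff):
    """Tracing (uprobe, kprobe, tracepoint) additionally needs CAP_PERFMON."""
    if has_cap(cap_eff, CAP_SYS_ADMIN):
        return True
    return (kernel >= (5, 8)
            and has_cap(cap_eff, CAP_BPF)
            and has_cap(cap_eff, CAP_PERFMON))


if __name__ == "__main__":
    try:
        kernel = parse_kernel_version(platform.release())
        caps = read_effective_caps()
    except (OSError, ValueError, RuntimeError) as err:
        print("could not inspect this system:", err)
    else:
        print("kernel version:", kernel)
        print("can load BPF programs:", can_load_bpf(kernel, caps))
        print("can use perf-based tracing:", can_trace(kernel, caps))
```

Running it inside the container and on the host should report the same kernel version, and the capability results show what the container was actually granted.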

    Using eBPF within a container should not be an issue, since containers share the host's kernel (containers are just isolated processes on the host).

    But since eBPF lets you probe the kernel, it obviously breaks the isolation of the container, and giving CAP_SYS_ADMIN to a container basically gives it full root access as well, so security is a challenge (unless you are only using it in development, in which case you can simply make your container privileged). That is the reason you won't see eBPF enabled on shared hardware (if things are configured properly).
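
For ECS on EC2, the two options above (a privileged container, or an added CAP_SYS_ADMIN capability) would go in the container definition of the task. The sketch below builds such a fragment and prints it as JSON. The `privileged` and `linuxParameters.capabilities.add` fields are taken from the ECS task definition format; the container name and image are placeholders, and I have not tested this on a real cluster, so treat it as an assumption to verify against the ECS documentation.

```python
import json


def tracing_container_definition(name, image, privileged=False):
    """Build an ECS container definition fragment for a tracing container.

    With privileged=True the container gets full host privileges
    (development only). Otherwise only SYS_ADMIN is added, which is still
    very close to root on the host.
    """
    definition = {"name": name, "image": image}
    if privileged:
        definition["privileged"] = True
    else:
        definition["linuxParameters"] = {"capabilities": {"add": ["SYS_ADMIN"]}}
    return definition


if __name__ == "__main__":
    # "bpf-tools" and the image name are hypothetical placeholders.
    fragment = tracing_container_definition("bpf-tools", "example/bpf-tools:latest")
    print(json.dumps({"containerDefinitions": [fragment]}, indent=2))
```

Adding only the capability keeps the rest of the container's restrictions in place, but as noted above it is not a real security boundary.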