c# · containers · openshift

Difference between memory consumption of processes and total container memory consumption


I have a .NET 6 app running in a container in OpenShift. When I run top inside the container, I see total memory consumption of about 2 GB. However, when I open Grafana to see how much memory the container is actually using, I see it's close to 5 GB. This app allocates lots of big objects, so I suppose this happens because of memory fragmentation and allocations. Is that true, and what tools exist for investigating this kind of issue?

[Grafana graph of the container's memory usage]


Solution

  • I don't consider myself an expert on this topic, but since this question has been quiet for almost a week, let me give it an attempt. Others should feel free to correct or update this if I make any errors.

    The OpenShift metric being gathered there is container_memory_working_set_bytes (or at least that's the default; you can see this by clicking "Show PromQL"). Whereas I expect you were looking at RSS in top. There are lots of explanations of the difference between working set and RSS, and I'm not enough of a kernel expert to be definitive about all of the nuances, but my layman's explanation is that the working set also includes dirty memory and other allocations that are not currently reclaimable.
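
    As far as I understand it, cAdvisor derives that working-set figure by subtracting inactive file-cache memory from total cgroup usage. Here is a minimal sketch of that calculation, assuming cgroup v1 file paths (cgroup v2 uses memory.current and an inactive_file key instead), just to make the relationship concrete:

        // Rough sketch of how a cAdvisor-style "working set" is derived from
        // cgroup v1 files inside the container: total usage minus inactive
        // page-cache memory. Paths and key names differ on cgroup v2.
        using System;
        using System.IO;
        using System.Linq;

        class WorkingSetSketch
        {
            static void Main()
            {
                const string cgroupDir = "/sys/fs/cgroup/memory"; // cgroup v1 layout (assumption)

                long usage = long.Parse(
                    File.ReadAllText(Path.Combine(cgroupDir, "memory.usage_in_bytes")).Trim());

                // memory.stat has "key value" per line; total_inactive_file is the
                // reclaimable page cache that gets subtracted out.
                long inactiveFile = File.ReadLines(Path.Combine(cgroupDir, "memory.stat"))
                    .Select(line => line.Split(' '))
                    .Where(parts => parts[0] == "total_inactive_file")
                    .Select(parts => long.Parse(parts[1]))
                    .FirstOrDefault();

                long workingSet = Math.Max(0, usage - inactiveFile);
                Console.WriteLine($"usage={usage:N0}  inactive_file={inactiveFile:N0}  working_set~{workingSet:N0}");
            }
        }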

    So, that's the answer to your first question about why you are seeing a difference. It's surprising to me that the difference is so large, but since you mention allocating lots of big objects, perhaps that is related.

    To your second question of "what tools exist for investigating these kinds of issues": essentially you can troubleshoot with your typical Linux tools. Taking a quick look at /proc/meminfo inside the container will give you a high-level picture. If that's not enough, you can use standard Linux tools like ps and vmstat (although some of those tools won't exist in stripped-down container images, so you may need to adjust your base image).
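
    As a small illustration of that kind of quick check, here is a sketch that dumps the top of /proc/meminfo plus the process's own VmRSS from inside the app (note that /proc/meminfo is generally not cgroup-aware, so in many container setups it reflects the host):

        // Minimal sketch: print the first lines of /proc/meminfo plus this
        // process's VmRSS from /proc/self/status, to compare the process view
        // with the container-level metrics.
        using System;
        using System.IO;
        using System.Linq;

        class MemInfoSketch
        {
            static void Main()
            {
                // High-level system memory counters (typically host-wide values
                // unless something like lxcfs remaps them inside the container).
                foreach (var line in File.ReadLines("/proc/meminfo").Take(5))
                    Console.WriteLine(line);

                // Resident set size of the current (.NET) process.
                var vmRss = File.ReadLines("/proc/self/status")
                    .FirstOrDefault(l => l.StartsWith("VmRSS:"));
                Console.WriteLine(vmRss ?? "VmRSS not found");
            }
        }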

    You could also try monitoring some additional metrics besides container_memory_working_set_bytes. The PromQL editor has code completion, so you can explore the available memory metrics and see if you find something meaningful to monitor. That includes container_memory_rss, which probably matches what you were seeing in top. But container_memory_working_set_bytes is probably the important one, since that's the metric a container gets killed for exceeding.
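
    Since your app is .NET 6, one more angle (in addition to the container metrics) is to ask the runtime itself: GC.GetGCMemoryInfo() reports heap size, fragmentation and committed bytes, which you can line up against container_memory_rss and container_memory_working_set_bytes to test the fragmentation theory. A minimal sketch:

        // Sketch of logging the runtime's own view of memory from inside the app:
        // fragmented vs. total heap, plus GC-committed bytes and process RSS.
        using System;

        class GcInfoSketch
        {
            static void Main()
            {
                var info = GC.GetGCMemoryInfo();

                Console.WriteLine($"Heap size:       {info.HeapSizeBytes:N0} bytes");
                Console.WriteLine($"Fragmented:      {info.FragmentedBytes:N0} bytes");
                Console.WriteLine($"Committed by GC: {info.TotalCommittedBytes:N0} bytes"); // .NET 5+
                Console.WriteLine($"Memory load:     {info.MemoryLoadBytes:N0} bytes");
                Console.WriteLine($"Process RSS:     {Environment.WorkingSet:N0} bytes");
            }
        }

    Tools like dotnet-counters can surface similar numbers from outside the process, if that is more convenient than adding logging.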

    That all being said, the graph you included looks to have a healthy pattern: memory grows and gets reclaimed as needed. Unless you are seeing some undesirable behavior, no action may be needed.

    Here are some links with more detail, although most are links to the Red Hat Portal, so you will likely need a Red Hat subscription to view them.

    Difference between memory usage on Metrics and Dashboard

    How is working set calculated

    https://blog.freshtracks.io/a-deep-dive-into-kubernetes-metrics-part-3-container-resource-metrics-361c5ee46e66