What is the use of the Vertical Pod Autoscaler "Auto" mode?

As far as I understand from the VPA documentation, the Vertical Pod Autoscaler stops and restarts a pod based on the lower bound, upper bound, and target of the predicted requests/limits. The documentation says that in "Auto" mode the pod will be stopped and restarted. However, I don't see the point of making a prediction and restarting the pod while it is still working: although we know it might run out of resources eventually, it is still working, and we could wait and rescale it once it has actually run out of memory/CPU. Isn't it more efficient to just wait for the pod to go out of memory/cpu and then restart it with the new predicted request?

Is recovering from a dead container more costly than stopping and restarting the pod ourselves? If yes, in what ways?

Solution

Isn't it more efficient to just wait for the pod to go out of memory/cpu and then restart it with the new predicted request?

In my opinion this is not the best solution. If a container tries to use more CPU than its limit allows, its CPU usage is throttled. If a container tries to use more memory than its limit, Kubernetes OOM-kills the container. Limits can also be overcommitted: the sum of the limits of the pods on a node can be higher than the node's capacity, so this can exhaust memory on the node and cause the death of other workloads/pods.
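
To make the overcommit point concrete, here is a minimal Python sketch. The node size and the pod names and figures are hypothetical, chosen only for illustration; the sketch relies on the fact that pods are scheduled by their requests, not their limits.

```python
# Hypothetical node and pod figures, chosen only for illustration (memory in MiB).
NODE_MEMORY_MIB = 4096

PODS = {
    "api": {"request": 1024, "limit": 2048},
    "worker": {"request": 1024, "limit": 2048},
    "cache": {"request": 1024, "limit": 2048},
}


def total(pods, field):
    """Sum one resource field ("request" or "limit") over all pods."""
    return sum(spec[field] for spec in pods.values())


def fits_by_requests(pods, capacity):
    """Scheduling compares the sum of requests with what the node can offer."""
    return total(pods, "request") <= capacity


def limits_overcommitted(pods, capacity):
    """True when the pods could together use more memory than the node has."""
    return total(pods, "limit") > capacity


print("requests fit on node:", fits_by_requests(PODS, NODE_MEMORY_MIB))
print("limits overcommitted:", limits_overcommitted(PODS, NODE_MEMORY_MIB))
```

With these numbers all three pods are schedulable, yet if each grows toward its limit the node runs out of memory before any single container exceeds its own limit.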

Answering your question - VPA was designed to simplify those scenarios:

Vertical Pod Autoscaler (VPA) frees the users from necessity of setting up-to-date resource limits and requests for the containers in their pods. When configured, it will set the requests automatically based on usage and thus allow proper scheduling onto nodes so that appropriate resource amount is available for each pod. It will also maintain ratios between limits and requests that were specified in initial containers configuration.

In addition, VPA is responsible not only for scaling up but also for scaling down: it can both down-scale pods that are over-requesting resources, and also up-scale pods that are under-requesting resources based on their usage over time.
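
The ratio-preserving behaviour quoted above can be sketched roughly as follows. This is only an illustration of the idea, not VPA's actual code: the function name scaled_limit and the CPU figures (in millicores) are assumptions, and the real component may round or cap values differently.

```python
def scaled_limit(original_request, original_limit, recommended_request):
    """Return a limit that keeps the original limit/request ratio.

    All values are integers in the same unit (here: CPU millicores).
    """
    if original_request <= 0:
        raise ValueError("original_request must be positive")
    # Integer arithmetic avoids floating-point noise in the result.
    return recommended_request * original_limit // original_request


# Hypothetical container: request 100m, limit 200m (ratio 1:2).
ORIGINAL_REQUEST_M = 100
ORIGINAL_LIMIT_M = 200

# One up-scale and one down-scale recommendation, both hypothetical.
for recommended in (250, 50):
    limit = scaled_limit(ORIGINAL_REQUEST_M, ORIGINAL_LIMIT_M, recommended)
    print(f"request {recommended}m -> limit {limit}m")
```

Both directions keep the 1:2 ratio, whether the recommendation is above or below the original request.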

Is recovering from a dead container more costly than stopping and restarting the pod ourselves? If yes, in what ways?

As for the cost of recovering from a dead container: the main cost is likely to be the requests that can get lost while the container is being OOM-killed, as per the official doc.

As per the official documentation, VPA operates in the following modes:

"Auto": VPA assigns resource requests on pod creation as well as updates them on existing pods using the preferred update mechanism. Currently this is equivalent to "Recreate".

"Recreate": VPA assigns resource requests on pod creation as well as updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation (respecting the Pod Disruption Budget, if defined).

"Initial": VPA only assigns resource requests on pod creation and never changes them later.

"Off": VPA does not automatically change resource requirements of the pods.

NOTE: VPA Limitations

VPA recommendation might exceed available resources, such as your cluster capacity or your team’s quota. Not enough available resources may cause pods to go pending.
VPA in Auto or Recreate mode won’t evict pods with one replica as this would cause disruption.
Quick memory growth might cause the container to be out of memory killed. As out of memory killed pods aren’t rescheduled, VPA won’t apply new resource.

Please also take a look at some of the VPA Known limitations:

Updating running pods is an experimental feature of VPA. Whenever VPA updates the pod resources the pod is recreated, which causes all running containers to be restarted. The pod may be recreated on a different node.

VPA does not evict pods which are not run under a controller. For such pods Auto mode is currently equivalent to Initial.

VPA reacts to most out-of-memory events, but not in all situations.
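
For reference, the update mode is selected in the VPA object itself, and the object points at a controller through targetRef. Below is a minimal Python sketch that builds such a manifest as JSON, which kubectl also accepts. The object name "my-app-vpa" and the Deployment name "my-app" are hypothetical, and the helper vpa_manifest is only an illustration.

```python
import json

# The four update modes listed in the VPA documentation.
UPDATE_MODES = ("Auto", "Recreate", "Initial", "Off")


def vpa_manifest(name, target_deployment, update_mode="Off"):
    """Build a VerticalPodAutoscaler manifest targeting a Deployment."""
    if update_mode not in UPDATE_MODES:
        raise ValueError(f"unknown update mode: {update_mode!r}")
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # VPA only evicts pods that run under a controller.
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": target_deployment,
            },
            "updatePolicy": {"updateMode": update_mode},
        },
    }


# "my-app-vpa" and "my-app" are hypothetical names.
print(json.dumps(vpa_manifest("my-app-vpa", "my-app", "Off"), indent=2))
```

Starting in "Off" mode lets you inspect the recommendations without any evictions, and you can switch to "Initial" or "Auto" later once the recommendations look reasonable.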

Additional resources: VERTICAL POD AUTOSCALING: THE DEFINITIVE GUIDE