For example, when I have a complex view where the only thing actually changing is the caret, I don't want to redraw the whole scene just to update the caret.
For now, the only reasonable way I can figure out to do this is to cache the content without the cursor. This doesn't seem too bad, but then I have to choose between always rendering to a texture, or deciding every frame whether to render to a texture or not.
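A minimal sketch of that idea, written in Python rather than real GPU code. It assumes the expensive static content is rendered once into an offscreen buffer and re-rendered only after an explicit `invalidate()`. The caret is drawn on top every frame. `CaretOverlayCache`, the `render_static` callback and the character-grid "framebuffer" are hypothetical stand-ins for a render-to-texture target and draw calls, not any real graphics API.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for an offscreen render target (e.g. a texture):
# a 2D grid of characters.
Framebuffer = List[List[str]]


class CaretOverlayCache:
    """Caches the static scene and composites the caret on top each frame."""

    def __init__(self, render_static: Callable[[], Framebuffer]) -> None:
        self._render_static = render_static  # the expensive draw
        self._cached: Optional[Framebuffer] = None
        self.static_renders = 0  # counts expensive redraws, for illustration

    def invalidate(self) -> None:
        """Call when the content itself (not just the caret) changes."""
        self._cached = None

    def frame(self, caret: Tuple[int, int]) -> Framebuffer:
        if self._cached is None:
            # Cache miss: redraw the static content into the offscreen buffer.
            self._cached = self._render_static()
            self.static_renders += 1
        # Cheap per-frame work: copy the cached image (like blitting the
        # texture to the screen) and draw the caret over the copy.
        out = [line[:] for line in self._cached]
        row, col = caret
        out[row][col] = "|"
        return out


cache = CaretOverlayCache(lambda: [list("hello"), list("world")])
cache.frame((0, 1))
cache.frame((0, 2))  # caret moved, static content reused
print(cache.static_renders)  # prints 1
```

Moving the caret never touches the expensive path; only `invalidate()` (e.g. after the text is edited) forces a redraw of the cached content.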
Maybe this problem can be generalized to "the right way to handle an almost-static complex scene with the GPU".
My experience from working on a few games is that you generally render the whole scene again. In the cases where it is prohibitively expensive to re-render a resource every frame, you implement the caching yourself, e.g. you cache the shadow map for a dynamic light until the light moves again.
The caching solution you described is what an automatic cache would have to do anyway, so it's not unreasonable. What concern do you have with rendering the whole scene again?
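As a rough illustration of the shadow-map case, here is a Python sketch of a cache that re-runs an expensive render only when its input key (here, the light position) changes. Unlike an explicit dirty flag, invalidation happens implicitly by comparing keys. `KeyedCache` and the lambda standing in for a shadow-map render are hypothetical; a real engine would produce a depth texture and might compare positions with a tolerance rather than exact equality.

```python
from typing import Callable, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class KeyedCache(Generic[K, V]):
    """Re-runs an expensive render only when its input key changes."""

    def __init__(self, render: Callable[[K], V]) -> None:
        self._render = render
        self._key: Optional[K] = None
        self._value: Optional[V] = None
        self._valid = False
        self.renders = 0  # counts expensive renders, for illustration

    def get(self, key: K) -> V:
        # Re-render on the first call or whenever the input (e.g. light
        # position) differs from the one the cached result was built for.
        if not self._valid or key != self._key:
            self._value = self._render(key)
            self._key = key
            self._valid = True
            self.renders += 1
        return self._value  # type: ignore[return-value]


# Hypothetical stand-in for rendering a shadow map from a light position.
shadow_maps = KeyedCache(lambda light_pos: f"shadow map for {light_pos}")
shadow_maps.get((0.0, 10.0, 0.0))
shadow_maps.get((0.0, 10.0, 0.0))  # light didn't move: cached result reused
shadow_maps.get((1.0, 10.0, 0.0))  # light moved: re-rendered
print(shadow_maps.renders)  # prints 2
```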