RET x versus ADD RSP, x in x86-64 assembly

I am writing a program in MASM64.
I use WinAPI a lot.
I don't use the push and pop instructions, I use mov [rsp + x] instead.
I don't use local variables.
I don't use prolog/epilog.
I don't use RBP at all.
I do use sub rsp, x to reserve shadow space and keep the stack 16-byte aligned.

What is the difference between ending a procedure with ret x and ending it with add rsp, x? I understand that both add a value to RSP to clean up the stack. Is there any performance difference?
I guess ret x would be faster, since add rsp, x has to be followed by a ret anyway.

Solution

Let's say the stack holds 55, 44, 33, 22, 11 (top to bottom), and each value is a 64-bit integer (8 bytes).

In this case, ret 0x20 would return to address 55 and then remove the other four values (0x20 bytes) from the stack, while add rsp, 0x20; ret would first remove four values from the stack and then return to address 11.

In both cases, a total of five values gets removed from the stack, but the return address is different in the two cases.
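The two sequences can be traced slot by slot. This is only an annotated fragment in MASM syntax, using the same made-up stack values as above, not a complete program:

```asm
; Stack on entry to either variant (RSP points at 55), 8 bytes per slot:
;   [rsp+00h] = 55
;   [rsp+08h] = 44
;   [rsp+10h] = 33
;   [rsp+18h] = 22
;   [rsp+20h] = 11

; Variant A: ret with an operand
    ret  20h          ; pops 55 into RIP (RSP += 8),
                      ; then RSP += 20h, discarding 44, 33, 22, 11

; Variant B: add, then plain ret
    add  rsp, 20h     ; RSP += 20h, discarding 55, 44, 33, 22
                      ; [rsp] is now 11
    ret               ; pops 11 into RIP (RSP += 8)
```

Either way RSP ends up 28h bytes higher than it started. The only difference is which slot is used as the return address.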

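Since you said that you reserve the space yourself with sub rsp, x inside your function (for shadow space and alignment), you'll need the latter case (add rsp, x; ret). After the sub, the return address sits above the reserved space, so ret x would take whatever is at the bottom of that space as the return address. The other case is for removing function arguments which the caller pushed before calling the function.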
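As a rough sketch of that pattern in MASM64, here is a procedure that reserves shadow space, calls a WinAPI function, and releases the space before a plain ret. The name ShowMessage and the strings are made up for illustration, and the sketch assumes you link against user32.lib:

```asm
; Hypothetical example: ShowMessage, msgText and msgCaption are made-up names.
; No unwind info (PROC FRAME / .allocstack) is emitted, matching the
; no-prolog style described in the question.
extrn MessageBoxA : proc

.data
msgText    db "Hello", 0
msgCaption db "Demo", 0

.code
ShowMessage proc
    sub  rsp, 28h          ; 20h bytes of shadow space + 8 so RSP is 16-byte aligned
    xor  ecx, ecx          ; hWnd = NULL
    lea  rdx, msgText      ; lpText
    lea  r8, msgCaption    ; lpCaption
    xor  r9d, r9d          ; uType = MB_OK (0)
    call MessageBoxA
    add  rsp, 28h          ; undo the sub: [rsp] is the return address again
    ret                    ; plain ret, no operand
ShowMessage endp
end
```

The add must release exactly what the sub reserved. On entry RSP is 8 bytes off 16-byte alignment because of the pushed return address, which is why the sketch reserves 28h rather than 20h.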

Neither of the mainstream x86-64 calling conventions (Windows x64 and x86-64 System V) is callee-pops, so x86-64 code uses plain ret, leaving the stack args allocated. The caller can reuse that space for another call.

A few 32-bit conventions are callee-pops, like stdcall and fastcall, using ret n. i386 System V and cdecl are caller-pops conventions.
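For comparison, a minimal 32-bit MASM sketch of a callee-pops function in the stdcall style. AddTwo and Caller are hypothetical names, and the sketch uses .model flat without a language type, so there is no name decoration and the procedures are only meant to be called from assembly:

```asm
; 32-bit MASM (ml.exe) sketch; AddTwo and Caller are made-up names.
.386
.model flat
.code

; stdcall-style callee: two 4-byte stack arguments, popped by the callee.
AddTwo proc
    mov  eax, [esp+4]      ; first argument, just above the return address
    add  eax, [esp+8]      ; second argument
    ret  8                 ; pop the return address, then discard 8 bytes of arguments
AddTwo endp

Caller proc
    push 3                 ; second argument
    push 2                 ; first argument
    call AddTwo            ; ESP is already back to its value before the pushes
    ret                    ; no add esp, 8 here, unlike a cdecl call
Caller endp
end
```

With cdecl, AddTwo would end in a plain ret and Caller would follow the call with add esp, 8.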