PTX - difference between .local and .param

I'm studying PTX and I don't understand the difference between .param and .local state spaces.

.local variables are private to each thread and are stored on that thread's stack (which is, by the way, per-thread memory).

.param variables are used for object allocation (when passing by value), function parameters, return values and input parameters, and they are also allocated on the stack.

The PTX manual says:

In PTX, the address of a function input parameter may be moved into a register using the mov instruction. Note that the parameter will be copied to the stack if necessary, and so the address will be in the .local state space and is accessed via ld.local and st.local instructions.

I don't understand: why copy a .param to the stack if .param == .local and everything is already on the stack?

Solution

.param is a PTX-level abstraction for data passed from the host to the device as part of a kernel invocation, that is, the kernel call parameters or arguments. In early GPUs, the actual storage used for this purpose was shared memory; in later GPUs this was changed to a constant memory bank.
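
To make the distinction concrete, here is a minimal hand-written PTX sketch of a kernel that reads its arguments from the .param space and, separately, uses a per-thread .local scratch variable. The kernel name `scale`, the parameter names `scale_param_0` and `scale_param_1`, the `scratch` variable, and the `.version`/`.target` values are all assumptions chosen for illustration; real compiler output will differ.

```ptx
// Hypothetical example: names, version and target are illustrative assumptions.
.version 6.0
.target sm_50
.address_size 64

.visible .entry scale(
    .param .u64 scale_param_0,   // kernel argument: pointer to float data
    .param .f32 scale_param_1    // kernel argument: scale factor
)
{
    .reg .f32 %f<5>;
    .reg .b32 %r<2>;
    .reg .b64 %rd<5>;
    .local .align 4 .b8 scratch[4];   // per-thread storage in the .local space

    // Kernel arguments are read with ld.param, not ld.local.
    ld.param.u64 %rd1, [scale_param_0];
    ld.param.f32 %f1, [scale_param_1];

    // Compute the address of this thread's element in global memory.
    cvta.to.global.u64 %rd2, %rd1;
    mov.u32 %r1, %tid.x;
    mul.wide.u32 %rd3, %r1, 4;
    add.s64 %rd4, %rd2, %rd3;

    ld.global.f32 %f2, [%rd4];
    mul.f32 %f3, %f2, %f1;

    // Per-thread scratch data is accessed with st.local / ld.local.
    st.local.f32 [scratch], %f3;
    ld.local.f32 %f4, [scratch];

    st.global.f32 [%rd4], %f4;
    ret;
}
```

The round trip through `scratch` does no useful work; it is there only to show that .param and .local are addressed by different instructions (`ld.param` versus `ld.local`/`st.local`), which is why they cannot be treated as the same state space.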