Why does the compiler copy RDI to another register, and then copy it back to RDI inside a loop?

I'm analysing a piece of inefficient code, and parts of the generated assembly confuse me.

Original code:

#include <string.h>

void lowwer(char *str) {
  for (int i = 0; i < strlen(str); ++i) {
    str[i] -= ('A' - 'a');
  }
}

Assembly code (generated by clang 13 with -Og option):

lowwer:
  pushq %r14 # use saved-registers
  pushq %rbx
  pushq %rax
  # guard do while
  cmpb  $0, (%rdi) # compare str[0] with 0 (check if strlen(str) == 0)
  je    .LBB0_3
  # loop initialization
  movq  %rdi, %r14 # %r14 = str
  xorl  %ebx, %ebx # clear %rbx (for more compact encoding)
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
  addb  $32, (%r14,%rbx) # subtract -32 from str[i] ('A' - 'a' = -32)
  addq  $1, %rbx # ++i
  movq  %r14, %rdi # seems meaningless here?
  callq strlen@PLT
  cmpq  %rbx, %rax # check i < strlen(str)
  ja    .LBB0_2
.LBB0_3: # end
  addq  $8, %rsp # ???
  popq  %rbx # restore saved registers
  popq  %r14
  retq

What is the instruction movq %r14, %rdi doing? It seems meaningless, because %r14 holds the string pointer and %rdi holds the same value.
What is the purpose of the instruction addq $8, %rsp? It looks pointless.

Solution

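rdi is a caller-saved register, so the call to strlen is free to clobber it. To preserve the string pointer across those calls, the compiler copies it into r14, a callee-saved register (which is why r14 is pushed at the top of the function). It then copies the pointer back into rdi on every iteration, because rdi carries the first argument to strlen.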
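
In C terms, the emitted loop behaves roughly like the sketch below. The function name lowwer_as_compiled and the local variable saved are made up for illustration; saved plays the role of r14, and each strlen(saved) call corresponds to the movq %r14, %rdi / callq strlen pair.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical C rendering of the assembly above, for illustration only. */
void lowwer_as_compiled(char *str) {
  if (str[0] == '\0')          /* cmpb $0, (%rdi); je .LBB0_3 */
    return;
  char *saved = str;           /* movq %rdi, %r14: keep str in a callee-saved register */
  size_t i = 0;                /* xorl %ebx, %ebx */
  do {
    saved[i] += 32;            /* addb $32, (%r14,%rbx) */
    ++i;                       /* addq $1, %rbx */
  } while (strlen(saved) > i); /* movq %r14, %rdi; callq strlen; cmpq %rbx, %rax; ja */
}
```

The copy into saved looks redundant in C, but at the register level str lives in rdi only until the first call, so the compiler needs a second home for it.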

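The addq $8, %rsp instruction releases stack space previously allocated by pushq %rax. This stack space was allocated to satisfy the stack alignment requirements imposed by the amd64 SysV ABI: rsp must be 16-byte aligned at the call to strlen, and pushing only r14 and rbx would leave it misaligned by 8 bytes. The value of rax stored in that slot is irrelevant; pushq is simply a compact way to subtract 8 from rsp.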
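
The alignment arithmetic can be checked with a small sketch. It assumes the SysV rule that rsp is a multiple of 16 immediately before a call instruction, so on function entry (after the caller's call has pushed the 8-byte return address) rsp is 8 modulo 16. The helper name rsp_mod16_after_pushes is made up for this example.

```c
/* Hypothetical helper: rsp modulo 16 after `pushes` 8-byte pushes,
   starting from function entry, where rsp is 8 modulo 16. */
unsigned rsp_mod16_after_pushes(unsigned pushes) {
  unsigned rsp = 8;              /* entry state: return address just pushed */
  for (unsigned k = 0; k < pushes; ++k)
    rsp = (rsp + 16 - 8) % 16;   /* each pushq subtracts 8 from rsp */
  return rsp;
}
```

With two pushes (r14 and rbx) the helper returns 8, meaning the stack would be misaligned at the call; with the third push (rax) it returns 0, which is the alignment the call to strlen requires.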

Refer to the amd64 SysV ABI supplement for the full calling convention and a list of caller/callee saved registers.