Tags: gcc, powerpc

Avoid extra load after library call in optimized code


I'm implementing a data structure with two function pointers, key_eq and key_hash. These pointers are set only once and never modified, so I'd like gcc to compile in the function address directly instead of loading the struct member. However, a single puts("") call breaks gcc's constant propagation (correct term?).

Below is a short snippet of PPC asm. The field key_hash is stored, libc's puts is called, and then key_hash is loaded again. If the puts call is removed, the constant is propagated correctly. Is this unavoidable?

    lis 9,hasheq_string@ha   # tmp174,                                       
    la 0,hasheq_string@l(9)  # tmp173,, tmp174                               
    stw 3,16(1)      # h.flags,                                              
    lis 9,hashf_string@ha    # tmp176,                                       
    lis 3,.LC0@ha    # tmp178,                                               
    stw 0,36(1)      # h.key_eq, tmp173                                      
    la 0,hashf_string@l(9)   # tmp175,, tmp176                               
    la 3,.LC0@l(3)   #,, tmp178                                              
    stw 0,40(1)      # h.key_hash, tmp175                                    
    bl puts  #                                                               
    lwz 0,40(1)      # h.key_hash, h.key_hash                                
    mr 3,31  #, tmp164                                                       
    stw 31,32(1)     # h._key, tmp164                                        
    mtctr 0  #, h.key_hash                                                   
    bctrl    #   

I'm sorry for not having a C test case to go with it; I have a hard time reproducing it in isolation. Still, the short three-instruction "store, bl puts, load" sequence prompts the question: is this impossible to work around?


Solution

  • GCC most likely assumes, conservatively, that the call to puts may read or write any memory it can reach. The value in key_hash might therefore be needed by the called function, and it might differ before and after the call, so it has to be stored before the call and reloaded afterwards.

    This code:

    struct S
    {
      int f;
    };
    
    void bar ();
    
    int
    foo (struct S *s)
    {
      s->f = 314;
      bar ();
      return s->f + 2;
    }
    

    produces the "signature" sequence:

    foo:
        mflr 0
        stwu 1,-32(1)
        stw 29,20(1)
        mr 29,3
        stw 0,36(1)
        li 0,314
        stw 0,0(3)  ;;
        bl bar      ;; here we go
        lwz 3,0(29) ;;
        lwz 0,36(1)
        addi 3,3,2
        lwz 29,20(1)
        mtlr 0
        addi 1,1,32
        blr
    

    However, with a small modification (manual Scalar Replacement of Aggregates?):

    struct S
    {
      int f;
    };
    
    void bar ();
    
    int
    foo (struct S *s)
    {
      int i;
    
      s->f = i = 314;
      bar ();
      return i + 2;
    }
    

    one gets the desired result:

    foo:
        mflr 0
        stwu 1,-16(1)
        stw 0,20(1)
        li 0,314
        stw 0,0(3)
        bl bar
        lwz 0,20(1)
        li 3,316    ;; constant propagated and folded
        addi 1,1,16
        mtlr 0
        blr
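The same trick can be sketched for the function-pointer case from the question: keep the address in a local variable and call through that instead of through the struct member. This is only a sketch under assumptions. The struct layout, the bodies of hasheq_string and hashf_string, and the helper name hash_after_call are made up for illustration; only key_eq, key_hash, _key, flags and puts("") come from the question. Whether gcc then emits a direct call depends on the compiler version and optimization flags.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: only the member names come from the question. */
struct hash {
    unsigned flags;
    const char *_key;
    int (*key_eq)(const char *, const char *);
    unsigned (*key_hash)(const char *);
};

/* Placeholder implementations, assumed for this sketch. */
static int hasheq_string(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

static unsigned hashf_string(const char *s)
{
    unsigned h = 5381u;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Hypothetical helper that mirrors the "store, call puts, use" sequence. */
unsigned hash_after_call(struct hash *h, const char *key)
{
    /* Local copy: puts() cannot modify it, since its address is never taken. */
    unsigned (*const hash_fn)(const char *) = hashf_string;

    assert(h != NULL);
    h->key_eq = hasheq_string;
    h->key_hash = hash_fn;   /* the store to memory still happens */
    puts("");                /* opaque call: may clobber *h as far as gcc knows */
    h->_key = key;
    return hash_fn(key);     /* goes through the local, not a reload of h->key_hash */
}
```

The store to h->key_hash is still there for any other code that reads the struct, but the call after puts no longer depends on what is in memory at that point.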