
Which xarch for SHA extensions on Solaris?


Oracle recently released Oracle Developer Studio 12.6 (formerly Sun Studio). We have SHA-1 and SHA-256 implementations based on intrinsics (for ARM and Intel), and we want to enable the SHA extensions on Solaris x86 machines.

The 12.6 manual lists the -xarch options in A.2.115.3 -xarch Flags for x86, but it does not mention SHA.

Which -xarch option do we use for SHA?


Solution

  • If Studio 12.6 doesn't support the SHA instruction set (and I strongly suspect it doesn't since I can't find "SHA" mentioned at all, in any form, in the What's New in the Oracle Developer Studio 12.6 Release documentation), you're out of luck.

    Almost.

    What you can do is create your own inline assembler functions. See man inline:

    inline(4)

    Name

    inline, filename.il - Assembly language inline template files

    Description

    Assembly language call instructions are replaced by a copy of their corresponding function body obtained from the inline template (*.il) file.

    Inline template files have a suffix of .il, for example:

    % CC foo.il hello.c
    

    Inlining is done by the compiler's code generator.

    ...

    Examples

    Please review libm.il or vis.il for examples. You can find a version of these libraries that is specific to each supported architecture under the compiler's lib/ directory.

    ...

    An example can be found in this paper:

    Performance Tuning With Sun Studio Compilers and Inline Assembly Code

    ...

    This paper provides a demonstration of how to measure the performance of a critical piece of code. An example using a compiler flag and another example using inline assembly code are provided. The results are compared to show the benefits and differences of each approach.

    ...

    Example 8: Inline Assembly Code for the Iterative Mandelbrot Calculation

    Knowing all these facts, the inline code can be written, as shown in Example 8.

    .inline mandel_il,0
    // x is stored in %xmm0
    // y is stored in %xmm1
    // 4.0 is stored in %xmm2
    // max_int is stored in %rdi
    
    // set registers to zero
      xorps %xmm3, %xmm3
      xorps %xmm4, %xmm4
      xorps %xmm5, %xmm5
      xorps %xmm6, %xmm6
      xorps %xmm7, %xmm7
      xorq %rax, %rax
    
    .loop:
    // check to see if u2 + v2 >= 4.0
      movss %xmm5, %xmm7
      addss %xmm6, %xmm7
      ucomiss %xmm2, %xmm7
      jp     .exit
      jae    .exit
    
    // v = 2 * v * u + y
      mulss %xmm3, %xmm4
      addss %xmm4, %xmm4
      addss %xmm1, %xmm4
    // u = u2 - v2 + x
      movss %xmm5, %xmm3
      subss %xmm6, %xmm3
      addss %xmm0, %xmm3
    // u2 = u * u
      movss %xmm3, %xmm5
      mulss %xmm3, %xmm5
    // v2 = v * v
      movss %xmm4, %xmm6
      mulss %xmm4, %xmm6
    
      incl %eax
      cmpl %edi, %eax
      jl .loop
    
    .exit:
    // end of mandel_il
    .end
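
For readers who don't read SSE assembly fluently, here is a rough C equivalent of the template above. It is only a sketch: the name mandel_c is made up for illustration, and the prototype (float x, float y, float four, int max_int) is an assumption inferred from the register comments at the top of the template, not something stated in the quoted text.

```c
/*
 * Hypothetical C reference for the mandel_il template.
 * Assumed argument order, inferred from the register comments:
 *   x -> %xmm0, y -> %xmm1, four (4.0) -> %xmm2, max_int -> %edi
 * The iteration count is returned in %eax.
 */
int mandel_c(float x, float y, float four, int max_int)
{
    float u = 0.0f, v = 0.0f;    /* %xmm3, %xmm4 */
    float u2 = 0.0f, v2 = 0.0f;  /* %xmm5, %xmm6 */
    int count = 0;               /* %eax */

    for (;;) {
        float mag = u2 + v2;     /* %xmm7 */

        /* ucomiss + jp/jae: leave when mag >= four or the compare is unordered (NaN) */
        if (!(mag < four))
            break;

        /* v = 2 * v * u + y, in the same order as the mulss/addss/addss sequence */
        v *= u;
        v += v;
        v += y;

        /* u = u2 - v2 + x, using the squares from the previous pass */
        u = u2 - v2 + x;

        u2 = u * u;
        v2 = v * v;

        /* incl/cmpl/jl: stop once count reaches max_int */
        count++;
        if (!(count < max_int))
            break;
    }
    return count;
}
```

With this sketch, mandel_c(0.0f, 0.0f, 4.0f, 100) returns 100 (the point never escapes) and mandel_c(2.0f, 2.0f, 4.0f, 100) returns 1. A portable version like this is also handy as a cross-check when you start writing your own templates.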
    

    It's not hard at all. Back in the Solaris 8 days I had to write a lot of SPARC inline assembler functions for a customer I was consulting for. Some of them were pretty basic: effectively one-liners that wrap a single instruction. I swear some of them wound up in later versions of the Studio compiler suite. Since we were sub-contracted by Sun itself, that's not surprising, never mind that some of them were blatantly obvious (floor() and ceil(), IIRC, were two of them) and should have been there in the first place.