simd ada intrinsics avx2 gnat

How would I define the __m256i data type in Ada?


I am trying to write a library for AVX2 in Ada 2012 using GNAT, the GCC Ada compiler. I have currently defined a data type Vector_256_Integer_32 like so:

type Vector_256_Integer_32 is array (0 .. 7) of Integer_32;
pragma Pack (Vector_256_Integer_32);

Note that I have aligned the array on the 32-byte boundary required by Intel's documentation of the _mm256_load_si256 intrinsic function from immintrin.h.
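Since _mm256_load_si256 expects a 32-byte-aligned address, it can help to check alignment explicitly. The following is a minimal C sketch, assuming GCC or Clang (it uses the same `aligned` attribute as the C program later in this thread). The helper names `is_aligned` and `demo_buffer_is_aligned` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if p lies on a multiple of `boundary` bytes, else 0.
   `boundary` must be a nonzero power of two (for example, 32). */
int is_aligned(const void *p, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return ((uintptr_t)p & (boundary - 1)) == 0;
}

/* Hypothetical demo: a buffer of eight 32-bit integers (32 bytes)
   declared with an explicit 32-byte alignment request. */
int demo_buffer_is_aligned(void)
{
    static int32_t buf[8] __attribute__((aligned(32)));
    return is_aligned(buf, 32);
}
```

In Ada, the corresponding request is an Alignment clause or aspect on the type, as used in the answer below; pragma Pack only affects how components are laid out and does not set the alignment.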

I would like to implement an operation that adds two of these arrays together using AVX2. The function prototype is as follows.

function Vector_256_Integer_32_Add (Left, Right : Vector_256_Integer_32) return Vector_256_Integer_32;

My plan is to implement this function in three steps.

  1. Load Left and Right into local variables using _mm256_load_si256.
  2. Perform the addition using _mm256_add_epi32.
  3. Store the result back into a Vector_256_Integer_32 using _mm256_store_si256.
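The three steps above can be sketched in portable C without any AVX2-specific calls. This is only an illustration, assuming GCC or Clang vector extensions: `memcpy` stands in for the load and store intrinsics, the `+` operator stands in for _mm256_add_epi32, and the names `v8si`, `add8_i32`, and `demo_lane` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Eight signed 32-bit lanes in one 32-byte vector (GCC/Clang extension). */
typedef int32_t v8si __attribute__((vector_size(32)));

/* Hypothetical helper: result[i] = left[i] + right[i] for i in 0..7. */
void add8_i32(const int32_t *left, const int32_t *right, int32_t *result)
{
    v8si a, b, r;
    assert(left && right && result);

    /* Step 1: load both operands into local vector variables. */
    memcpy(&a, left, sizeof a);
    memcpy(&b, right, sizeof b);

    /* Step 2: lane-wise addition; the compiler chooses the instructions. */
    r = a + b;

    /* Step 3: store the vector back into an ordinary array. */
    memcpy(result, &r, sizeof r);
}

/* Hypothetical demo: returns lane i of the sum of two small test arrays. */
int32_t demo_lane(int i)
{
    int32_t a[8], b[8], r[8];
    assert(i >= 0 && i < 8);
    for (int k = 0; k < 8; ++k) {
        a[k] = k;         /* 0, 1, ..., 7 */
        b[k] = 100 * k;   /* 0, 100, ..., 700 */
    }
    add8_i32(a, b, r);
    return r[i];
}
```

Whether the compiler emits a single vpaddd for step 2 depends on the target flags; without AVX2 enabled it falls back to narrower instructions, but the result is the same.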

What I am unsure about is how to define the __m256i data type in Ada to hold the intermediate results. Can someone please shed some light on this? Additionally, if you see any issues with my approach, any feedback is appreciated.

I have found the definition of __m256i in GCC (located at gcc/gcc/config/i386/avxintrin.h).

typedef long long __m256i __attribute__ ((__vector_size__ (32), __may_alias__));

However, this is where I am stuck, as I am not sure how to translate this into Ada code. I have found that the __vector_size__ attribute is documented here.
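To make the typedef above more concrete, here is a small C sketch of what the attribute does, assuming GCC or Clang. The type names `v4di` and `v8si` and the function `roundtrip_lane` are hypothetical. `v4di` mirrors the shape of __m256i (four 64-bit lanes), while `v8si` is the eight-lane view that 32-bit operations work on. Both occupy the same 32 bytes, and a cast between them reinterprets the bits rather than converting values.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as GCC's __m256i: four 64-bit lanes, 32 bytes in total. */
typedef long long v4di __attribute__((vector_size(32), may_alias));

/* Eight 32-bit lanes: also 32 bytes in total. */
typedef int32_t v8si __attribute__((vector_size(32)));

/* Returns lane i after a v8si -> v4di -> v8si round trip. */
int32_t roundtrip_lane(int i)
{
    v8si lanes = {10, 20, 30, 40, 50, 60, 70, 80};
    v4di raw;
    v8si back;

    assert(i >= 0 && i < 8);
    raw = (v4di)lanes;   /* reinterprets the 256 bits, no value conversion */
    back = (v8si)raw;    /* back to the eight-lane view, bits unchanged */
    return back[i];
}
```

The practical consequence for Ada is that the type needs two properties: a total size of 32 bytes with 32-byte alignment, and a marker telling the compiler to treat it as a vector register type rather than an ordinary array.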


Solution

  • I figured out the answer to my question after doing more research. Thank you for your input. I am posting it here in the hope that it helps someone else.

    Edit: I have adjusted my answer according to feedback from the commenter Peter Cordes.

    For example, if you want to define a data type of 8 32-bit signed integers, you would write

    type Vector_256_Integer_32 is array (0 .. 7) of Integer_32 with Convention => C, Alignment => 32;
    

    The function to add the two vectors together would be defined as

    function "+" (Left, Right: Vector_256_Integer_32) return Vector_256_Integer_32;
    pragma Import (Intrinsic, "+", "__builtin_ia32_paddd256");
    

    Note that I am using the GCC built-in function rather than the intrinsics from immintrin.h, because I do not know how to import an intrinsic from that header file.

    The documentation of _mm256_add_epi32 states that the vpaddd instruction is used. The GCC __builtin_ia32_paddd256 appears to translate to this instruction.

    Below are an example package specification (avx2.ads) and a main program (main.adb).

    avx2.ads

    with Interfaces; use Interfaces;
    
    package AVX2 is
    
       --
       -- Type Definitions
       --
    
       -- 256-bit Vector of 32-bit Signed Integers
       type Vector_256_Integer_32 is array (0 .. 7) of Integer_32;
       for Vector_256_Integer_32'Alignment use 32;
       pragma Machine_Attribute (Vector_256_Integer_32, "vector_type");
       pragma Machine_Attribute (Vector_256_Integer_32, "may_alias");
    
       --
       -- Function Definitions
       --
    
       -- Function: 256-bit Vector Addition of 32-bit Signed Integers
       function Vector_256_Integer_32_Add
         (Left, Right : Vector_256_Integer_32) return Vector_256_Integer_32 with
         Convention    => Intrinsic, Import => True,
         External_Name => "__builtin_ia32_paddd256";
    
    end AVX2;
    

    main.adb

    with AVX2;        use AVX2;
    with Interfaces;  use Interfaces;
    with Ada.Text_IO; use Ada.Text_IO;
    
    procedure Main is
       a, b, r : Vector_256_Integer_32;
    begin
       for i in Vector_256_Integer_32'Range loop
          a (i) := 5 * (Integer_32 (i) + 5);
          b (i) := 12 * (Integer_32 (i) + 12);
       end loop;
       r := Vector_256_Integer_32_Add(a, b);
       for i in Vector_256_Integer_32'Range loop
          Put_Line
            ("r(i) = a(i) + b(i) = " & a (i)'Image & " + " & b (i)'Image & " = " &
             r (i)'Image);
       end loop;
    end Main;
    

    Here is an equivalent program in C. Note that this code has only been tested with GCC, needs AVX2 enabled at compile time (for example, with -mavx2), and is not necessarily the most efficient.

    #include <stdio.h>
    #include <immintrin.h>
    #include <stdint.h>
    
    int main()
    {
        __m256i ma;
        __m256i mb;
        __m256i mr;
        int32_t a[8] __attribute__((aligned(32)));
        int32_t b[8] __attribute__((aligned(32)));
        int32_t r[8] __attribute__((aligned(32)));
    
        for (int i = 0; i < 8; ++i) {
            a[i] = 5 * (i + 5);
            b[i] = 12 * (i + 12);
        }
    
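        /* Aligned loads: a and b are 32-byte aligned, as _mm256_load_si256 requires. */
        ma = _mm256_load_si256((const __m256i *)a);
        mb = _mm256_load_si256((const __m256i *)b);
    
        /* Lane-wise addition of eight 32-bit integers (vpaddd). */
        mr = _mm256_add_epi32(ma, mb);
    
        /* Aligned store of the result back into the plain array. */
        _mm256_store_si256((__m256i *)r, mr);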
    
        for (int i = 0; i < 8; ++i) {
            printf("r[i] = a[i] + b[i] = %d + %d = %d\n", a[i], b[i], r[i]);
        }
    }
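As a sanity check for either program, the expected value of each lane can be computed with plain scalar arithmetic. This is a small sketch, not part of the original programs, and `expected_sum` is a hypothetical helper name. With a[i] = 5 * (i + 5) and b[i] = 12 * (i + 12), the sum simplifies to 17 * i + 169.

```c
#include <assert.h>
#include <stdint.h>

/* Expected value of r[i] for the test inputs used in this thread. */
int32_t expected_sum(int i)
{
    int32_t a, b;
    assert(i >= 0 && i < 8);   /* only eight lanes exist */
    a = 5 * (i + 5);           /* same formula as a[i] in the programs */
    b = 12 * (i + 12);         /* same formula as b[i] in the programs */
    return a + b;              /* algebraically equal to 17 * i + 169 */
}
```

For example, lane 0 should be 25 + 144 = 169 and lane 7 should be 60 + 228 = 288.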