I am trying to implement a function in assembly that does some basic calculations using SIMD vector instructions and registers. The function signature is void map_poly_double_vec(double* input, double* output, uint64_t length, double a, double b, double c, double d);
For some reason, when I use vbroadcastsd %xmm3, %ymm6 to put the double d argument into all of the fields of %ymm6, the program inserts the double a argument instead. The other vbroadcastsd instructions work fine; only the last one misbehaves. I've stepped through it in GDB to try to figure out why, but the instruction simply runs and uses the first double argument instead of the fourth.
Here is my assembly function (AT&T syntax):
map_poly_double_vec:
mov $0, %rcx
vbroadcastsd %xmm0, %ymm3 #a
vbroadcastsd %xmm1, %ymm4 #b
vbroadcastsd %xmm2, %ymm5 #c
vbroadcastsd %xmm3, %ymm6 #d
mpdv_loop:
cmp %rdx, %rcx
je mpdv_end
vmovupd (%rdi, %rcx, 8), %ymm0
vmovupd %ymm0, %ymm1
vfmadd132pd %ymm3, %ymm4, %ymm1
vfmadd132pd %ymm0, %ymm5, %ymm1
vfmadd132pd %ymm1, %ymm6, %ymm0
vmovupd %ymm0, (%rsi, %rcx, 8)
add $4, %rcx
jmp mpdv_loop
mpdv_end:
ret
This is what I am using to test the function.
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include "lab11.h"
double* create_array(uint64_t length) {
double* array = (double*)malloc(length * sizeof(double));
if (array == NULL) {
return NULL;
}
for (uint64_t i = 0; i < length; i++) {
array[i] = ((double)rand() / RAND_MAX - 0.25);
}
return array;
}
void print_double_array(double* array, uint64_t length) {
printf("{ ");
for (uint64_t i = 0; i < length; i++) {
printf("%.6g ", array[i]);
}
printf("}\n");
}
int main(void) {
uint64_t length = 16;
double* doubles1 = create_array(length);
double* double_out = (double*)malloc(length * sizeof(double));
printf("map_poly_double_vec result:\n");
memset(double_out, 0, length * sizeof(double));
map_poly_double_vec(doubles1, double_out, length, 4, 5, 6, 7);
print_double_array(double_out, length);
free(doubles1);
free(double_out);
return 0;
}
What I should get:
{ 13.105 7.98257 12.2256 12.4544 14.3174 6.69849 7.55013 12.0089 7.17059 9.39815 8.66996 10.2085 7.76063 9.0004 15.0642 14.3989 }
What I get with my function:
map_poly_double_vec result:
{ 10.105 4.98257 9.22559 9.45443 11.3174 3.69849 4.55013 9.00889 4.17059 6.39815 5.66996 7.20848 4.76063 6.0004 12.0642 11.3989 }
See here in the first broadcast:
vbroadcastsd %xmm0, %ymm3 #a <----
vbroadcastsd %xmm1, %ymm4 #b
vbroadcastsd %xmm2, %ymm5 #c
vbroadcastsd %xmm3, %ymm6 #d
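Now ymm3, whose low 128 bits (xmm3) used to hold argument d, has been overwritten with argument a (broadcast to all four lanes). So vbroadcastsd %xmm3, %ymm6 picks up argument a again. Broadcasting d first, or broadcasting into registers other than ymm0 through ymm3, avoids the clobber.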
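The numbers in the question are consistent with this: every output is low by exactly 3, which is d - a = 7 - 4. A minimal scalar sketch of the same Horner steps can confirm that. The name map_poly_double_ref is hypothetical (it is not part of the original lab code), and the plain multiply-and-add here rounds twice where a real FMA rounds once, so the last digits could differ for arbitrary inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not from the original post.
   Computes a*x^3 + b*x^2 + c*x + d using the same three Horner
   steps as the three vfmadd132pd instructions in the loop. */
static void map_poly_double_ref(const double *input, double *output,
                                uint64_t length,
                                double a, double b, double c, double d) {
    assert(input != NULL && output != NULL);
    for (uint64_t i = 0; i < length; i++) {
        double x = input[i];
        double t = x * a + b;   /* step 1: a*x + b           */
        t = t * x + c;          /* step 2: (a*x + b)*x + c   */
        output[i] = x * t + d;  /* step 3: previous * x + d  */
    }
}
```

Calling it once with (4, 5, 6, 7) and once with (4, 5, 6, 4), i.e. with d replaced by a, gives results that differ by a constant 3 for every element, which matches the gap between the expected and actual output above.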
In case this was the source of confusion: the xmm registers are extended to ymm registers in AVX; the ymm registers are not an independent new set of registers. Every 256-bit ymm register can be seen as two 128-bit chunks, and the bottom 128 bits are the corresponding xmm register, e.g. xmm3 is the bottom 128 bits of ymm3.
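To make the aliasing concrete, here is a toy model in C. It is only an illustration, not how the hardware is implemented: each "ymm" register is four doubles, and the matching "xmm" register is just the low lanes of the same storage. All names (ymm_t, regs, broadcast_sd and the two ordering functions) are made up for this sketch. It assumes the System V x86-64 calling convention, where the four double arguments arrive in xmm0 through xmm3.

```c
/* Toy register file: regs[n] stands for ymmN, and xmmN is
   lanes 0..1 of the same storage (no separate xmm array). */
typedef struct { double lane[4]; } ymm_t;

static ymm_t regs[16];

/* Models vbroadcastsd %xmmSRC, %ymmDST: copy the low double of
   the source into all four lanes of the destination. */
static void broadcast_sd(int src, int dst) {
    double low = regs[src].lane[0];
    for (int i = 0; i < 4; i++) {
        regs[dst].lane[i] = low;
    }
}

/* Places a..d in the low lane of xmm0..xmm3, as on function entry. */
static void load_args(double a, double b, double c, double d) {
    regs[0].lane[0] = a;
    regs[1].lane[0] = b;
    regs[2].lane[0] = c;
    regs[3].lane[0] = d;
}

/* Order used in the question: ymm3 is overwritten with a before
   xmm3 is read, so ymm6 ends up holding a. */
static double broadcast_question_order(double a, double b, double c, double d) {
    load_args(a, b, c, d);
    broadcast_sd(0, 3);
    broadcast_sd(1, 4);
    broadcast_sd(2, 5);
    broadcast_sd(3, 6);
    return regs[6].lane[0];
}

/* One possible fix: read xmm3 before anything writes ymm3. */
static double broadcast_d_first(double a, double b, double c, double d) {
    load_args(a, b, c, d);
    broadcast_sd(3, 6);
    broadcast_sd(0, 3);
    broadcast_sd(1, 4);
    broadcast_sd(2, 5);
    return regs[6].lane[0];
}
```

With the arguments from the test program (4, 5, 6, 7), the first ordering leaves a in the model's ymm6 and the second leaves d there. In the real function the equivalent change is to move the vbroadcastsd %xmm3, %ymm6 line ahead of the other three broadcasts.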