How to sum two matrices element by element?

I am new to assembly and would be grateful for help with a piece of code that adds two matrices and stores the result in another matrix, in 32-bit x86 assembly. The matrices are declared as 1D arrays.

n dd 9
A dd 1,2,3,4,5,6,7,8,9
B dd 2,0,4,5,6,7,0,1,3
sum dd 9 dup(0)

I tried the code below, but it only works for matrices declared like this, and I need one that works for matrices declared as 1D arrays.

A db 1,2,3
   db 4,5,6
B db 7,8,9
   db 10,11,12
.code
start:
mov eax , 0 
mov esi, 0 
mov ebx, 0 

add al, A[ebx][esi]
add al, B[ebx][esi]
mov A[ebx][esi], al
mov al, 0
inc esi
add al, A[ebx][esi]
add al, B[ebx][esi]
mov A[ebx][esi], al
mov al, 0
inc esi
add al, A[ebx][esi]
add al, B[ebx][esi]
mov A[ebx][esi], al

mov al, 0
mov esi, 0
add ebx, 3 
add al, A[ebx][esi]
add al, B[ebx][esi]
mov A[ebx][esi], al
mov al, 0
inc esi
add al, A[ebx][esi]
add al, B[ebx][esi]
mov A[ebx][esi], al
mov al, 0
inc esi
add al, A[ebx][esi]
add al, B[ebx][esi]
mov A[ebx][esi], al
push 0
call exit
end start

Solution

Matrices that are contiguous in memory (like C 2D arrays) are equivalent to 1D arrays: just rows * cols elements in a row in memory, regardless of what asm syntax you use to put them there. The only thing that makes them a 2D matrix is how you index them, e.g. flat_index = row * width + col.

(And for looping over it, you can of course do row_offset += width; that's the add ebx, 3 in your 2x3 byte matrix code.)
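As a rough sketch of that index arithmetic, assuming MASM syntax and the 3x3 dword matrix A from the question (row 1 and column 2 are arbitrary picks for illustration):

```asm
    mov   eax, 1             ; row = 1 (zero-based)
    imul  eax, eax, 3        ; row * width, with width = 3 columns
    add   eax, 2             ; + col = 2, so flat_index = 5
    mov   edx, A[eax*4]      ; dword elements: scale the index by 4
                             ; edx = 6 for A dd 1,2,3,4,5,6,7,8,9
```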

Per-element addition of matrices doesn't have to care about their dimensions at all, it's exactly the same problem as per-element array addition. So just loop an index or pointer over each array and add.
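A minimal sketch of such a loop, assuming MASM syntax, the n, A, and B declarations from the question, and sum declared as 9 zeroed dwords (sum dd 9 dup(0) in MASM). The label name add_loop is made up for this example, and the fragment is meant to go right after start: in place of the unrolled code:

```asm
    xor   esi, esi           ; esi = element index, starts at 0
    mov   ecx, n             ; ecx = element count, loaded from memory (9)
add_loop:
    mov   eax, A[esi*4]      ; load dword A[i]
    add   eax, B[esi*4]      ; eax = A[i] + B[i]
    mov   sum[esi*4], eax    ; store into sum[i]
    inc   esi                ; next element
    cmp   esi, ecx
    jb    add_loop           ; loop while i < n (assumes n >= 1)
```

The same loop works for any rows x cols shape as long as n holds rows * cols, because the loop never looks at the dimensions.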

Then you don't need 2 separate indices for row vs. column; that would just make your code more complicated, to the point where (for such small dimensions) it's almost worth fully unrolling like you did in your 2nd code block.

(Or if your CPU supports SSE2, you can do it 4 dwords at a time with paddd.)
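A hedged sketch of the SSE2 version for the 9-element arrays in the question, assuming MASM with .686 and .xmm enabled and an assembler version that accepts xmmword ptr. It uses movdqu (unaligned) loads and stores because nothing in the question guarantees 16-byte alignment, and paddd with a memory operand would require it. Two vectors cover elements 0..7; the last element is done with scalar code.

```asm
    ; elements 0..3
    movdqu xmm0, xmmword ptr [A]
    movdqu xmm1, xmmword ptr [B]
    paddd  xmm0, xmm1              ; 4 dword additions at once
    movdqu xmmword ptr [sum], xmm0

    ; elements 4..7 (byte offset 16)
    movdqu xmm0, xmmword ptr [A+16]
    movdqu xmm1, xmmword ptr [B+16]
    paddd  xmm0, xmm1
    movdqu xmmword ptr [sum+16], xmm0

    ; element 8 (byte offset 32), leftover scalar
    mov    eax, [A+32]
    add    eax, [B+32]
    mov    [sum+32], eax
```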

This is not special:

A db 1,2,3
   db 4,5,6

Declaring like this, with 2 separate db lines for separate rows, is equivalent to one long array. For MASM, it might change the SIZEOF A (you probably only get the first row that's actually on the same line as the A label), but nothing else changes.

The reason the code that goes with it won't work for your case is that it uses byte elements and a different matrix size (6 elements, while your matrices have 9). Nothing to do with how it's declared.

You could fully unroll a loop and do a bunch of complicated moving and adding of integer registers if you wanted to, but there's no point.

A[ebx][esi] isn't valid syntax in most(?) assemblers. If it assembles, I assume it means A[ebx + esi]. That would be the normal way to write that.

It's not doing matrix indexing for you; that's why you still have to use byte offsets to go to the next row.

You can use stuff like A[ebx*4 + esi] if the number of columns is an assemble-time-constant power of 2 (specifically 1, 2, 4, or 8; x86 addressing modes have a 2-bit shift count for the index).

Normally in asm syntax you write [base + index*scale], but Intel-syntax assemblers don't actually care which order the components of an addressing mode appear in. So if you like to think in C, where the left index strides over whole rows, then writing it as [A + ebx*4 + esi] makes sense for a uint8_t [2][4] matrix, where the stride from an element to the next row down is 4.
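To illustrate, here is a small sketch assuming MASM syntax and a hypothetical 2x4 byte matrix named M (not from the question), with ebx as the row index and esi as the column index:

```asm
.data
M   db 1,2,3,4               ; row 0
    db 5,6,7,8               ; row 1
.code
    mov   ebx, 1             ; row = 1
    mov   esi, 2             ; col = 2
    movzx eax, byte ptr M[ebx*4 + esi]   ; byte offset 1*4 + 2 = 6
                             ; eax = 7, i.e. M[1][2] in C terms
```

This only works because the row width (4) is one of the scale factors the addressing mode can encode; for a width of 3 you'd have to compute the row offset yourself.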

For dword elements (like in your first example) instead of byte elements (like your 2nd), you'd need to either scale your indices by 4 (like A[ebx*4]) or make them byte offsets by using add esi, 4 instead of inc esi.
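A sketch of the byte-offset variant, under the same assumptions as before (MASM syntax; n, A, B from the question; sum declared as 9 zeroed dwords; next_elem is a made-up label):

```asm
    xor   esi, esi           ; esi = byte offset into each array
    mov   ecx, n             ; element count (9)
    shl   ecx, 2             ; ecx = n * 4 = total size in bytes
next_elem:
    mov   eax, A[esi]        ; no scale factor needed: esi is already in bytes
    add   eax, B[esi]
    mov   sum[esi], eax
    add   esi, 4             ; advance by one dword
    cmp   esi, ecx
    jb    next_elem          ; loop while offset < n*4 (assumes n >= 1)
```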