Why logically shift left and OR certain values in assembly?

I'm currently learning assembly in college and have recently started writing assembly programs that light up LEDs on a 32x32 LED simulator.

We had a lab this week, and the first question was: "Create a program that lights up a random individual LED and continues until all LEDs are lit." A friend showed me how they did it, but I'm still confused about how some of the instructions work. Here's the code:

.data
x           DWORD   0
y           DWORD   0
row         DWORD   0
row_copy    DWORD   00000001h

.code
    main:nop

        invoke version
        invoke setPattern, 0
    row_random:
        invoke random, 32 ;create a random number between 0-31
        mov x, eax        ;move that value into memory location x
        invoke readRow, x ;select a row to be altered 
        mov row, eax 
    row_on:
        invoke random, 32
        mov ecx, eax      ;move the random value into ecx
        shl row_copy, CL  ;shift left with carry flag (This is where I'm confused)
        mov eax, row
        mov ebx, row_copy
        or eax, ebx ; I'm also unsure as to why this is happening 
        invoke writeRow, x, eax  ;alter a pixel at the random row x with the value of eax
        mov row_copy, 00000001h
        ;invoke Sleep, 1
        jmp row_random
        invoke ExitProcess,0

Originally, I generated a random number between 0 and 31, put it in EBX, and called writeRow with x and EBX. However, that was wrong. Could someone explain why you logically shift left by CL, and why it's necessary to OR the two values? I think the OR is there to make sure you don't accidentally switch off an LED that was already on?

Solution

CL is the low byte of ECX. You're confusing it with CF, the carry flag in EFLAGS. The x86 variable-count shift instructions require the shift count to be in CL.
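To make the CL/ECX relationship concrete, here is a small C model. It uses plain arithmetic, not real register access, and the helper names cl_of, ch_of and cx_of are hypothetical. Because random returns 0-31 here, the whole value fits in CL after mov ecx, eax, so shl row_copy, CL shifts by exactly that random number.

```c
#include <stdint.h>

/* Hypothetical helpers modelling how the 8- and 16-bit register names
   alias parts of ECX. Plain C arithmetic, not real register access. */

/* CL is bits 0-7 of ECX. */
static uint8_t cl_of(uint32_t ecx) { return (uint8_t)(ecx & 0xFFu); }

/* CH is bits 8-15 of ECX. */
static uint8_t ch_of(uint32_t ecx) { return (uint8_t)((ecx >> 8) & 0xFFu); }

/* CX is bits 0-15 of ECX. */
static uint16_t cx_of(uint32_t ecx) { return (uint16_t)(ecx & 0xFFFFu); }
```

For example, cl_of(0x12345678) is 0x78, and a value of 31 that was just copied from EAX into ECX is still 31 when read as CL.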

And just for the record, that code is hilariously inefficient. row_copy is shifted with a memory-destination instruction (slow), then loaded, then replaced with 1 again! So... you could have done

mov    ecx, eax
mov    ebx, 1
shl    ebx, cl

like a normal person. There's no reason to have a memory location for row_copy at all, just do it in a register. You only need static memory storage for stuff when you run out of registers.
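Here is a C model of the two approaches, assuming a count in 0-31. A static variable stands in for the row_copy memory location. The memory round trip always produces the same mask as building it in a register, which is why the static variable buys nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the row_copy memory location from the question. */
static uint32_t row_copy = 1;

/* The original sequence: shift in memory, reload, then reset to 1. */
static uint32_t mask_via_memory(uint32_t count) {
    assert(count < 32);        /* in C, shifting a 32-bit value by 32+ is undefined */
    row_copy <<= count;        /* shl row_copy, cl   (memory destination) */
    uint32_t ebx = row_copy;   /* mov ebx, row_copy  (load it back)       */
    row_copy = 1;              /* mov row_copy, 1    (reset for next time) */
    return ebx;
}

/* The register-only version: mov ebx, 1 / shl ebx, cl */
static uint32_t mask_in_register(uint32_t count) {
    assert(count < 32);
    return 1u << count;
}
```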

The basic logic the code implements is row |= (1 << rand_0_31) to set a random bit (which might already be set).
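This also explains why the approach from the question didn't work. Below is a C sketch contrasting the two, with hypothetical function names. Writing the random number itself treats 0-31 as a bit pattern rather than a bit position. For example, 5 is binary 101, which lights columns 0 and 2 and switches off everything else in the row. OR-ing in 1 << 5 turns on only column 5 and keeps every LED that was already lit, so yes, the OR is what stops you from switching LEDs off.

```c
#include <assert.h>
#include <stdint.h>

/* row |= 1 << col: turn on LED `col` and leave every other LED as it was. */
static uint32_t light_one(uint32_t row, uint32_t col) {
    assert(col < 32);          /* in C, shifting a 32-bit value by 32+ is undefined */
    return row | (1u << col);
}

/* The approach from the question: write the random number itself as the
   new row value. The old row contents are discarded. */
static uint32_t write_number(uint32_t row, uint32_t col) {
    (void)row;
    return col;
}
```

With a row value of 0x0F (columns 0-3 lit) and a random column of 5, light_one gives 0x2F, but write_number gives 5.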

If you want to see how this code operates, single-step it in a debugger and watch values in registers change. See also the x86 tag wiki for guides, docs, and debugging tips.

BTW, an even more efficient way to create a mask with 1 bit set is xor ebx,ebx / bts ebx, eax to avoid needing the shift count in ECX, but if you haven't learned about BTS yet, it doesn't do anything you can't do with other simpler instructions.

And actually, BTS would mean you don't need a separate mask and OR instruction at all, just get the old value of the row in one register, a random number in the other register, and bts ebx, eax to set the EAX'th bit in EBX.
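As a rough model of what BTS does with 32-bit register operands: it copies the selected bit (offset taken mod 32) into CF, then sets that bit in the destination. The C sketch below returns the modelled CF and updates the destination through a pointer. The name bts32 is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of `bts dst, src` with 32-bit register operands.
   Returns what CF would hold (the old bit); *dst gets that bit set. */
static bool bts32(uint32_t *dst, uint32_t src) {
    uint32_t bit = src & 31u;              /* register form uses the offset mod 32 */
    bool cf = (*dst >> bit) & 1u;          /* CF = old value of the bit */
    *dst |= 1u << bit;                     /* set the bit */
    return cf;
}
```

bts32(&mask, eax) on a zeroed mask corresponds to xor ebx,ebx / bts ebx, eax. bts32(&row, eax) applied straight to the row value replaces the separate mask and OR.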

Assuming that your function-calling convention only clobbers ECX and EDX (plus EAX with the return value), you don't need any static storage locations for this, just registers. I'd do something like:

; untested
.code
    main:
        push  ebx   ; save a couple call-preserved registers
        push  edi   ; for values that survive across function calls

        ; nop       ; what's the point of this NOP?
        invoke version
        invoke setPattern, 0

    row_random:
        invoke random, 32 ;create a random number between 0-31
        mov    ebx, eax         ; eax = ebx = row
        invoke readRow, eax
        mov    edi, eax         ; edi = old value of row
        invoke random, 32

        mov    ecx, eax         ; ecx = random column = bit position
        mov    eax, 1
        shl    eax, cl          ; 1 << random
        or     edi, eax         ; row_value |= 1<<random

        invoke writeRow, ebx, edi  ; pixel[ebx] |= 1<<random

        jmp row_random
        ; or loop a finite number of times with dec / jnz.

        pop  edi
        pop  ebx
        ret
        ;  invoke ExitProcess,0

The entire middle block (with the shl and or) could be bts edi, eax.
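Here is the same loop as a C simulation. It also adds the stopping condition the lab asks for ("continues until all LEDs are lit"), which the assembly above leaves as an infinite jmp. Everything here is a hypothetical stand-in: leds, read_row, write_row and random_below take the place of the simulator's readRow, writeRow and random, and the C library rand() replaces the simulator's random. The loop counts newly lit LEDs and stops at 1024.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the 32x32 simulator: one 32-bit word per row,
   bit n of a row = the LED in column n. */
static uint32_t leds[32];

static uint32_t read_row(uint32_t y)              { return leds[y]; }
static void     write_row(uint32_t y, uint32_t v) { leds[y] = v; }
static uint32_t random_below(uint32_t n)          { return (uint32_t)rand() % n; }

/* Mirrors the assembly loop, but stops once every LED is lit.
   Returns the number of iterations it took. */
static unsigned long light_all(unsigned seed) {
    srand(seed);
    for (int i = 0; i < 32; i++) leds[i] = 0;     /* setPattern, 0 */

    unsigned lit = 0;
    unsigned long iterations = 0;
    while (lit < 32 * 32) {
        uint32_t y    = random_below(32);         /* ebx = random row      */
        uint32_t old  = read_row(y);              /* edi = old row value   */
        uint32_t mask = 1u << random_below(32);   /* eax = 1 << random col */
        if (!(old & mask))                        /* count newly lit LEDs  */
            lit++;
        write_row(y, old | mask);                 /* or edi, eax; writeRow */
        iterations++;
    }
    return iterations;
}
```

Because picks that land on an already-lit LED change nothing, filling all 1024 LEDs takes roughly 1024 × H(1024) ≈ 7,700 iterations on average (the coupon-collector problem), not 1024.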

invoke is a macro that probably pushes and cleans up the stack after the call, so you could be even more efficient by using mov stores to the stack and leaving the space there. Also, if you're on a new enough CPU, you could use rdrand ebx for fun.

Fun fact: 32-bit shift instructions mask the count to its low 5 bits, so they always shift by 0-31 no matter what value is in CL. That means you wouldn't need and ecx, 31 after RDRAND ECX to get a valid bit position.
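C behaves differently here: shifting a 32-bit value by 32 or more is undefined behaviour, so a C model of shl r32, cl has to do the masking explicitly. This is a sketch with a hypothetical name. With a 64-bit operand size the hardware masks to 6 bits instead, but only the 32-bit case is modelled.

```c
#include <stdint.h>

/* Model of x86 `shl r32, cl`: only the low 5 bits of the count are used,
   so counts of 32 and up wrap around instead of being undefined. */
static uint32_t shl32(uint32_t value, uint32_t count) {
    return value << (count & 31u);
}
```

For example, shl32(1, 37) gives 32, the same as shl32(1, 5).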

Also, you could invoke random, 32*32 and split the result into row bits and column bits: the high 5 bits select the row and the low 5 bits select the column.
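A minimal C sketch of that split, assuming random, 32*32 returns a value in 0-1023. The function name is hypothetical. In assembly this could be mov ecx, eax / shr eax, 5. Because the shift masks its count, the low 5 bits left in ECX can feed shl ... , cl directly without an explicit and ecx, 31.

```c
#include <assert.h>
#include <stdint.h>

/* Split one random number in 0..1023 into a row (high 5 bits)
   and a column (low 5 bits). */
static void split_led_index(uint32_t r, uint32_t *row, uint32_t *col) {
    assert(r < 32 * 32);
    *row = r >> 5;     /* same as r / 32 */
    *col = r & 31u;    /* same as r % 32 */
}
```

For example, 37 splits into row 1, column 5.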