assembly, x86-16, vga

Do VGA cards read in the pixel buffer when the vertical retrace bit is cleared?


I'm working on a game for DOS which uses video mode 13h.

I've always had issues with screen tearing, but until today I've been ignoring the problem. I assumed it was going to be a challenge to fix since it would involve delaying pixel writes for some precise amount of time. But it was actually a really simple fix.

All you have to do is wait for the vertical retrace bit (bit 3) of the VGA status byte, available at port 0x3da in color mode, to be newly set.

So I just had to modify this old procedure, which writes my frame buffer to the VGA pixel buffer starting at A000:0000:

WRITE_FRAME PROC

;WRITES ALL 64,000 PIXELS (32,000 WORDS) IN THE FRAME BUFFER TO VIDEO MEMORY

    push es
    push di
    push ds
    push si
    push cx

    mov cx, frame
    mov ds, cx
    xor si, si             ;ds:si -> frame buffer (source)                  

    mov cx, vidMemSeg
    mov es, cx
    xor di, di             ;es:di -> video memory (destination)

    mov cx, (scrArea)/2    ;writing 32,000 words of pixels
    rep movsw              ;write the frame


    pop cx
    pop si
    pop ds
    pop di
    pop es
    ret

WRITE_FRAME ENDP

And here's the modified procedure that waits for the vertical retrace bit to be newly set:

WRITE_FRAME PROC

;WRITES ALL 64,000 PIXELS (32,000 WORDS) IN THE FRAME BUFFER TO VIDEO MEMORY

    push es
    push di
    push ds
    push si
    push ax
    push cx
    push dx

    mov cx, frame
    mov ds, cx
    xor si, si             ;ds:si -> frame buffer (source)                  

    mov cx, vidMemSeg
    mov es, cx
    xor di, di             ;es:di -> video memory (destination)

    mov cx, (scrArea)/2    ;writing 32,000 words of pixels

                           ;If vert. retrace bit is set, wait for it to clear
    mov dx, 3dah           ;dx <- VGA status register
VRET_SET:
    in al, dx              ;al <- status byte
    and al, 8              ;is bit 3 (vertical retrace bit) set
    jnz VRET_SET           ;If so, wait for it to clear

VRET_CLR:                  ;When it's cleared, wait for it to be set
    in al, dx
    and al, 8
    jz VRET_CLR            ;loop back till vert. retrace bit is newly set

    rep movsw              ;write the frame


    pop dx
    pop cx
    pop ax
    pop si
    pop ds
    pop di
    pop es
    ret

WRITE_FRAME ENDP 

It's not completely perfect. There's still a little jitter, especially when the background behind the sprite is scrolling up or down, but it doesn't hurt to look at anymore.

My question is, why does this work?

My guess is that when the vertical retrace bit is set, the pixels have already been read into the VGA card's memory, and it is currently in the process of writing its already loaded pixels. However, when the vertical retrace bit is cleared, it is in the process of loading the pixels from A000:0000 into local memory. It uses DMA for this, right?

So, it's only safe to write to A000:0000 when the VGA card is writing pixels (bit set), and not loading pixels in (bit cleared).

Or am I totally wrong?


Solution

  • There is no separate buffer that a VGA card reads into. (Remember that when VGA was new, even 32kiB of DRAM was expensive. Also, memory bandwidth was low. Some video cards used to use dual-ported RAM so access from the CPU wouldn't disturb scan-out; it could be read/written on one port while the CRTC / RAMDAC was reading pixel data.)

    During a vertical-blanking interval, the video card isn't reading pixel data from video RAM for display; the interval exists so the CRT can move the electron beam back to the top of the screen without drawing a line up the screen. Then the VGA hardware starts reading video RAM in order again for the scan-out of the next frame.

    (Modern hardware of course doesn't drive a CRT, but reading VRAM in order with a "blanking interval" is still a thing).


    Waiting for the bit to be clear and then newly set makes it likely that your code starts running at the start of the vertical retrace, instead of maybe near the end of it.

    If your code that modifies video RAM runs quickly enough, it's done before the hardware starts reading again, so you don't get tearing. (Actually, because you're writing the screen in scan-out order, it only needs to be fast enough to stay ahead of the raster scan, so the screen output doesn't pass the memcpy and display some "old" pixels later in the frame.)
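
    To put rough numbers on these two conditions, here is a back-of-the-envelope estimate. It assumes the standard VGA timing usually quoted for mode 13h (25.175 MHz dot clock, 800 clocks per scan line, 449 lines per frame of which 400 are displayed) and a copy that starts right at the beginning of blanking. None of these figures come from the original post, so treat the results as ballpark values only.

    ```latex
    \begin{align*}
    f_{\text{line}}  &= \frac{25.175\ \text{MHz}}{800} \approx 31.47\ \text{kHz} \\
    t_{\text{VBI}}   &\approx \frac{449 - 400}{31.47\ \text{kHz}} \approx 1.56\ \text{ms} \\
    t_{\text{frame}} &= \frac{449}{31.47\ \text{kHz}} \approx 14.27\ \text{ms} \\
    R_{\text{VBI}}   &= \frac{64000\ \text{B}}{t_{\text{VBI}}} \approx 41\ \text{MB/s} \\
    R_{\text{ahead}} &= \frac{64000\ \text{B}}{t_{\text{frame}}} \approx 4.5\ \text{MB/s}
    \end{align*}
    ```

    So finishing the whole copy inside the blanking interval needs roughly 41 MB/s, while merely staying ahead of the raster needs only about 4.5 MB/s. The retrace bit goes high somewhat after blanking begins, so the real margins are a little tighter than this.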

    On old hardware, rep movsw wasn't fast enough to copy a whole frame of data during the VBI, especially not when writing to video memory over an ISA bus. Instead you'd typically double-buffer by changing the VGA display start address to point to an already-drawn frame during the VBI. So you draw in one buffer while the other is being scanned out, giving you a whole frame interval to update it, instead of just the VBI.
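
    As a minimal sketch of that page-flipping idea, the start of scan-out can be moved by writing the CRTC Start Address registers (indexes 0Ch and 0Dh). SET_START_ADDR and its bx parameter are hypothetical names, not from the original post. The sketch assumes color mode (CRTC index port at 3D4h) and an unchained 256-color mode, since plain mode 13h does not leave room for a second full page in its 64 KiB window.

    ```asm
    SET_START_ADDR PROC

    ;HYPOTHETICAL HELPER: POINTS SCAN-OUT AT A NEW DISPLAY START ADDRESS
    ;IN: bx = new start address, in CRTC address units

        push ax
        push dx

        mov dx, 3d4h           ;dx <- CRTC index register (data register is 3d5h)
        mov al, 0ch            ;index 0ch = Start Address High
        mov ah, bh             ;high byte of the new address
        out dx, ax             ;al -> index port 3d4h, ah -> data port 3d5h
        mov al, 0dh            ;index 0dh = Start Address Low
        mov ah, bl             ;low byte of the new address
        out dx, ax

        pop dx
        pop ax
        ret

    SET_START_ADDR ENDP
    ```

    As far as I know, the hardware only picks up a new start address once per frame, so the usual pattern is to call this and then wait for vertical retrace before drawing into the page that was just taken off screen.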


    rep movsw runs very fast on actual modern CPUs (e.g. if you boot a modern PC in real mode). If VRAM is mapped as WC (aka USWC: uncacheable speculative write combining), then rep movsw will copy 16 or 32 bytes at a time (Fast Strings mode or even ERMSB (Enhanced Rep Mov/Stos B)), benefiting from write-combining buffers. (Regular stores on WC memory are like NT stores on normal WB (writeback) memory.) Intel errata (like IvyBridge BU2) indicate that REP MOVS on WC memory really does work this way: if you cross a page from WC into UC memory, some stores to UC memory can happen with wide fast-strings stores instead of separate 16-bit stores for rep movsw. That means the CPU must be doing wide stores to WC memory.

    If the source data is hot in L1d or L2 cache because you just wrote it, and the destination is USWC video RAM, then blitting it with rep movsw should easily finish during the VBI. If it's mapped as UC (this used to be a BIOS option when WC was a relatively new feature, on Pentium III / early K8 boards at least), then a modern multi-GHz PC is probably still plenty fast.

    (BTW, repne cmpsb is still slow, but rep movs/stos is fast).

    BTW, even with integrated graphics where "video RAM" is still just part of your regular DRAM, it will be UC (uncacheable) or WC (uncacheable write-combining). Of course, most of the VGA interface is emulated these days. VGA memory might be the real frame buffer used by your graphics hardware, though (if running on bare metal, not DOSBox or another emulator).

    Anyway, on modern hardware at low resolution, you're probably fine just checking that the bit is set (retrace in progress), as the copy runs so fast compared to the refresh rate that there's near-zero chance of getting any tearing. At worst, the first pixel or two might come from the old frame.
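
    A sketch of that single-check variant, assuming the polarity described in the question (bit 3 of port 3dah reads 1 while vertical retrace is in progress). The VRET_WAIT label is made up for illustration, and ds:si, es:di and cx are assumed to be set up exactly as in the question's WRITE_FRAME.

    ```asm
        mov dx, 3dah           ;dx <- VGA status register (color mode)
    VRET_WAIT:
        in al, dx              ;al <- status byte
        test al, 8             ;bit 3 set = vertical retrace in progress
        jz VRET_WAIT           ;not in retrace yet, keep polling

        rep movsw              ;write the frame
    ```

    Unlike the two-loop version in the question, this can start copying near the end of a retrace, which is only acceptable when the copy is far faster than the scan-out.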


    On DOSBox simulating a real old PC with a realistic clock speed:

    @Ped7G says rep movsw wasn't fast enough to copy a frame during the VBI, unless you set DOSBox to simulate a 486 at ~70MHz, or "dynamic / max" speed.