Tags: c++, optimization, graphics, 2d, tile-engine

Optimizing a simple 2D Tile engine (+potential bugfix)


Preface

Yes, there is plenty to cover here... but I'll do my best to keep this as well-organized, informative and straight-to-the-point as I possibly can!

Using the HGE library in C++, I have created a simple tile engine.
And thus far, I have implemented the following designs:

  • A CTile class, representing a single tile within a CTileLayer, containing row/column information as well as an HGE::hgeQuad (which stores vertex, color and texture information, see here for details).
  • A CTileLayer class, representing a two-dimensional 'plane' of tiles (which are stored as a one-dimensional array of CTile objects), containing the # of rows/columns, X/Y world-coordinate information, tile pixel width/height information, and the layer's overall width/height in pixels.

A CTileLayer is responsible for rendering any tiles that are fully or partially visible within the boundaries of a virtual camera 'viewport', while skipping any tiles outside of this visible range. Upon creation, it pre-calculates all information to be stored within each CTile object, so the core of the engine has more room to breathe and can focus strictly on the render loop. Of course, it also handles proper deallocation of each contained tile.


Issues

The problem I am now facing essentially boils down to the following architectural/optimization issues:

  1. In my render loop, even though I am not rendering any tiles which are outside of visible range, I am still looping through all of the tiles, which has a major performance impact for larger tilemaps (i.e., anything above 100x100 rows/columns @ 64x64 tile dimensions still drops the framerate by 50% or more).
  2. Eventually, I intend to create a fancy tilemap editor to coincide with this engine.
    However, since I am storing all two-dimensional information inside one or more 1D arrays, I don't see how I could implement any sort of rectangular-select & copy/paste feature without a MAJOR performance hit -- namely, looping through every tile twice per frame. And yet if I used 2D arrays, there would be a slightly smaller but more pervasive FPS drop!
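As an aside on issue 2: a rectangular selection over a 1D tile array does not require touching every tile. The sketch below (helper names are mine, not from the engine) assumes the same row-major layout the question describes, where a tile at (column, row) lives at index row * numColumns + column; iterating only the rows/columns inside the selection makes the cost proportional to the selection, not the map.

```cpp
#include <cassert>
#include <vector>

// Map a 2D tile coordinate to its 1D array index, assuming row-major
// storage: index = row * numColumns + column.
inline unsigned TileIndex(unsigned column, unsigned row, unsigned numColumns)
{
    return row * numColumns + column;
}

// Collect the indices of every tile inside an inclusive rectangular
// selection. Only the selected tiles are visited, so the cost is
// proportional to the selection size, not to the whole map.
std::vector<unsigned> SelectRect(unsigned left, unsigned top,
                                 unsigned right, unsigned bottom,
                                 unsigned numColumns)
{
    std::vector<unsigned> indices;
    indices.reserve((right - left + 1) * (bottom - top + 1));
    for(unsigned row = top; row <= bottom; ++row)
        for(unsigned column = left; column <= right; ++column)
            indices.push_back(TileIndex(column, row, numColumns));
    return indices;
}
```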

Bug

As stated before... In my render code for a CTileLayer object, I have optimized which tiles are to be drawn based upon whether or not they are within viewing range. This works great, and for larger maps I noticed only a 3-8 FPS drop (compared to a 100+ FPS drop without this optimization).

But I think I'm calculating this range incorrectly, because after scrolling halfway through the map you can start to see a gap (on the topmost & leftmost sides) where tiles aren't being rendered, as if the clipping range is increasing faster than the camera can move (even though they both move at the same speed).

This gap gradually increases in size the further along into the X & Y axis you go, eventually eating up nearly half of the top & left sides of the screen on a large map. My render code for this is shown below...


Code

//
// [Allocate]
// For pre-calculating tile information
// - Rows/Columns   = Map Dimensions (in tiles)
// - Width/Height   = Tile Dimensions (in pixels)
//
void CTileLayer::Allocate(UINT numColumns, UINT numRows, float tileWidth, float tileHeight)
{
    m_nColumns = numColumns;
    m_nRows = numRows;

    float x, y;
    UINT column = 0, row = 0;
    const ULONG nTiles = m_nColumns * m_nRows;
    hgeQuad quad;

    m_tileWidth = tileWidth;
    m_tileHeight = tileHeight;
    m_layerWidth = m_tileWidth * m_nColumns;
    m_layerHeight = m_tileHeight * m_nRows;

    if(m_tiles != NULL) Free();
    m_tiles = new CTile[nTiles];

    for(ULONG l = 0; l < nTiles; l++)
    {
        m_tiles[l] = CTile();
        m_tiles[l].column = column;
        m_tiles[l].row = row;
        x = (float(column) * m_tileWidth) + m_offsetX;
        y = (float(row) * m_tileHeight) + m_offsetY;

        quad.blend = BLEND_ALPHAADD | BLEND_COLORMUL | BLEND_ZWRITE;
        quad.tex = HTEXTURE(nullptr); //Replaced for the sake of brevity (in the engine's code, I used a globally allocated texture array and did some random tile generation here)

        for(UINT i = 0; i < 4; i++)
        {
            quad.v[i].z = 0.5f;
            quad.v[i].col = 0xFF7F7F7F;
        }
        quad.v[0].x = x;
        quad.v[0].y = y;
        quad.v[0].tx = 0;
        quad.v[0].ty = 0;
        quad.v[1].x = x + m_tileWidth;
        quad.v[1].y = y;
        quad.v[1].tx = 1.0;
        quad.v[1].ty = 0;
        quad.v[2].x = x + m_tileWidth;
        quad.v[2].y = y + m_tileHeight;
        quad.v[2].tx = 1.0;
        quad.v[2].ty = 1.0;
        quad.v[3].x = x;
        quad.v[3].y = y + m_tileHeight;
        quad.v[3].tx = 0;
        quad.v[3].ty = 1.0;
        memcpy(&m_tiles[l].quad, &quad, sizeof(hgeQuad));

        if(++column > m_nColumns - 1) {
            column = 0;
            row++;
        }
    }
}

//
// [Render]
// For drawing the entire tile layer
// - X/Y          = world position
// - Top/Left     = screen 'clipping' position
// - Width/Height = screen 'clipping' dimensions
//
bool CTileLayer::Render(HGE* hge, float cameraX, float cameraY, float cameraTop, float cameraLeft, float cameraWidth, float cameraHeight)
{
    // Calculate the current number of tiles
    const ULONG nTiles = m_nColumns * m_nRows;

    // Calculate min & max X/Y world pixel coordinates
    const float scalarX = cameraX / m_layerWidth;  // This is how far (from 0 to 1, in world coordinates) along the X-axis we are within the layer
    const float scalarY = cameraY / m_layerHeight; // This is how far (from 0 to 1, in world coordinates) along the Y-axis we are within the layer
    const float minX = cameraTop + (scalarX * float(m_nColumns) - m_tileWidth); // Leftmost pixel coordinate within the world
    const float minY = cameraLeft + (scalarY * float(m_nRows) - m_tileHeight);  // Topmost pixel coordinate within the world
    const float maxX = minX + cameraWidth + m_tileWidth;                        // Rightmost pixel coordinate within the world
    const float maxY = minY + cameraHeight + m_tileHeight;                      // Bottommost pixel coordinate within the world

    // Loop through all tiles in the map
    for(ULONG l = 0; l < nTiles; l++)
    {
        CTile tile = m_tiles[l];
        // Calculate this tile's X/Y world pixel coordinates
        float tileX = (float(tile.column) * m_tileWidth) - cameraX;
        float tileY = (float(tile.row) * m_tileHeight) - cameraY;

        // Check if this tile is within the boundaries of the current camera view
        if(tileX > minX && tileY > minY && tileX < maxX && tileY < maxY) {
            // It is, so draw it!
            hge->Gfx_RenderQuad(&tile.quad, -cameraX, -cameraY);
        }
    }

    return false;
}

//
// [Free]
// Gee, I wonder what this does? lol...
//
void CTileLayer::Free()
{
    delete [] m_tiles;
    m_tiles = NULL;
}



Questions

  1. What can be done to fix those architectural/optimization issues, without greatly impacting any other rendering optimizations?
  2. Why is that bug occurring? How can it be fixed?


Thank you for your time!


Solution

  • Optimising the iteration of the map is fairly straightforward.

    Given a visible rect in world coordinates (left, top, right, bottom) it's fairly trivial to work out the tile positions, simply by dividing by the tile size.

    Once you have those tile coordinates (tl, tt, tr, tb) you can very easily calculate the first visible tile in your 1D array. (The way you calculate any tile index from a 2D coordinate is (y*width)+x - remember to make sure the input coordinate is valid first though.) You then just have a double for loop to iterate the visible tiles:

    int visiblewidth = tr - tl + 1;
    int visibleheight = tb - tt + 1;
    
    for( int rowidx = ( tt * layerwidth ) + tl; visibleheight--; rowidx += layerwidth )
    {
        for( int tileidx = rowidx, cx = visiblewidth; cx--; tileidx++ )
        {
            // render m_Tiles[ tileidx ]...
        }
    }
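The tile coordinates (tl, tt, tr, tb) used above can be derived exactly as described: divide the visible world rect by the tile size, then clamp to the map bounds so edge-of-map cameras stay valid. This is a sketch; the struct and function names are mine, not part of HGE or the asker's engine.

```cpp
#include <algorithm>
#include <cassert>

// Inclusive range of visible tile coordinates.
struct TileRange { int tl, tt, tr, tb; };

// Convert a visible rect in world pixels into tile coordinates by
// dividing by the tile size, clamped to the map dimensions.
TileRange VisibleTiles(float left, float top, float right, float bottom,
                       float tileWidth, float tileHeight,
                       int numColumns, int numRows)
{
    TileRange r;
    r.tl = std::max(0, int(left / tileWidth));
    r.tt = std::max(0, int(top / tileHeight));
    r.tr = std::min(numColumns - 1, int(right / tileWidth));
    r.tb = std::min(numRows - 1, int(bottom / tileHeight));
    return r;
}
```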
    

    You can use a similar system for selecting a block of tiles. Just store the selection coordinates and calculate the actual tiles in exactly the same way.

    As for your bug: the min/max calculation in Render mixes units. scalarX * float(m_nColumns) yields a position measured in tiles, not pixels (and cameraTop/cameraLeft are swapped between minX and minY), so the clipping range drifts ahead of the camera the further you scroll. More fundamentally, why do you have x, y, top, left, width, height for the camera? Just store the camera position (x,y) and calculate the visible rect from the dimensions of your screen/viewport along with any zoom factor you have defined.
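For instance, a minimal sketch of that approach (the struct and names here are illustrative, not from the engine): keep only the camera's world position, and derive the visible rect from the viewport size each frame, so there is nothing to drift out of sync.

```cpp
#include <cassert>

// Only the camera's world position is stored.
struct Camera { float x, y; };

// Visible rect in world pixels, derived each frame.
struct Rect { float left, top, right, bottom; };

Rect VisibleRect(const Camera& cam, float viewportWidth, float viewportHeight)
{
    Rect r;
    r.left   = cam.x;
    r.top    = cam.y;
    r.right  = cam.x + viewportWidth;
    r.bottom = cam.y + viewportHeight;
    return r;
}
```

The resulting rect is what you would feed into the tile-range calculation above: the edges move at exactly the camera's speed by construction.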