Search code examples
cbuffer-overflow

Overflow when encoding a structure into bytes into a buffer


I wrote this function that should write a structure of data into a buffer of bytes then returns how many have been written. The function does the following:

  1. ask as input a buffer of bytes (char *), his size and offset where to start writing. for example offset 0x00 will start at the beginning of the buffer. then ask the data to encode in this case my structure.
  2. collect common data which here is called Unit_t into another object called UnitSet that collects into a dynamic array all unique units.
  3. increase the size of the buffer if needed, in my test case I give a buffer of 1 byte so this case will happen.
  4. then it writes the main body of this data, which are the tiles, then the body of the units.
  5. The final output will look like this = [array of tiles] + [array of units]

The Problem At the last iteration, where units are encoded after the tiles, I get a buffer overflow error for *bytes and after checking with the debugger and asan I get the error after the fifth unit is encoded.

I think I messed up the calculation of the offsets to write the data because the output buffer wrote into a file is full or garbage, not the data I want, but I can't tell what's wrong because those conversions looks right to me.

What I expect is to jump to the byte I need to wrote, then convert into the data I want to write and then I write using the data size.

More Info

  • The TileSet_s structure isn't the one overflowing, I have unit tests for that structure.
  • I got the same error first on for (TileSet_Size_t ti = 0; ti < tileset->count; ++ti) but I was using an incorrect casting and offset calculation, after fixing it the error moved on the highlighted lines. This I why I think probably all offsets calculations are wrong.

If the problem isn't the byte offsets nor conversions and is more complex than I thought I will reproduce an example and share it for debugging it.

#define TILE_ENCODED_SIZE       ((sizeof(uint16_t) * 10) + 1)

typedef double Unit_t;

size_t
    TileSet_encode (struct TileSet_s *tileset,
        bytes_t **bytes, size_t bytes_offset, size_t bytes_size)
{
    // next we collect the vertices, duplicated aren't stored so we need to collect
    // them and then calculate.
    struct UnitSet_s set;

    if (!UnitSet_init(&set))
    {
        return 0;
    }

    for (TileSet_Size_t ti = 0; ti < tileset->count; ++ti)
    {       // translation of this stuff:
            // iterate each tile and iterate each vertex of each tile.
        for (unsigned short vi = 0; vi < TILE_VERTICES_MAX; ++vi)
        {       // dump the vertex into the set, if we fail terminate operation.
            if (UnitSet_add(&set, tileset->tilearray[ti]->tiledata.vertices[vi]) == UNIT_SET_NOMEM)
            {
                UnitSet_destroy(&set);
                return 0;
            }
        }
    }

    size_t size_tiles = tileset->count * TILE_ENCODED_SIZE;
    size_t size_vertices = sizeof(Unit_t) * set.list.length;
    size_t bytes_to_write = size_tiles + size_vertices;

    if ((bytes_size - bytes_offset) < bytes_to_write)
    {       // now we know how much space the whole thing takes.
            // ensure to increase space if needed.

        bytes_size = (bytes_size - bytes_offset) + bytes_to_write;
        bytes_t *newbytes = realloc(*bytes, sizeof(**bytes) * (bytes_offset + bytes_size));

        if (!newbytes)
        {       // failed to increase bytes buffer.
            return 0;
        }

        *bytes = newbytes;
    }

    for (TileSet_Size_t ti = 0; ti < tileset->count; ++ti)
    {       // encodes tiles.
        size_t tile_offset = bytes_offset + (ti * TILE_ENCODED_SIZE);

        struct Tile_s *tile = &tileset->tilearray[ti]->tiledata;
        *((TileSet_Id_t *)
            &((*bytes)[tile_offset])) = tileset->tilearray[ti]->id;

        for (size_t vi = 0; vi < TILE_VERTICES_MAX; ++vi)
        {
            ((uint16_t *)
                &((*bytes)[tile_offset + 1]))[sizeof(uint16_t) * vi]
                = (uint16_t) UnitSet_search(&set.list, tile->vertices[vi], 0, set.list.length - 1);
        }
    }

    for (size_t vi = 0; vi < set.list.length; ++vi)
    {       // encode vertices.

        // **********************************************************
        // ERROR HERE! Buffer Overflow at *bytes after the fifth unit !!

        ((Unit_t *)
            &((*bytes)[bytes_offset + size_tiles]))[sizeof(Unit_t) * vi]
            = set.list.array[vi];
        
        // **********************************************************
    }

    UnitSet_destroy(&set);

    return bytes_to_write;
}

I tried debugging it with asan and change the conversions.


Solution

  • Good news! Fixed the problem, thanks for the answers. here the function now:

    size_t
        TileSet_encode (struct TileSet_s *tileset,
            bytes_t **bytes, size_t bytes_offset, size_t bytes_size)
    {
        // next we collect the vertices, duplicated aren't stored so we need to collect
        // them and then calculate.
        struct UnitSet_s set;
    
        if (!UnitSet_init(&set))
        {
            return 0;
        }
    
        for (TileSet_Size_t ti = 0; ti < tileset->count; ++ti)
        {       // translation of this stuff:
                // iterate each tile and iterate each vertex of each tile.
            for (unsigned short vi = 0; vi < TILE_VERTICES_MAX; ++vi)
            {       // dump the vertex into the set, if we fail terminate operation.
                if (UnitSet_add(&set, tileset->tilearray[ti]->tiledata.vertices[vi]) == UNIT_SET_NOMEM)
                {
                    UnitSet_destroy(&set);
                    return 0;
                }
            }
        }
    
        // size is ok. :P
        size_t size_tiles = tileset->count * TILE_ENCODED_SIZE;
        size_t size_vertices = sizeof(double) * set.list.length;
        size_t bytes_to_write = 8 + size_tiles + size_vertices;
    
        if ((bytes_size - bytes_offset) < bytes_to_write)
        {       // now we know how much space the whole thing takes.
                // ensure to increase space if needed.
    
            bytes_size = bytes_offset + ((bytes_size - bytes_offset) + bytes_to_write);
            bytes_t *newbytes = realloc(*bytes, sizeof(**bytes) * bytes_size);
    
            if (!newbytes)
            {       // failed to increase bytes buffer.
                return 0;
            }
    
            *bytes = newbytes;
        }
    
        // ******************************************************************
        // improvements from here.
    
        uint16_t chunk_x = 20;
        uint16_t chunk_y = 40;
    
        memset((*bytes) + bytes_offset, 0xAA, bytes_size - bytes_offset);   // TEST, remove garbage.
    
        memcpy((*bytes) + bytes_offset, &chunk_x, sizeof(uint16_t));
        memcpy((*bytes) + bytes_offset + 0x02, &chunk_y, sizeof(uint16_t));
        memcpy((*bytes) + bytes_offset + 0x04, &tileset->count, sizeof(uint16_t));
    
        for (TileSet_Size_t tnum = 0; tnum < tileset->count; ++tnum)
        {
            size_t bytes_tile_offset = bytes_offset + 0x08 + (tnum * TILE_ENCODED_SIZE);
            struct TileSet_Tile_s *tiledata = tileset->tilearray[tnum];
    
            uint8_t tile_id = (uint8_t) tiledata->id;
    
            memcpy((*bytes) + bytes_tile_offset, &tile_id, sizeof(tile_id));
    
            for (size_t vi = 0; vi < TILE_VERTICES_MAX; ++vi)
            {
                uint16_t tile_vidx = (uint16_t) vi;
    
                memcpy((*bytes) + bytes_tile_offset + 1 + (vi * sizeof(uint16_t)),
                    &tile_vidx, sizeof(tile_vidx));
            }
        }
    
        for (size_t vi = 0; vi < set.list.length; ++vi)
        {
            double unit = (double) set.list.array[vi];
    
            memcpy((*bytes) + bytes_offset + 0x08 + size_tiles + (sizeof(unit) * vi),
                &unit, sizeof(unit));
        }
    
        // ******************************************************************
    
        UnitSet_destroy(&set);
    
        return bytes_to_write;
    }
    

    In Short

    • Using casting with local variables to ensure to create the bytes. maybe too much sizeof().
    • I've used the memory methods to write the bytes.

    The result buffer is what I want. The function needs some extra tweaks and over-all it works.