Fast block placement algorithm, advice needed?

I need to emulate the window placement strategy of the Fluxbox window manager.

As a rough guide, visualize randomly sized windows filling up the screen one at a time, where the rough size of each results in an average of 80 windows on screen without any window overlapping another.

If you have Fluxbox and Xterm installed on your system, you can try the xwinmidiarptoy BASH script to see a rough prototype of what I want happening. See the xwinmidiarptoy.txt notes I've written about it explaining what it does and how it should be used.

It is important to note that windows will close and the space that closed windows previously occupied becomes available once more for the placement of new windows.

The algorithm needs to be an Online Algorithm processing data "piece-by-piece in a serial fashion, i.e., in the order that the input is fed to the algorithm, without having the entire input available from the start."

The Fluxbox window placement strategy has three binary options which I want to emulate:

Windows build horizontal rows or vertical columns (potentially)
Windows are placed from left to right or right to left
Windows are placed from top to bottom or bottom to top

Differences between the target algorithm and a window-placement algorithm

The coordinate units are not pixels. The grid within which blocks will be placed will be 128 x 128 units. Furthermore, the placement area may be further shrunk by a boundary area placed within the grid.

Why is the algorithm a problem?

It needs to operate to the deadlines of a real time thread in an audio application.

At this moment I am only concerned with getting a fast algorithm, don't concern yourself over the implications of real time threads and all the hurdles in programming that that brings.

And although the algorithm will never ever place a window which overlaps another, the user will be able to place and move certain types of blocks, overlapping windows will exist. The data structure used for storing the windows and/or free space, needs to be able to handle this overlap.

So far I have two choices which I have built loose prototypes for:

1) A port of the Fluxbox placement algorithm into my code.

The problem with this is, the client (my program) gets kicked out of the audio server (JACK) when I try placing the worst case scenario of 256 blocks using the algorithm. This algorithm performs over 14000 full (linear) scans of the list of blocks already placed when placing the 256th window.

For a demonstration of this I created a program called text_boxer-0.0.2.tar.bz2 which takes a text file as input and arranges it within ASCII boxes. Issue make to build it. A little unfriendly, use --help (or any other invalid option) for a list of command line options. You must specify the text file by using the option.

2) My alternative approach.

Only partially implemented, this approach uses a data structure for each area of rectangular free unused space (the list of windows can be entirely separate, and is not required for testing of this algorithm). The data structure acts as a node in a doubly linked list (with sorted insertion), as well as containing the coordinates of the top-left corner, and the width and height.

Furthermore, each block data structure also contains four links which connect to each immediately adjacent (touching) block on each of the four sides.

IMPORTANT RULE: Each block may only touch with one block per side. This is a rule specific to the algorithm's way of storing free unused space and bears no factor in how many actual windows may touch each other.

The problem with this approach is, it's very complex. I have implemented the straightforward cases where 1) space is removed from one corner of a block, 2) splitting neighbouring blocks so that the IMPORTANT RULE is adhered to.

The less straightforward case, where the space to be removed can only be found within a column or row of boxes, is only partially implemented - if one of the blocks to be removed is an exact fit for width (ie column) or height (ie row) then problems occur. And don't even mention the fact this only checks columns one box wide, and rows one box tall.

I've implemented this algorithm in C - the language I am using for this project (I've not used C++ for a few years and am uncomfortable using it after having focused all my attention to C development, it's a hobby). The implementation is 700+ lines of code (including plenty of blank lines, brace lines, comments etc). The implementation only works for the horizontal-rows + left-right + top-bottom placement strategy.

So I've either got to add some way of making this +700 lines of code work for the other 7 placement strategy options, or I'm going to have to duplicate those +700 lines of code for the other seven options. Neither of these is attractive, the first, because the existing code is complex enough, the second, because of bloat.

The algorithm is not even at a stage where I can use it in the real time worst case scenario, because of missing functionality, so I still don't know if it actually performs better or worse than the first approach.

The current state of C implementation of this algorithm is freespace.c. I use gcc -O0 -ggdb freespace.c to build, and run it in an xterm sized to atleast 124 x 60 chars.

What else is there?

I've skimmed over and discounted:

Bin Packing algorithms: their emphasis on optimal fit does not match the requirements of this algorithm.
Recursive Bisection Placement algorithms: sounds promising, but these are for circuit design. Their emphasis is optimal wire length.

Both of these, especially the latter, all elements to be placed/packs are known before the algorithm begins.

What are your thoughts on this? How would you approach it? What other algorithms should I look at? Or even what concepts should I research seeing as I've not studied computer science/software engineering?

Please ask questions in comments if further information is needed.

Further ideas developed since asking this question

Some combination of my "alternative algorithm" with a spatial hashmap for identifying if a large window to be placed would cover several blocks of free space.

Solution

#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>


#define FSWIDTH 128
#define FSHEIGHT 128


#ifdef USE_64BIT_ARRAY
    #define FSBUFBITS 64
    #define FSBUFWIDTH 2
    typedef uint64_t fsbuf_type;
    #define TRAILING_ZEROS( v ) __builtin_ctzl(( v ))
    #define LEADING_ONES( v )   __builtin_clzl(~( v ))
#else
#ifdef USE_32BIT_ARRAY
    #define FSBUFBITS 32
    #define FSBUFWIDTH 4
    typedef uint32_t fsbuf_type;
    #define TRAILING_ZEROS( v ) __builtin_ctz(( v ))
    #define LEADING_ONES( v )   __builtin_clz(~( v ))
#else
#ifdef USE_16BIT_ARRAY
    #define FSBUFBITS 16
    #define FSBUFWIDTH 8
    typedef uint16_t fsbuf_type;
    #define TRAILING_ZEROS( v ) __builtin_ctz( 0xffff0000 | ( v ))
    #define LEADING_ONES( v )   __builtin_clz(~( v ) << 16)
#else
#ifdef USE_8BIT_ARRAY
    #define FSBUFBITS 8
    #define FSBUFWIDTH 16
    typedef uint8_t fsbuf_type;
    #define TRAILING_ZEROS( v ) __builtin_ctz( 0xffffff00 | ( v ))
    #define LEADING_ONES( v )   __builtin_clz(~( v ) << 24)
#else
    #define FSBUFBITS 1
    #define FSBUFWIDTH 128
    typedef unsigned char fsbuf_type;
    #define TRAILING_ZEROS( v ) (( v ) ? 0 : 1)
    #define LEADING_ONES( v )   (( v ) ? 1 : 0)
#endif
#endif
#endif
#endif


static const fsbuf_type fsbuf_max =   ~(fsbuf_type)0;
static const fsbuf_type fsbuf_high =  (fsbuf_type)1 << (FSBUFBITS - 1);


typedef struct freespacegrid
{
    fsbuf_type buf[FSHEIGHT][FSBUFWIDTH];

    _Bool left_to_right;
    _Bool top_to_bottom;

} freespace;


void freespace_dump(freespace* fs)
{
    int x, y;

    for (y = 0; y < FSHEIGHT; ++y)
    {
        for (x = 0; x < FSBUFWIDTH; ++x)
        {
            fsbuf_type i = FSBUFBITS;
            fsbuf_type b = fs->buf[y][x];

            for(; i != 0; --i, b <<= 1)
                putchar(b & fsbuf_high ? '#' : '/');
/*
            if (x + 1 < FSBUFWIDTH)
                putchar('|');
*/
        }
        putchar('\n');
    }
}


freespace* freespace_new(void)
{
    freespace* fs = malloc(sizeof(*fs));

    if (!fs)
        return 0;

    int y;

    for (y = 0; y < FSHEIGHT; ++y)
    {
        memset(&fs->buf[y][0], 0, sizeof(fsbuf_type) * FSBUFWIDTH);
    }

    fs->left_to_right = true;
    fs->top_to_bottom = true;

    return fs;
}


void freespace_delete(freespace* fs)
{
    if (!fs)
        return;

    free(fs);
}

/* would be private function: */
void fs_set_buffer( fsbuf_type buf[FSHEIGHT][FSBUFWIDTH],
                    unsigned x,
                    unsigned y1,
                    unsigned xoffset,
                    unsigned width,
                    unsigned height)
{
    fsbuf_type v;
    unsigned y;

    for (; width > 0 && x < FSBUFWIDTH; ++x)
    {
        if (width < xoffset)
            v = (((fsbuf_type)1 << width) - 1) << (xoffset - width);
        else if (xoffset < FSBUFBITS)
            v = ((fsbuf_type)1 << xoffset) - 1;
        else
            v = fsbuf_max;

        for (y = y1; y < y1 + height; ++y)
        {
#ifdef FREESPACE_DEBUG
            if (buf[y][x] & v)
                printf("**** over-writing area ****\n");
#endif
            buf[y][x] |= v;
        }

        if (width < xoffset)
            return;

        width -= xoffset;
        xoffset = FSBUFBITS;
    }
}


_Bool freespace_remove(   freespace* fs,
                          unsigned width, unsigned height,
                          int* resultx,   int* resulty)
{
    unsigned x, x1, y;
    unsigned w, h;
    unsigned xoffset, x1offset;
    unsigned tz; /* trailing zeros */

    fsbuf_type* xptr;
    fsbuf_type mask =   0;
    fsbuf_type v;

    _Bool scanning = false;
    _Bool offset = false;

    *resultx = -1;
    *resulty = -1;

    for (y = 0; y < (unsigned) FSHEIGHT - height; ++y)
    {
        scanning = false;
        xptr = &fs->buf[y][0];

        for (x = 0; x < FSBUFWIDTH; ++x, ++xptr)
        {
            if(*xptr == fsbuf_max)
            {
                scanning = false;
                continue;
            }

            if (!scanning)
            {
                scanning = true;
                x1 = x;
                x1offset = xoffset = FSBUFBITS;
                w = width;
            }
retry:
            if (w < xoffset)
                mask = (((fsbuf_type)1 << w) - 1) << (xoffset - w);
            else if (xoffset < FSBUFBITS)
                mask = ((fsbuf_type)1 << xoffset) - 1;
            else
                mask = fsbuf_max;

            offset = false;

            for (h = 0; h < height; ++h)
            {
                v = fs->buf[y + h][x] & mask;

                if (v)
                {
                    tz = TRAILING_ZEROS(v);
                    offset = true;
                    break;
                }
            }

            if (offset)
            {
                if (tz)
                {
                    x1 = x;
                    w = width;
                    x1offset = xoffset = tz;
                    goto retry;
                }
                scanning = false;
            }
            else
            {
                if (w <= xoffset) /***** RESULT! *****/
                {
                    fs_set_buffer(fs->buf, x1, y, x1offset, width, height);
                    *resultx = x1 * FSBUFBITS + (FSBUFBITS - x1offset);
                    *resulty = y;
                    return true;
                }
                w -= xoffset;
                xoffset = FSBUFBITS;
            }
        }
    }
    return false;
}


int main(int argc, char** argv)
{
    int x[1999];
    int y[1999];
    int w[1999];
    int h[1999];

    int i;

    freespace* fs = freespace_new();

    for (i = 0; i < 1999; ++i, ++u)
    {
        w[i] = rand() % 18 + 4;
        h[i] = rand() % 18 + 4;

        freespace_remove(fs, w[i], h[i], &x[i], &y[i]);
/*
        freespace_dump(fs);
        printf("w:%d h:%d x:%d y:%d\n", w[i], h[i], x[i], y[i]);
        if (x[i] == -1)
            printf("not removed space %d\n", i);
        getchar();
*/
    }

    freespace_dump(fs);
    freespace_delete(fs);

    return 0;
}

The above code requires one of USE_64BIT_ARRAY, USE_32BIT_ARRAY, USE_16BIT_ARRAY, USE_8BIT_ARRAY to be defined otherwise it will fall back to using only the high bit of an unsigned char for storing the state of grid cells.

The function fs_set_buffer will not be declared in the header, and will become static within the implementation when this code gets split between .h and .c files. A more user friendly function hiding the implementation details will be provided for removing used space from the grid.

Overall, this implementation is faster without optimization than my previous answer with maximum optimization (using GCC on 64bit Gentoo, optimization options -O0 and -O3 respectively).

Regarding USE_NNBIT_ARRAY and the different bit sizes, I used two different methods of timing the code which make 1999 calls to freespace_remove.

Timing main() using the Unix time command (and disabling any output in the code) seemed to prove my expectations correct - that higher bit sizes are faster.

On the other hand, timing individual calls to freespace_remove (using gettimeofday) and comparing the maximum time taken over the 1999 calls seemed to indicate lower bit sizes were faster.

This has only been tested on a 64bit system (Intel Dual Core II).