XNA/Monogame, Fastest way to draw multiple sheared/skewed sprites


I normally work with SpriteBatch in XNA/Monogame for 2D games and have just recently delved into 3D drawing methods such as DrawUserIndexedPrimitives and the like. I'm working on a project where our animators would like to have the ability to shear sprites and textures.

With SpriteBatch you can pass a matrix to SpriteBatch.Begin to shear an object. Something like:

//translate object to origin
Matrix translate1 = Matrix.CreateTranslation(-rectangle.X, -rectangle.Y, 0);

//skew the sprite 33 degrees on the X and Y axis
Matrix skew = Matrix.Identity;
skew.M12 = (float)Math.Tan(33 * 0.0174532925f);
skew.M21 = (float)Math.Tan(33 * 0.0174532925f);

//translate object back
Matrix translate2 = Matrix.CreateTranslation(rectangle.X, rectangle.Y, 0);
Matrix transform = translate1 * skew * translate2;

_spriteBatch.Begin(SpriteSortMode.Deferred, BlendState.NonPremultiplied,
                    SamplerState.PointWrap, DepthStencilState.Default,
                    RasterizerState.CullCounterClockwise, null, transform);
_spriteBatch.Draw(_texture, rectangle, Color.White);
_spriteBatch.End();

The obvious downside of this is that it requires a new SpriteBatch Begin and End call for every sheared sprite. We currently only need 2 calls to SpriteBatch.Begin in our game: one for UI and one for World stuff. Our artists would want to use shear for stuff like swaying trees or animating legs and limbs on creatures, so I could see that number jumping to 10+ separate batches if we gave them the option.

An average level has around 250 elements each containing 10-20 sprites.

I've written a test for Android that calls draw on 1000 sprites. Without any skewing it can draw all 1000 sprites 600 times in about 11 seconds, or approximately 53fps. But if I skew every tenth sprite (adding 100 new SpriteBatch calls) it takes 47 seconds, or approximately 12fps.

That's really bad. Even for just 200 sprites (every tenth one skewed) the test drops to 28fps.

So I've also created the same test using quads drawn with DrawUserIndexedPrimitives. Each quad uses a shared BasicEffect created in the Game class and passed in via the Sprite class's constructor. I set the World matrix and Texture before each pass.Apply() like so:

if (_basicEffect != null)
{
    foreach (EffectPass pass in _basicEffect.CurrentTechnique.Passes)
    {
        // Per-sprite state has to be set before Apply() so the pass picks it up
        _basicEffect.World = Transform;
        _basicEffect.Texture = _texture;
        pass.Apply();

        // 4 vertices and 2 triangles per quad
        GraphicsDevice.DrawUserIndexedPrimitives
            <VertexPositionNormalTexture>(
            PrimitiveType.TriangleList,
            _quad.Vertices, 0, 4,
            _quad.Indices, 0, 2);
    }
}

For 1000 sprites, no skew, this gives me 12fps (I imagine it's like making 1000 SpriteBatch calls). That's really bad. But for only 200 sprites with every 10th sprite skewed, I get 46fps, which is significantly better than SpriteBatch (even though I'm calling DrawUserIndexedPrimitives 200 times).

---MY QUESTION---

How could I batch my calls to DrawUserIndexedPrimitives (or something similar) while keeping my sprites each contained in their own class that inherits DrawableGameComponent? That last part's pretty important just due to the nature of our game engine and the way it handles animation and collision and stuff.

I've read what I can about Vertex Buffers and DrawIndexedPrimitives, but don't quite have my head wrapped around it, and don't know how I'd assign new textures and world transforms to sprites drawn in this way.

Should I expect similar/better performance than SpriteBatch if I batch these calls?


Solution

  • It seems to me you have a few options here. Note that I'm primarily familiar with XNA 4.0 on the PC, so not all of these may be possible/performant in your case.

    The Easy, Hacky Way

    You don't appear to be using the color channel when drawing your sprites; this technique assumes that your example is representative of your real code.

    If you don't need the sprite color for tinting your sprites, you can hijack it as a way to pass per-sprite data into a custom vertex/pixel shader. For example, you could do this:

    var shearX = MathHelper.ToRadians(33) / MathHelper.TwoPi;
    var shearY = MathHelper.ToRadians(33) / MathHelper.TwoPi;
    var color = new Color(shearX, shearY, 0f, 0f);
    _spriteBatch.Draw(_texture, rectangle, color);
    

    This stores the x- and y-shear angles as fractions of 2 * pi (a full turn) in the red and green color channels, respectively.

    Then you can create a custom vertex shader that retrieves these values and performs the shearing calculations on the fly. See Shawn Hargreaves's article here for information on how to do that.
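
    The encoding above can be sanity-checked on the CPU before any shader is written. What follows is a minimal sketch in plain C#, assuming the color ends up as 8 bits per channel and that the conversion rounds to the nearest level; ShearColorEncoding, Encode and DecodeShearFactor are hypothetical names used only for illustration. The decode step mirrors the arithmetic a vertex shader would have to perform on the incoming color.

    ```csharp
    using System;

    // Hypothetical helper, not part of XNA/MonoGame.
    public static class ShearColorEncoding
    {
        // Encode a shear angle (radians, expected in [0, 2*pi)) as one 8-bit channel.
        // Assumption: the float-to-byte conversion rounds to the nearest level.
        public static byte Encode(float angleRadians)
        {
            double fraction = angleRadians / (2.0 * Math.PI);
            int level = (int)Math.Round(fraction * 255.0);
            return (byte)Math.Max(0, Math.Min(255, level));
        }

        // Reverse the encoding and return tan(angle), i.e. the factor the
        // question writes into M12 / M21 of its skew matrix.
        public static float DecodeShearFactor(byte channel)
        {
            double angle = (channel / 255.0) * 2.0 * Math.PI;
            return (float)Math.Tan(angle);
        }
    }
    ```

    Under these assumptions a 33 degree shear is stored as level 23 and decodes to a factor of about 0.636 rather than tan(33°) ≈ 0.649, because 255 steps across a full turn give a resolution of roughly 1.4 degrees. Negative angles would need to be wrapped into the 0 to 2 * pi range before encoding, since this sketch clamps them to 0.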

    Hybrid Approach

    Another relatively straightforward possibility is to combine traditional sprite batching with your DrawUserIndexedPrimitives code.

    The key to good performance is to minimize state changes, so careful ordering of your sprites can go a long way. Organize your sprites such that you can draw all non-skewed sprites in a single pass using SpriteBatch, then only use the slower DrawUserIndexedPrimitives technique to draw the sprites that actually need it. This should significantly reduce the number of batches being sent to the GPU, assuming that most of the sprites in a given frame aren't skewed.
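
    The ordering described above could be sketched roughly as follows. This is plain C# with a hypothetical SpriteInfo class standing in for the game's sprite components; only the IsSkewed flag matters, and the actual drawing calls are left out.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical stand-in for a sprite component; only the skew flag matters here.
    public sealed class SpriteInfo
    {
        public readonly string Name;
        public readonly bool IsSkewed;
        public SpriteInfo(string name, bool isSkewed) { Name = name; IsSkewed = isSkewed; }
    }

    public static class HybridDrawPlan
    {
        // Split sprites into the group that can share one SpriteBatch Begin/End pair
        // and the group that needs an individual transform per sprite.
        public static void Partition(IEnumerable<SpriteInfo> sprites,
                                     List<SpriteInfo> plain, List<SpriteInfo> skewed)
        {
            foreach (SpriteInfo sprite in sprites)
                (sprite.IsSkewed ? skewed : plain).Add(sprite);
        }
    }
    ```

    In a Draw method, the plain list would then be drawn inside a single Begin/End pair and the skewed list drawn afterwards as quads. For the question's test case of 1000 sprites with every tenth one skewed, that is one SpriteBatch batch of 900 sprites plus 100 quad draws. Grouping sprites this way changes the draw order, so layering would have to come from something other than submission order, which is an assumption to check against your engine.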

    Batching + Custom Vertex Format

    This is probably the best technique, but it also involves writing the most code, though none of it is particularly complex.

    The way SpriteBatch works internally is that it maintains a dynamic vertex buffer which is populated on the CPU and then drawn all in a single call. Shawn Hargreaves provides a high-level overview of how this sort of thing is done here.
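
    To make the idea of filling one shared buffer on the CPU concrete, here is a minimal sketch in plain C#. Note that it applies the shear on the CPU while the corners are generated, which is a variation on, not a copy of, the shader-based approach described in this answer. BatchVertex and QuadBatcher are hypothetical names, and a real batch would use an XNA/MonoGame vertex type (one implementing IVertexType) so that the arrays can be handed to a single DrawUserIndexedPrimitives call.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical minimal vertex; a real batch would use an XNA/MonoGame vertex type.
    public struct BatchVertex
    {
        public float X, Y, U, V;
        public BatchVertex(float x, float y, float u, float v) { X = x; Y = y; U = u; V = v; }
    }

    public sealed class QuadBatcher
    {
        public readonly List<BatchVertex> Vertices = new List<BatchVertex>();
        public readonly List<short> Indices = new List<short>();

        // Append one quad. shearX and shearY are tan(angle) factors applied relative
        // to the quad's top-left corner, like the question's translate * skew * translate.
        public void AddQuad(float x, float y, float width, float height,
                            float shearX, float shearY)
        {
            short first = (short)Vertices.Count;   // 16-bit indices cap the batch size
            AddCorner(x, y, 0f, 0f, shearX, shearY, 0f, 0f);          // top-left
            AddCorner(x, y, width, 0f, shearX, shearY, 1f, 0f);       // top-right
            AddCorner(x, y, 0f, height, shearX, shearY, 0f, 1f);      // bottom-left
            AddCorner(x, y, width, height, shearX, shearY, 1f, 1f);   // bottom-right

            // Two triangles per quad; the winding may need flipping for your cull mode.
            Indices.Add(first); Indices.Add((short)(first + 1)); Indices.Add((short)(first + 2));
            Indices.Add((short)(first + 2)); Indices.Add((short)(first + 1)); Indices.Add((short)(first + 3));
        }

        private void AddCorner(float originX, float originY, float localX, float localY,
                               float shearX, float shearY, float u, float v)
        {
            // x picks up a share of local y (M21), y picks up a share of local x (M12).
            float px = originX + localX + localY * shearX;
            float py = originY + localY + localX * shearY;
            Vertices.Add(new BatchVertex(px, py, u, v));
        }

        public void Clear() { Vertices.Clear(); Indices.Clear(); }
    }
    ```

    Each sprite class could keep its own Draw method and simply call AddQuad on a shared batcher, with the game flushing the accumulated arrays once per frame (or once per texture) and then calling Clear.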

    The problem with extending your DrawUserIndexedPrimitives to use this technique is that pesky world matrix; shaders don't really have a good way to attach a particular world matrix to a particular sprite (unless you're using hardware instancing, which I don't think your platform supports). So what can you do?

    If you create a custom vertex format, you can attach shearing values to each vertex, and use those to perform the shearing in the vertex shader, as in the first technique. This will allow you to draw all of your game's sprites in a single call, which should be very fast.

    You can find information on custom vertex declarations here.
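
    As a rough illustration of what such a vertex format might look like in XNA 4.0 / MonoGame, here is a minimal sketch. The struct name VertexPositionTextureShear and its Shear and Pivot fields are hypothetical; the choice of TextureCoordinate usage slots 1 and 2 for the extra data is an assumption, and the matching vertex shader, which would read those slots and offset the position, is not shown.

    ```csharp
    using System.Runtime.InteropServices;
    using Microsoft.Xna.Framework;
    using Microsoft.Xna.Framework.Graphics;

    // Hypothetical custom vertex carrying per-sprite shear data alongside position and UVs.
    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    public struct VertexPositionTextureShear : IVertexType
    {
        public Vector3 Position;           // offset 0,  12 bytes
        public Vector2 TextureCoordinate;  // offset 12,  8 bytes
        public Vector2 Shear;              // offset 20,  8 bytes: tan(angle) for x and y
        public Vector2 Pivot;              // offset 28,  8 bytes: point the shear is applied around

        // Byte offsets must match the field layout above (total stride: 36 bytes).
        public static readonly VertexDeclaration Declaration = new VertexDeclaration(
            new VertexElement(0, VertexElementFormat.Vector3, VertexElementUsage.Position, 0),
            new VertexElement(12, VertexElementFormat.Vector2, VertexElementUsage.TextureCoordinate, 0),
            new VertexElement(20, VertexElementFormat.Vector2, VertexElementUsage.TextureCoordinate, 1),
            new VertexElement(28, VertexElementFormat.Vector2, VertexElementUsage.TextureCoordinate, 2));

        VertexDeclaration IVertexType.VertexDeclaration
        {
            get { return Declaration; }
        }
    }
    ```

    All four vertices of a sprite would carry the same Shear and Pivot values. Since a single draw call samples from whatever texture is bound at the time, drawing every sprite in one call would also likely require the sprites to share a texture, for example a texture atlas; otherwise one call per texture is a more realistic target.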