Is “Pixel Blitting” in AS3 really worth the effort?

While planning some new routines for the PixelBlitz engine tonight one thing struck me – is it actually worth it?

There are a number of articles across the web about pixel blitting in AS3 (most of them at 8-bit Rocket :) but I did wonder if anyone had actually done some tests to see just what difference it makes in real-world terms.

After all, why mess around “blitting” things about if using a Sprite or MovieClip is just as fast anyway? In fact you could easily argue that using a native Flash display object gives you far more control (as you get to play with scaling, alpha, rotation, animation, sound events and more, easily).

Another thing also struck me – when building up the display for render the AVM will automatically use a dirty rectangles system. If you’ve got two overlapping MovieClips then it won’t waste time drawing pixels that would otherwise be obscured by the one in front. Traditional blitting, on the other hand, doesn’t care about this: it’ll gleefully copyPixels() until the cows come home, pasting image after image on top of each other (PixelBlitz suffers from this issue too).

[ Side note: It’s true we could add a similar dirty rectangles system to PixelBlitz, to avoid copying data when it’s guaranteed to be overwritten further up the chain, but this is not something we’ve found a fast way to do yet. The potential alpha channel of a bitmap causes the most problems, and the overhead of sorting and checking for overlaps always takes longer than just brute-force copying everything each time (if you can help, email me!) ]
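To make the side note concrete, here is a minimal sketch of the kind of overlap check it describes. It is written in Python as a language-neutral model, not AS3, and it is not PixelBlitz code. The `Blit` record, its `opaque` flag, and the function names are all hypothetical. The sketch assumes a blit can only be skipped when a fully opaque image drawn later covers it completely, which is one way of side-stepping the alpha channel problem.

```python
from dataclasses import dataclass


@dataclass
class Blit:
    # Hypothetical record of one pending copy operation
    x: int
    y: int
    w: int
    h: int
    opaque: bool  # assumption: True only if the image has no transparent pixels


def covers(top, bottom):
    """True if `top` is opaque and its rectangle fully contains `bottom`."""
    return (top.opaque
            and top.x <= bottom.x
            and top.y <= bottom.y
            and top.x + top.w >= bottom.x + bottom.w
            and top.y + top.h >= bottom.y + bottom.h)


def visible_blits(draw_order):
    """Drop any blit that a later (higher) opaque blit hides completely."""
    kept = []
    for i, blit in enumerate(draw_order):
        # Compare against everything drawn after this blit
        hidden = any(covers(above, blit) for above in draw_order[i + 1:])
        if not hidden:
            kept.append(blit)
    return kept
```

Every blit is compared against every blit above it, so the check grows quadratically with the sprite count. That is the sort of overhead that can end up costing more than simply copying everything.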

Tonight I decided to write two simple tests. They would measure the speed of the AVM’s dirty rectangle system vs. raw BitmapData copyPixels() power. I was interested in three things – the overall time it took to run the test, the amount of memory it used and the average frame rate.

The Tests

I took a 550 x 400 sized stage published at 30 fps. All tests were run using the Debug version of the Player (9.0 r 124). The test consisted of creating an array of X number of Sprites (to test the AVM) and PixelSprites (to test blitting). Each sprite was 50×50 in size and contained an alpha channel. I then drew all of the sprites onto the stage and moved them along by 4 pixels per frame; if they hit the left of the stage they wrapped around to the right again. The Sprites had cacheAsBitmap set to true (see the cacheAsBitmap note below).
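The per-frame movement described above can be sketched roughly as follows. This is Python used as a language-neutral model rather than the actual AS3 test code, and the `step` function is hypothetical. The exact wrap condition is an assumption: here a sprite wraps once it has fully left the stage on the left.

```python
STAGE_WIDTH = 550   # stage width from the test description
SPRITE_SIZE = 50    # each sprite was 50x50
SPEED = 4           # pixels moved per frame


def step(xs):
    """Move every sprite left by SPEED, wrapping it round to the right edge."""
    moved = []
    for x in xs:
        x -= SPEED
        # Assumption: wrap only once the sprite is completely off the left side
        if x <= -SPRITE_SIZE:
            x += STAGE_WIDTH + SPRITE_SIZE
        moved.append(x)
    return moved


positions = step([0, 100, -48])
print(positions)
```

In both versions of the test this position update is the same; only the drawing differs (display list vs. copying pixels onto one bitmap).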

Then I ran the tests multiple times, with varying numbers of fish, for varying durations, recording the data at each step and averaging it out.

I agree that this is in no way a truly “scientific” test, but I wanted a general “feeling” as a result, to see if this was an avenue still worth walking down or not.

The Results

With 500 sprites both the standard Sprite and the blit method kept a solid 30 fps frame rate. Using Sprites consumed 15MB of RAM, using blits 11MB.

At 1000 sprites we’re still at a consistent 30 fps, but there is noticeable “tearing” in the visuals as the sprites move across the stage. It’s not terrible, but you can easily see it. The standard method is now using 20MB while the blit is using 14MB.

2500 sprites and we see both techniques struggle to keep up with the 30 fps rate. The traditional Sprites actually outpace the blitting at 23 fps vs 21 fps, but the memory consumption is more than doubled, 35MB vs. 15MB.

At 5000 sprites they are both starting to feel the strain, each pegged at 12 fps. But the standard Sprites technique is using a staggering 58MB, while the blit is only up to 20MB.

7,500 sprites all moving at once and both techniques are virtually brought to their knees, managing just 8 fps each. Given the amount of data moving this isn’t totally surprising. The blit technique at this point is literally copying 18.75 million pixels (7,500 × 50 × 50) around in memory. The AVM’s internal dirty rectangle system is feeling the full force of what’s going on however, and is now consuming 237MB of RAM vs. the blit technique’s 25MB.

10,000 sprites crashes the Debug player for both versions; it literally runs out of memory :)

cacheAsBitmap

As I mentioned at the start, the Sprite version had cacheAsBitmap set to true. This is the main cause of the huge amount of RAM being used. As our Sprite only contained a single Bitmap this wasn’t needed. By removing this setting the amount of RAM used dropped, ending up only a few MB higher than the straight blit method.

Our Findings

So what can we pull from this?

First of all, the AVM dirty rectangles implementation is pretty damn sweet! But brute-force blitting is just as fast in this test case. Logic tells us that adding redraw-aware optimisation to our blit engine should open up a gap in our favour.

NEVER enable cacheAsBitmap on a Sprite or MovieClip if all it contains is bitmap data.

The blit engine uses less memory. If you need to cache vector Sprites in your game, then it uses considerably less memory!

No-one really needs a game with 7,500 fish swimming around in it 😉

Maybe the test wasn’t “real world” enough – even at the 1000 sprite level (at which both methods kept a 30 fps frame rate) we were still moving 2.5 million pixels around a 550 x 400 stage. That’s enough to fill the stage 11 times over (and still have some spare). Is this likely in a real game? Well no, I don’t believe so – but it isn’t that far off either. Games are getting bigger (we published one at 800×600 today for example), and if you had a game featuring multiple layers going on (foreground, player, background, distance, etc) with alpha showing through them all, then it doesn’t take long to start using pixels in the millions range.
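The pixel counts quoted in this post are easy to check. The short Python sketch below (a worked calculation, not part of the original tests; the function names are made up for illustration) multiplies the sprite count by the 50×50 sprite size and compares the total to the 550 x 400 stage.

```python
STAGE_PIXELS = 550 * 400    # 220,000 pixels on the stage
SPRITE_PIXELS = 50 * 50     # 2,500 pixels per sprite


def pixels_moved(sprite_count):
    """Total sprite pixels drawn per frame, ignoring any overlap."""
    return sprite_count * SPRITE_PIXELS


def stage_fills(sprite_count):
    """How many times those pixels would cover the whole stage."""
    return pixels_moved(sprite_count) / STAGE_PIXELS


for count in (500, 1000, 2500, 5000, 7500):
    print(count, pixels_moved(count), round(stage_fills(count), 1))
```

For 1000 sprites this gives 2,500,000 pixels, or a little over 11 full stages, matching the figure above.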

There are instances when I believe it’s just easier to deal with things on a blit level – for example building up a large n-way scrolling tilemap, where you constantly need to redraw the scroll buffers. Doing the same by placing (and updating) hundreds of Sprites would be an exercise in pain I wouldn’t wish on anyone.

Is a combination of both worlds the way to go? Quite possibly. While I loathe using the timeline (or MovieClips in general) for anything, they do offer Flash animators a feature-rich tool-set that lets them create vibrant moving games. The blit method, by contrast, requires graphic artists trained in the way of the pixel, and I believe those are a dying (and expensive) breed indeed. Creating quality animations at that level is time-consuming and costly. But as we’ve seen, animating on a vector level introduces both resource and speed issues into your game.

What about collision detection? Well we all know this pretty much sucks in Flash, so we have to roll our own methods anyway. For pixel-perfect collision detection we need to inspect the elements on a pixel level (surprise surprise). At least with the blit technique we’re already operating on that level, so there’s no extra draw() overhead involved.
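As a rough illustration of what inspecting elements on a pixel level means, here is a minimal sketch in Python (a language-neutral model, not AS3 and not the PixelBlitz collision system). It assumes each sprite has a hypothetical mask stored as a list of rows, where a truthy cell marks a solid pixel, and it only tests the region where the two bounding boxes overlap.

```python
def pixel_collision(mask_a, ax, ay, mask_b, bx, by):
    """Return True if any solid pixel of A overlaps a solid pixel of B.

    mask_a / mask_b: lists of rows; a truthy cell is a solid pixel.
    (ax, ay) / (bx, by): top-left stage position of each mask.
    """
    # Bounding-box overlap in stage coordinates
    left = max(ax, bx)
    top = max(ay, by)
    right = min(ax + len(mask_a[0]), bx + len(mask_b[0]))
    bottom = min(ay + len(mask_a), by + len(mask_b))

    # If the boxes do not overlap, the ranges are empty and nothing is tested
    for y in range(top, bottom):
        for x in range(left, right):
            if mask_a[y - ay][x - ax] and mask_b[y - by][x - bx]:
                return True
    return False
```

With blitting the pixel data is already to hand, so a mask like this can be built without first rendering a display object into a bitmap.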

Conclusion

Are AS3 Sprites “evil” for those of you trying to create arcade style games? No, I don’t believe so. They can hold their own in the speed stakes thanks to the power of the AVM, but you do have to watch yourself and be very careful re: memory consumption.

Is “blitting” really that much faster than using normal Sprites? No, it isn’t. It does have less memory overhead and a “cleaner” feel to it, but it’s no speed demon in comparison.

Would a hybrid solution work? (i.e. a fully blitted tilemap with MovieClip characters on top) – yes, absolutely!

Don’t feel that because you have travelled down the “blit” route you need to have the whole game living there. If you can mix and match your game logic and most importantly your collision systems, then there’s no harm in splitting these elements up, using both at once.

P.S. If you’ve got some ideas or concepts on optimising blit level drawing, please get in touch. I’ve been reading a lot about this recently (what I can find at least) but it’s always good to pick someone’s brain.

Posted on September 19th 2008 at 1:12 am.


24 Responses

  • September 19th 2008 at 10:17 am

    Have you tried comparing blitting to using MovieClips? I think that that would be more relevant perhaps because games frequently use animating MovieClips for moving entities. I’ve done a small test like that on my blog. Perhaps then blitting reaps a more significant benefit.

    Also, will you be releasing the code you used in your test?

  • September 19th 2008 at 1:55 pm

    It would have been interesting to see the tests running at 120fps, then any little slow down is noticeable.

    Did you double buffer the display on the bitmap version ?

    As Dark Vyper said above, mc’s are a better test. Without using mc’s you’re having to set up your own scrollRect animation system to achieve the same with sprites, which I don’t know if that’s going to be quicker ( Sprites + anim vs Movieclips ).

    Also 1 ( albeit ) tiny advantage blitting has over Flash native methods is that you won’t hit the alpha layers ceiling. At present I think it’s 12 ( Used to be 8 ), so as soon as you get 12 images with alpha blending in them on top of each other Flash starts dropping the alpha from them ( Usually a nasty black box is displayed instead ).

    For a game with a lot of sprites I’d still rather stick with blitting. It’s a bit more of a ball ache to set up, but not painfully so, and there’s more room to speed things up than using the system native routines ( For an example that’s just come to me, it may well be possible using Pixel Bender to get that to clear the screen buffer quicker. So you’d have a triple buffered display, screen a) visible and all done, screen b) being plotted to and screen c) Hydra could be clearing on its own thread freeing up the cpu for the screen b) plotting ).

  • September 19th 2008 at 2:31 pm

    Hi guys – no I didn’t double buffer the display. I didn’t try a custom event renderer either. I’ll try a double buffer just to see if it makes any difference – it may well reduce the “tearing” effect you can see quite clearly at high sprite counts.

    I’m happy to test again with movieclips, and you’re right I think it will make a difference – but not everything animates, typically a few central characters (you, the baddies, maybe some interface / explosions) – lots of extra fluff (bullets, tiles, backdrops, etc) are quite static, so I think what is needed is a real full-on game test level, but of course that takes time to create! :) and of course depending on the type of game the test is more/less relevant anyway.

    That bit about the alpha levels is interesting, thanks!

    I think I need to see about getting a really decent dirty rects system running within PixelBlitz to optimise the volume of copypixel operations that are happening, and then check again.

  • September 22nd 2008 at 10:02 pm

    Rich, nice stuff. Were the fish running through an animation or just static? The beauty of the blit is to seamlessly animate over a series of frames. That can be done inside individual sprites or with a single blit canvas. If you use gotoAndStop to animate, you will max out at about 500 objects on the screen.

  • September 22nd 2008 at 10:03 pm

    And yes, I have done my own tests on the subject:
    http://www.8bitrocket.com/newsdisplay.aspx?newspage=7496 with some similar findings.

    By the way, How is the Pixel Blitz scrolling accomplished?

  • September 23rd 2008 at 4:25 am

    Rich, I don’t think the double buffer is going to help much for CPU (in fact it will increase render time). It might help the tearing. Locking the BitmapData while copying pixels and then unlocking after can help also. One advantage I see to using Sprites (at least for the main characters) is the ease of rotation, scale, etc. The downside is that as soon as you perform those operations, the vector renderer cuts in and slows things down. I think you can use the best of both worlds by blitting the tile backgrounds, particles, etc, but then using sprites (with tiles to animate) for your main characters. It is kind of like what the old Atari or Commodore programmers did. They had an insanely limited # of sprites (or none) so they saved them for the most important objects in the game.

  • September 23rd 2008 at 10:44 am

    Hi Jeff – I ran the tests again through the latest version of PixelBlitz and things are definitely getting better. PB by default operates a double-buffer (by way of the RenderLayers) which helped with tearing a little bit, but not enough – to truly fix that the renderer would have to keep track of a target display time, and split the draw operations across that time (so effectively drawing sprites 1 – 1000 in one frame, then 1001 – 2000 in the second). I don’t think this is worth implementing though as Flash timers have such terrible latency.

    I added some basic optimisations to stop PB from wasting resources copying pixels when they didn’t move, and this has made PB equal the speed of the pure AVM when a layer is idle, which is great.

    The issues I have with the Sprite / Blit combo is the collision system, as it puts much more work onto you *especially* if you then start rotating or alphaing the sprites. I think the essence was the ease with which people can animate those Sprites on the timeline, with fine-grained control over the timing. Something that requires trial and error when blitting. The way PB can extract the frames of a MC helps this a little (you can design on the timeline, then let PB worry about the display), but it doesn’t retain any of the animation logic you may have had (and doesn’t care about for example 1 image staying displayed over several frames), so I think it needs more work before it’s truly useful.

    Once I’ve added collision masks to PB I’m going to have to tackle this situation next, as it’s quite an important one – and I cannot assume that everyone who wants to use PB has the skills or patience to animate their sprites in a paint package (I know I sure as hell haven’t!)

  • September 24th 2008 at 7:10 pm

    “Rich, I don’t think the double buffer is going to help much for CPU (in-fact it will increase render time)”

    I don’t see how mate, and with my own experiments I’ve found it quicker. Blitting to a bitmap which isn’t attached to the stage is quicker than one that is, although granted if you’re working with dirty rectangles / damage maps etc. then double buffering can be a headache.

    Either of you tried playing with Event.RENDER yet ? I’m not sure if it’s a glorified bitmap.lock() for the whole screen, but it looks like it could have some potential

  • September 24th 2008 at 8:10 pm

    Squize – I tried it through a double-buffer and it helps, but the difference was really tiny.

    I’m tempted to try a custom event renderer next, but I think it might be more ballache than it’s worth!

  • September 25th 2008 at 4:39 am

    >>>
    I don’t see how mate, and with my own experiments I’ve found it quicker. Blitting to a bitmap which isn’t attached to the stage is quicker than one that is, although granted if you’re working with dirty rectangles / damage maps etc. then double buffering can be a headache.
    <<<

    I just meant that doing 2 entire sets of copy pixels is more work for the CPU than 1. That’s why I lock and unlock rather than blit off screen and then on. I have no proof it is better, I just thought it was =)

  • September 26th 2008 at 2:21 pm

    “I tried it through a double-buffer and it helps, but the difference was really tiny.”

    Yeah it’s not a huge saving, but when hitting something hard all these little things help.

    “just meant that doing 2 entire sets of copy pixels is more work for the CPU than 1.”

    It’s still only 1 blit. You have your two bitmaps, one is attached to the display list, the other isn’t. The one that isn’t is the one that you blit to. Next frame you just swap them ( ie the pointers to the bitmaps ) around so the first one is now being blitted to whilst the other is being displayed.

  • July 22nd 2009 at 8:05 pm

    Wow! Great article. Could come in handy soon for me.

  • Vontre
    April 29th 2010 at 5:30 am

    Interesting tests! I’m somewhat curious about double buffering techniques as I’ve had some very unusual results. I tried implementing a double buffer using two methods; one implementation was swapping the pointers between two bitmaps as the back buffer was being drawn, and the other simply kept the pointers the same and used a full screen blit to move the back buffer onto the screen. Naturally I expected the pointer swap to be faster. I set up a test with 100 screen renders per frame so that any differences would be easier to spot. In this test the pointer method blew the double-blit out of the water. Then I moved to a “normal” test with just a single render per frame and got the opposite result. This didn’t make any sense so I tested it several times over and got the same result; with only one screen render per frame, the double blit was actually slightly faster than the pointer swap. Hmmm.

    After thinking about it I came up with a possible explanation. Perhaps when Flash is executing the pointer swap in script, all it is doing is literally writing that pointer and not touching the rendering engine until the frame is complete. But what if the process Flash undertakes when it detects that the pointer has changed is actually slower than blitting the information? This is the only thing I could think of that would explain my results.

  • July 25th 2010 at 1:07 am

    I’ve found somebody who did the test. His result was that blitting is actually pretty fast compared to native Flash Sprites and MovieClips. Here is the link: http://fatal-exception.co.uk/blog/?page_id=14
    He also embedded the .swf for us to test, and my result was that blitting was pretty fast indeed.
    Do you have any comment about this? I’m just curious which one is actually faster.

  • chris
    October 19th 2010 at 10:29 am

    >> Is “blitting” really that much faster than using normal Sprites? No, it isn’t. It does have less memory overhead and a “cleaner” feel to it, but it’s no speed demon in comparison.

    You’re doing something wrong, then. Native MovieClips and Sprite render with draw(). If you use copyPixels() instead, you should get at least 3x speed boosts.

  • November 30th 2011 at 4:33 am

    Hi Rich, this comment might be 3 years late, but people coming upon this post might be interested in seeing a very simple blitting engine I’ve written which is very very fast. It’s solved many performance problems for me. You can check out the demo at http://cenizal.com/blis/ and download the source / read about it on my blog post: http://blog.cenizal.com/?p=119. Keep up the great work!!

  • November 11th 2012 at 12:00 am

    Do you have an online demo ?
    Can you release the sources ?

    I have written a similar article http://www.yopsolo.fr/wp/2012/11/08/tutorial-blitting/ and I would like to run some tests.

  • XfStef
    August 13th 2013 at 8:52 pm

    Hey man. This looks like a great article, but the testing and everything was done in 2008, back when Flash apps for Android and iOS weren’t even heard of. Nowadays people that are looking into making games for smartphones and tablets could really use some help when it comes to developing in AS3. Maybe it’s possible that you update your experiments and focus on 30 and 60 fps for such devices ? It would be great. Thanks !

  • August 15th 2013 at 7:04 am

    Sorry man I don’t touch Flash any more.
