NOTE: If you read this post, please also read the comment at the end from sHTiF (creator of Genome2D), which details some poor assumptions and misunderstandings on my part.
When writing my Monster Dash clone, I opted to start with Flash’s classic display list. Although I intended to eventually switch over to the Starling Framework, I knew I’d be able to knock together test animations and lay out all my graphics within Flash Professional much faster than I could programmatically in pure ActionScript. Of course, since I was targeting iOS, I knew that at some point the classic display list would hit a GPU limit that only Starling (and by extension Stage 3D) would be able to overcome.
The first few iterations went pretty much as expected. Using Box2D, I built an increasingly sophisticated game world and quickly added visuals to represent it using traditional MovieClips and Flash’s timeline. With the help of Adobe Scout, I was able to measure the performance of each build and could see that rendering times on my iPad 1 were creeping up as I added more visuals. However, while GPU rendering was taking up a large proportion of each frame’s budget, my ActionScript was consuming only a small fraction of it.
You can see this in the screenshot below, which was captured from Adobe Scout. Each vertical slice represents the time taken to process one frame. The green bars represent the time spent rendering my visuals, whereas the blue bars represent the time spent executing my ActionScript. To achieve my target frame rate of 60 frames per second, each slice (a blue bar stacked on top of a green bar) needs to stay below the red horizontal line. If a slice pops up above the red line, my game has failed to complete all the work required of it within its frame budget. When this happens, the application’s frame rate drops and the user notices a visible stutter. As you can see from the screenshot, my demo was close to bursting point, with rendering simply taking too long to stay consistently within budget.
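To make the frame budget concrete: each frame gets 1000 ms divided by the target frame rate, so at 60fps script and rendering together must fit in roughly 16.7ms. The sketch below is a minimal Python illustration of that arithmetic. The function names and sample timings are hypothetical (they are not measurements from Scout), and it assumes the red line sits exactly at the budget for the target frame rate.

```python
# Minimal sketch of frame-budget arithmetic (hypothetical helpers, not a Scout API).

def frame_budget_ms(target_fps: float) -> float:
    """Time available to run script and render one frame, in milliseconds."""
    return 1000.0 / target_fps


def fits_budget(script_ms: float, render_ms: float, target_fps: float = 60.0) -> bool:
    """True if the stacked slice (script + render) stays at or under the budget line."""
    return script_ms + render_ms <= frame_budget_ms(target_fps)


if __name__ == "__main__":
    print(f"budget at 60fps: {frame_budget_ms(60):.2f} ms")  # budget at 60fps: 16.67 ms
    # Made-up timings for illustration only:
    print(fits_budget(script_ms=3.0, render_ms=12.0))  # True  (15.0 ms total)
    print(fits_budget(script_ms=3.0, render_ms=15.0))  # False (18.0 ms total)
```

Halving the target to 30fps doubles the budget to about 33.3ms, which is why dropping the frame rate is the usual fallback when a game can't stay under the line.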
At this point I knew it was time to ditch Flash’s classic display list and move over to Starling. After all, with my ActionScript execution times already so low (around 3ms per frame on average), I’d surely be able to bring my overall frame times down drastically by simply plugging in Starling and cutting the rendering times, right? Unfortunately, by dropping Flash’s classic display list I actually hit a new problem. Take a look at the screenshot below, which was taken from Scout after I’d implemented all my visuals in Starling.
As expected, the green bars are gone! Stage 3D is so blazingly fast that the rendering times for each frame are now negligible. However, the execution time for my ActionScript has shot way up (around 14ms per frame compared to 3ms previously)! In fact, my overall frame times are barely an improvement over my original classic display list approach. So why has this happened? Remember that the classic display list APIs are native to the Flash runtime, so there’s almost no cost involved in calling them from your ActionScript. Starling’s API, however, is written entirely in ActionScript 3, so each equivalent call to the Starling API results in layers of ActionScript being executed before the bulk of your graphics work is offloaded to the GPU.
Thanks to Adobe Scout, I was eventually able to optimize things so that everything ran within budget on the iPad 1, but only just. The harsh truth is that the GPU savings I made by moving to Starling were almost completely wiped out by the ActionScript overhead.
Now, this doesn’t necessarily mean I should have just stuck with the classic display list. There were some things I didn’t bother implementing with the classic display list, such as the large scrolling backgrounds (you can see a video of an early prototype using the classic display list here). They would simply have crippled my frame rate, whereas Starling handled them effortlessly. I’d also like to point out that the Starling framework itself isn’t to blame. With Stage 3D being such a low-level abstraction layer, a fair chunk of ActionScript is required to write any higher-level framework on top of it. Unfortunately, the same is true of all the other frameworks built on Stage 3D, such as Away3D, Feathers and Minko.
All this does, however, highlight that while Stage 3D has eliminated Flash’s previous rendering bottleneck, it has left ActionScript 3’s performance badly exposed. In recent years we’ve even seen ActionScript 3 fall behind JavaScript in terms of performance.
So what can be done about this? Well, I guess I could target better hardware. The iPad 1 is obsolete by today’s standards, and my iPad 2 can run my Monster Dash clone without breaking a sweat, but that’s not really the point. I think Flash developers should be able to expect simple 2D games with some basic physics to run at a full 60fps on even the iPad 1 and similar Android hardware. If I were to add further complexity to my Monster Dash clone, I’d likely run into performance issues again. I accept that my code was hardly the most optimized I’ve ever written, but it certainly isn’t the worst either.
The reality is that ActionScript 4 was going to solve all these issues. Now that Adobe has abandoned those plans, Flash and AIR are left in an increasingly difficult spot. I wouldn’t mind so much if we were to see some performance improvements to ActionScript 3, but if such plans exist, Adobe is keeping quiet about them. However, it’s highly likely that ActionScript 3’s roots in ECMAScript/JavaScript simply make it impossible to squeeze significantly more performance out of the language.
Personally, I think the solution is to make some of the more common frameworks, such as Starling, native by bringing them into the Flash runtime. Considering the increased focus on Starling development and the constraints of mobile hardware, this makes a lot of sense. When using Starling, the traditional Flash Professional workflow falls away to some extent, leaving developers with a more code-centric approach. If that’s the case, many will ask what the point of using Flash and ActionScript is, especially when alternative languages such as Objective-C or C++ can provide significantly better performance. Even Flash’s claim to be a cross-platform solution has been largely eroded over the last few years.
While Adobe has made significant strides with the Flash platform in recent years, some areas need serious consideration. At the moment, I think ActionScript 3 performance has to be one of the highest priorities.
It’s not a pretty solution, but you can actually use Objective-C, C, or C++ from your AS3 app on AIR using Native Extensions (a.k.a. ANEs). Depending on the type of work your AS3 is doing, you may get an appreciable performance boost by offloading some of it to an ANE. You cite AS3 falling behind JavaScript, which prompted me to try a similar technique: offloading work to JavaScript. I got a 10x speedup:
http://jacksondunstan.com/articles/2213
AS3 is absolutely a bottleneck. I’ve heard it got even worse with ASC 2.0. However, I wonder if using workers could help a bit.
Thanks for the performance suggestions Jackson! Your AS3-to-JavaScript bridge suggestion when running in the browser is really interesting.
@sebastiano It’s a shame workers don’t work on mobile.
This is a rather uninformed post. A few issues with it:
First, the overhead of native display list execution is included in the native display list drawing, so what you see as “drawing” in the first screenshot also contains execution overhead.
Another major issue overlooked here is that most of the execution time in the second screenshot is actually spent on GPU calls and has nothing to do with ActionScript speed; it would be just as slow in other languages. For example, GPU upload calls are counted here.
In a Genome2D game I made, 98% of the ActionScript execution time was actually spent on these calls and only 2% on ActionScript logic. Starling has far more AS3 overhead than Genome2D, but I would bet that most of its time is still spent on GPU calls.
Thanks for clarifying these issues sHTiF. Really appreciated!
You may also be interested to look at the C# Mono + AS3 Stage3D bits Zynga recently posted and has been updating. https://github.com/playscript/playscript-mono.
You don’t mention where the ActionScript time is being spent. Did you use Scout to profile this code and perhaps find opportunities to optimize it?
Also, may I suggest that frame rate is not the only performance issue you need to be concerned about. The Scout data shows that you are achieving your desired frame rate, with both the original and Starling-based version, but you are using almost 100% of the CPU in both cases. This is really going to drain the battery!
I think you would benefit a lot from switching to Nape physics instead of Box2D. Box2D was not optimized for AS3; Nape is. I saw a drastic performance improvement on both desktop and mobile.
Per sHTiF’s comment, if those are calls made to the GPU, it seems the GPU isn’t being used correctly. GPU calls should be kept to a minimum, for example by reusing resources.
It’s things like this that are making me look at Haxe (specifically NME) for my next project.