I am having issues with the speed of communication between workers in AS3 for AIR for Android. My test device is a Galaxy S2 (Android 4.0.4) and I am developing in FlashDevelop using AIR 18.0.
First things first. I tried good old AMF serialisation, copying via a shared object. I was getting a solid average of 49 calculations/second on the physics engine (the secondary thread) with a stable 60 FPS on the main thread. I had to crank it up to over 300 dynamic objects to get any noticeable slowdown.
All went well, so I started the on-device testing, and that is when things started to go sideways. I was getting less than 1.5 steps/s.
I started to dig a bit deeper, wrote a ton of code to check what on earth was so slow, and found that looking at shared objects was kind of like watching other people watch paint dry.
At this point I got deeper into researching. I found that a number of people were already complaining about the speed of message channels (I found not much on shared objects; the "developers" status quo, I guess). So I decided to go as low-level as I could, using shared ByteArrays and mutexes. (I skipped Condition since I don't particularly want any of my threads to pause.)
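For readers unfamiliar with that setup, here is a minimal sketch of how a shareable ByteArray and a Mutex are typically handed to a background worker. It is not the code from this project: the class name SharedSetupSketch and the property keys are made up for illustration, and it assumes the same SWF is loaded as both the main and the background worker.

```actionscript
package
{
    import flash.concurrent.Mutex;
    import flash.display.Sprite;
    import flash.system.Worker;
    import flash.system.WorkerDomain;
    import flash.utils.ByteArray;

    // Hypothetical document class, used only for this illustration.
    public class SharedSetupSketch extends Sprite
    {
        private var positionData:ByteArray;
        private var positionMutex:Mutex;

        public function SharedSetupSketch()
        {
            if (Worker.current.isPrimordial)
            {
                // Main worker: create the shared objects.
                positionData = new ByteArray();
                positionData.shareable = true; // shared by reference instead of copied
                positionMutex = new Mutex();

                // Assumption: the background worker runs the same SWF bytes.
                var bg:Worker = WorkerDomain.current.createWorker(loaderInfo.bytes);
                bg.setSharedProperty("positionData", positionData);
                bg.setSharedProperty("positionMutex", positionMutex);
                bg.start();
            }
            else
            {
                // Background worker: pick up the same underlying objects.
                positionData = Worker.current.getSharedProperty("positionData") as ByteArray;
                positionMutex = Worker.current.getSharedProperty("positionMutex") as Mutex;
            }
        }
    }
}
```

Both workers then read and write the same memory, so every access to positionData should happen between positionMutex.lock() and positionMutex.unlock().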
Cranking up the desktop debugger, I was getting 115-ish calculations/s, and over 350 calculations/s with a direct callback (the debugger did throw an exception; it wasn't designed for that kind of continuous processing, I guess. Anyway...). Shared ByteArrays and mutexes were as fast as advertised.
Then I debugged on the S2 and, behold, I got 3.4 calculations/s with 200 dynamic objects.
So concurrency on mobile was pretty much done for me. Then I thought I'd do a little test with no communication whatsoever. Same scene, physics doing a more than acceptable 40 calculations/s and graphics running at the expected 60 FPS...
So, my bluntly evident question:
Here is my Com code:
    package CCom
    {
        import Box2D.Dynamics.b2Body;
        import Box2D.Dynamics.b2World;
        import flash.concurrent.Condition;
        import flash.concurrent.Mutex;
        import flash.utils.ByteArray;
        import Grx.DickbutImage;
        import Phx.PhxMain;
        /**
         * shared and executed across all threads.
         * provides access to mutex and binary data.
         *
         * @author szeredai akos
         */
        public class CComCore
        {
            //===============================================================================================//
            public static var positionData:ByteArray = new ByteArray();
            public static var positionMutex:Mutex = new Mutex();
            public static var creationData:ByteArray = new ByteArray();
            public static var creationMutex:Mutex = new Mutex();
            public static var debugData:ByteArray = new ByteArray();
            public static var debugMutex:Mutex = new Mutex();
            //===============================================================================================//
            public function CComCore()
            {
                positionData.shareable = true;
                creationData.shareable = true;
                debugData.shareable = true;
            }
            //===============================================================================================//
            // Physics thread: serialise the state of every body that carries a serial.
            public static function encodePositions(w:b2World):void
            {
                var ud:Object;
                positionMutex.lock(); // taken once for the whole write, released at the end
                positionData.position = 0;
                for (var b:b2Body = w.GetBodyList(); b; b = b.GetNext())
                {
                    ud = b.GetUserData();
                    if (ud && ud.serial)
                    {
                        positionData.writeInt(ud.serial); // serial
                        positionData.writeBoolean(b.IsAwake()); // active state
                        positionData.writeInt(b.GetType()); // 0-static 1-kinematic 2-dynamic
                        positionData.writeDouble(b.GetPosition().x / PhxMain.SCALE); // x
                        positionData.writeDouble(b.GetPosition().y / PhxMain.SCALE); // y
                        positionData.writeDouble(b.GetAngle()); // r in radians
                    }
                }
                positionData.length = positionData.position; // trim stale bytes from a longer previous frame
                positionMutex.unlock();
            }
            //===============================================================================================//
            // Main thread: read the records back in the same order they were written.
            public static function decodeToAry(ar:Vector.<DickbutImage>):void
            {
                var index:int;
                var rot:Number = 0;
                positionData.position = 0;
                while (positionData.bytesAvailable > 0)
                {
                    //positionMutex.lock();
                    index = positionData.readInt();
                    positionData.readBoolean(); // active state, unused here
                    positionData.readInt(); // body type, unused here
                    ar[index].x -= (ar[index].x - positionData.readDouble()) / 10;
                    ar[index].y -= (ar[index].y - positionData.readDouble()) / 10;
                    ar[index].rotation = positionData.readDouble();
                    //positionMutex.unlock();
                }
            }
            //===============================================================================================//
        }
    }
(Disregard the low-pass filter on the position: y -= (y - x) / c.)
So, please note that taking the mutex only on the physics side (the encoding) increases performance by about 20% while having minimal impact on the frame rate of the main thread. This leads me to believe that the problem does not lie in the writing and reading of the data per se, but in the speed at which that data is made available to a second thread. I mean, those are ByteArray ops; it's only natural that they are fast. I did check the speed by simply dumping the remote thread's work into the main thread, and the speed is still sound. Hell, it gets acceptable even on the S2 without dumping the extra calculations.
PS: I did try the release version too.
If no one has a viable solution (besides a 0.2-0.4 s buffer and the obvious single thread), I do want to hear about hacky workarounds or at least the specific source of the problem.
Thanks in advance.
Think I found the issue. As always things are more complex than one initially thinks.
Timer events, as well as setInterval and setTimeout, are all limited to 60 FPS. The timer does execute on time as long as the app is idle at that particular point, or IMMEDIATELY after it is free to execute and the delay has passed. But the delay, obviously, can't be shorter than 15-ish ms (and it's less on desktop, I guess). Shouldn't be a problem, right?
However.
If that piece of code manipulates shared objects, the timer suddenly chokes and sits on it for those 15 ms, regardless of whether it had its idle time or not.
Anyhow, the thing is that there is a buggy interaction between shared objects, workers, timer events and the Adobe-imposed 60 FPS limitation.
The workaround is quite simple. Put the timer on some massive delay, like 5000 ms, and do something like 5000 loops within the callback of the timer event. Obviously, the next timer event won't fire until the 5000-iteration loop is completed, but most importantly it also won't add that monumental delay.
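A rough sketch of that timer workaround, under some assumptions: the class name PhysicsLoopSketch and the stepFn callback are made up for illustration, and stepFn stands for whatever does one physics step plus the encoding into the shared ByteArray. It is not the exact code from the project.

```actionscript
package
{
    import flash.events.TimerEvent;
    import flash.utils.Timer;

    // Hypothetical helper meant to run inside the physics worker.
    public class PhysicsLoopSketch
    {
        private static const TIMER_DELAY_MS:Number = 5000; // deliberately huge delay
        private static const STEPS_PER_TICK:int = 5000;    // iterations per timer callback

        private var timer:Timer;
        private var stepFn:Function; // assumed: one physics step plus encodePositions()

        public function PhysicsLoopSketch(stepFn:Function)
        {
            this.stepFn = stepFn;
            timer = new Timer(TIMER_DELAY_MS);
            timer.addEventListener(TimerEvent.TIMER, onTick);
        }

        public function start():void
        {
            // Note: the first tick only arrives after TIMER_DELAY_MS has passed.
            timer.start();
        }

        private function onTick(e:TimerEvent):void
        {
            // All steps run back to back inside a single callback, so the
            // per-event timer delay is paid once per batch, not once per step.
            for (var i:int = 0; i < STEPS_PER_TICK; i++)
            {
                stepFn();
            }
        }
    }
}
```

Since nothing inside the loop yields, each step has to block on something (a Condition in this case), otherwise the worker just spins through the batch and hogs the mutex.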
Another weird thing that came up is the greedy ownership of mutexes during the 5000-iteration loop, so the usage of flash.concurrent.Condition is a must.
The good thing is that the performance boost is there and it's impressive.
The downside is that the entire physics thing is now intimately locked to the frame rate of the main thread (or whatever contraption the main game loop consists of), but hey, 60 FPS is good enough, I guess.
The Mutex-Condition thing for those interested:
    package CCom
    {
        import Box2D.Dynamics.b2Body;
        import Box2D.Dynamics.b2World;
        import flash.concurrent.Condition;
        import flash.concurrent.Mutex;
        import flash.utils.ByteArray;
        import Grx.DickbutImage;
        import Phx.PhxMain;
        /**
         * shared and executed across all threads.
         * provides access to mutex and binary data.
         *
         * @author szeredai akos
         */
        public class CComCore
        {
            //===============================================================================================//
            public static var positionData:ByteArray = new ByteArray();
            public static var positionMutex:Mutex = new Mutex();
            public static var positionCondition:Condition = new Condition(positionMutex);
            public static var creationData:ByteArray = new ByteArray();
            public static var creationMutex:Mutex = new Mutex();
            public static var debugData:ByteArray = new ByteArray();
            public static var debugMutex:Mutex = new Mutex();
            //===============================================================================================//
            public function CComCore()
            {
                positionData.shareable = true;
                creationData.shareable = true;
                debugData.shareable = true;
            }
            //===============================================================================================//
            // Physics thread: write one record per body, then block until the main thread has read them.
            public static function encodePositions(w:b2World):void
            {
                var ud:Object;
                positionMutex.lock();
                positionData.position = 0; // rewind only once the mutex is held
                for (var b:b2Body = w.GetBodyList(); b; b = b.GetNext())
                {
                    ud = b.GetUserData();
                    if (ud && ud.serial)
                    {
                        positionData.writeBoolean(true); // record marker: another body follows
                        positionData.writeInt(ud.serial); // serial
                        positionData.writeInt(b.GetType()); // 0-static 1-kinematic 2-dynamic
                        positionData.writeDouble(b.GetPosition().x / PhxMain.SCALE); // x
                        positionData.writeDouble(b.GetPosition().y / PhxMain.SCALE); // y
                        positionData.writeDouble(b.GetAngle()); // r in radians
                    }
                }
                positionData.writeBoolean(false); // end-of-data marker
                positionCondition.wait(); // releases the mutex while blocked, until decodeToAry() notifies
            }
            //===============================================================================================//
            // Main thread: read records until the end-of-data marker, then wake the physics thread.
            public static function decodeToAry(ar:Vector.<DickbutImage>):void
            {
                var index:int;
                var rot:Number = 0;
                positionMutex.lock();
                positionData.position = 0;
                while (positionData.bytesAvailable > 0 && positionData.readBoolean())
                {
                    //positionMutex.lock();
                    index = positionData.readInt();
                    positionData.readInt(); // body type, unused here
                    ar[index].x = positionData.readDouble();
                    ar[index].y = positionData.readDouble();
                    ar[index].rotation = positionData.readDouble();
                    //positionMutex.unlock();
                }
                positionCondition.notify();
                positionMutex.unlock();
            }
            //===============================================================================================//
        }
    }
Sync will become a lot more complex as more channels and ByteArrays start to pop up.