Flex/Flash/Actionscript

Using BitmapData for Array manipulation in AS3

UPDATE: The test app has been updated to include two more tests: one for a hash table based approach, and one that uses a ByteArray approach. Looks like the ByteArray approach is the fastest, followed by the hash table, followed by the bitmap method. They’re all pretty close when you’re working with 10k or 100k records, but once you get into a million or more items then the ByteArray approach pulls ahead. Thanks to Alex Bustin for sharing his code for the ByteArray test.

Recently I was thinking about how to do some complex Array manipulation, and how to do it fast. I needed to take a large source Array and filter it two different ways, producing two new filtered arrays, each of which is made up of a subset of the values in the source array. Then I wanted to know which values of the first filtered array were unique to that array (did not appear in filtered array #2) and vice versa. I came up with a technique that uses the bitmap manipulation of the Flash player to accomplish these goals.

An example of the problem:
Let’s say I have a source array that looks like this:

Source Array
Item 1
Item 2
Item 3
Item 4
Item 5

Then I filter that array to produce a new array, Filtered Array #1, that contains some but not all of the items. I do the same thing again, except this time using a different filter that produces Filtered Array #2. Now my data looks like this:

Source Array Filtered Array #1 Filtered Array #2
Item 1 Item 2 Item 1
Item 2 Item 3 Item 2
Item 3 Item 5 Item 3
Item 4 Item 4
Item 5

I want to know the unique values in Filtered Array #1 that do not appear in #2 (Item 5) and the unique values in Filtered Array #2 that do not appear in #1 (Item 1, Item 4). An added bonus would be some extra stuff like being able to know the union of both arrays (items that appear in either array 1 or 2) or the intersection (items that only appear in both filtered arrays).

The slow-ass way
So the first thing that comes to mind when trying to solve this is that you can simply do a for loop over the source array and check if each item appears in Filtered Array #1 and Filtered Array #2. This means you have one full loop over every item in the data set, and then you need to do a check for whether the item exists in each of the filtered arrays. The most basic way would be something like this:


var unique1:Array = [];
var unique2:Array = [];
				
for(var i:int=0; i

So that's the easiest way and it certainly works. I'm sure there are countless ways to do this in a more optimized way, like not looping over the source array but instead looping over the first filtered array, somehow marking all the items in the second filtered array if they were found in the first filtered array, then looping over the remaining items in the second filtered array. I'm sure someone will tell me that I should use a binary tree search since I know the source index, or that I should use a hashtable to lookup the existence of items faster. I bet all those different approaches would beat this simple example speed-wise, but I was still worried no matter how optimized an approach like this can get, it would still suck. (Side note: If you have an alternative approach that will handle many items with really fast performance, especially if you have an AS3 implementation, I'd love to hear about it.) So anyway, I decided I wanted to try something different.

Using bitmaps
Check this out. Say you have a source array of 10,000 items. Then you filter that down to a filtered array that might have a subset of about 4,000 of those items. The relationship of the items in the filtered array to the source array can be drawn using a bitmap like this:
screenshot024.png

This image is a 100 pixel by 100 pixel square, and each pixel within that square represents whether or not the item in the source array appears in the filtered array. So if the pixel at 0,0 is black that means that the first item in the source array is also in the filtered array. If a pixel is red, that means that that item does not appear in the filtered array.

Why did I choose to make a square image? I originally just did a single line 1px high. But Flash has a bitmap limitation of a max width or height of 2,880 pixels. So a single line means that you would be limited to arrays with only 2,880 items, and that's not the kind of problem I was trying to solve. The performance need arose from having to do this with a shitload of items. So if we use a square instead (easy to calculate the dimensions, just take the square root of the # of items), then we have a theoretical maximum number of items of 2,880 x 2,880 = about 8.3 million. I'm not going to try to filter more than 8 million items client-side, so I'm ok with that limitation.

OK, cool. Now we do the same thing for the second filtered array. Below is the generated image of both filtered arrays side by side. The red image represents Filtered Array #1 and the green represents Filtered Array #2:
screenshot026.png

But how do you create these images in an optimized way? Since the arrays are already being filtered, that means I'm using Array.filter() with a callback function to decide whether or not the item passes the filter test. This filter function is going to be run on every item in the array. So I use that function to also do my bitmap drawing. And check this out, the callback for a filterFunction tells you the index of the item in the array! So that gives you all the information you need to draw these images. You know that the function will get called one time for each item, and it tells you the index. So just do a simple BitmapData.setPixelAt() and once your filtering is complete your image has been generated. Dope.

So now we start with the real bitmap magic. We're going to use the compare() method of the BitmapData class to generate a third image that represents a pixel by pixel comparison of these two images. To use the compare() method you call it on one BitmapData and pass in a second BitmapData. The function will return a new BitmapData object that contains pixel comparisons. If the pixel in bitmap 1 is the same as the pixel in bitmap 2 then the new comparison bitmap will have a transparent pixel (0x00000000). If they are not the same, then it will return the "difference pixel", which contains the difference between each channel (RGB). So what happens when we run compare() on these two images? We get a new image something like this:
screenshot027.png

There are only a few possible pixel values now, and each unique pixel value tells us something special. So by looping over the pixels we can know whether the item occurs in both arrays, one array or the other, or neither array. And just like that, we got our answer.

Try out the demo application I created here. And you can view the full source here. Be forgiving, the source code is dirty as hell... no comments, no nothin'... I basically just cranked this out and decided to put it out there without taking the time to clean it up because a) I think it's a pretty cool approach and b) it's dinner time and I'm hungry.

Performance
So what about some benchmarks? I've put together a little demo app that you can play with to try out this method. It lets you run two algorithms, one that is a very simple for loop that loops over each item and calls array1.indexOf(item) and array2.indexOf(item) to see if the item is present in both filtered arrays. Then you can also run the bitmap test that I've described. Here are the results that I had on my machine when running both techniques on various numbers of items:

# of items For loop text Bitmap test
10 items 4 ms 4 ms
100 items 4 ms 4 ms
1,000 items 19 ms 7 ms
10,000 items 1300 ms 32 ms
100,000 items timeout 350 ms
1 million items timeout 3700 ms

These aren't meant to be scientific results. There are some pretty big factors that might influence speed. For the tests I have a filter function that just chooses at random (with 50% probability) which items to include in the two filtered arrays. If you change the number of items in either array the speeds will fluctuate. But the basic idea is that doing a for loop and checking for an item using indexOf fails miserably when you start getting over the tens of thousands of items. The bitmap method works even on a million records, albeit slowly at 3-4 seconds, but at least it works without timing out or crashing the browser.

And like I said before, I'm sure there are other algorithms that would be faster than looping over every item in the full array and checking using indexOf. If you have some implementations that solve the same problem but with better performance, please let me know in the comments.

So what?
I don't really know if I'm going to actually use this technique anywhere, but I might. I'm also thinking about performing other calculations on large datasets using similar methods. Essentially if you have two or more datasets, you can use this graphical technique to perform all kinda of other calculations. I'm only using the compare() function to test for a few very distinct possibilities, but you could imagine doing something much more detailed, like performing calculations based on the aggregate color difference of multiple array images. The bitmap manipulation in Flash is so damn fast, that it opens a whole new door if you try to think about using those methods for number crunching and not just pretty graphics.

P.S. I know it's been a long time since I did a blog post with code. I think this one geeks out to a pretty extreme level, so for anyone reading who was getting fed up with my non-technical ramblings, I hope this makes up for it.

Standard

23 thoughts on “Using BitmapData for Array manipulation in AS3

  1. Holy crap man, that is awesome. I’m impressed you even thought to try it, not to mention the fact that it seems to work really well.

    I would assume you’ve already thought about this, and I know next to nothing about it, but this sounds like the kind of thing Pixel Bender in FP10 might be perfect for. They have also significantly increased the bitmap size limit in FP10: http://www.bit-101.com/blog/?p=1426

    Nice work dood.

  2. Delighted to see you nerding out again, that’s why I subscribed to this blog 🙂

    I played around with the 10,000 item test a bit. I get ~1200ms for your for-loop method, but I can get it down to ~900ms by initializing union as unique1.concat(unique2) and get it down to ~400ms by sorting the full array before filtering and using the startIndex argument in array.indexOf. Your bitmap method still wins by a factor of 12.5 though!

    I think that better ways to do this with array functions might come from having meaningful filter callbacks at the start, but it’s such a significant speed boost that it seems like it would be really hard to beat.

  3. @Tom- good point about the union just being the concatenation of the two unique arrays. I could also take out the union stuff from the bitmap method as well then, so that gets rid of a call to Array.push in there too, which might give a small speed boost to the bitmap method.

  4. I made a mistake there, it should be the concatenation of all three arrays – unique1.concat(unique2).concat(intersection). I added a bit of code to make sure I was getting the same results for each method 🙂

    Of course, then I had to have one last try. If I build a hash of the object labels when they are filtered, I can get an equivalent to the for-loop method to work in ~20ms. The hashing adds an extra ~25ms to the filtering, so this clocks in at ~55ms, to be fair.

    But! If you do all this with typed classes instead of dynamic Objects, hashing while filtering only adds ~7ms, and calculating the uniques, intersection and union takes 13ms, for a total of 20ms.

    I’m excited about bitmaps too, but this is a good lesson in hashing!

  5. Somehow I knew exactly what you did just by reading the title of the post. It makes perfect sense to me to use Flash for these kinds of computations. After all, graphics is one of the things the flash player does best. As much as I understood exactly what you did just by reading the title, I’m still stumped I hadn’t thought of using bitmap data like this before. Awesome technique, thanks for sharing!

  6. FWIW, the hashing method I just posted takes ~500ms to do 100000 items and ~25000ms to do 1000000 items. Clearly, the flash hashing routines aren’t really tuned to hundreds of thousands of objects. So until someone figures out what tweaks are needed, the bitmap method definitely has legs!

  7. Holy Wow, man! How on earth did this idea even pop in your head. This is really outside the box. Pixel Bender should be able to chew stuff up like this pretty well, as Ben Clinkinbeard mentioned, but barring that, the performance boost in your method is pretty darn impressive! So, here’s the question… how do you propose to create the pixel images out of a real world dataset (ie Table, CSV, XML, etc)?

  8. @Brian – I was really just curious if this bitmap method would work and if it would perform well. And I think my mind thinks visually more than numerically, so the images of the arrays represented as pixels and the comparison of those just sort of stuck in my head.

    That said, I’ve just tried a very simple hashtable method (I’ll be posting a followup in a second) and it outperforms the bitmap way. So using bitmaps isn’t the fastest way to solve the problem, but it’s certainly the most fun.

  9. Rad says:

    Thanks for sharing cool idea.

    Another option to consider is to encode filtered arrays as a bitarray and then use bitwise operations. Using your data as an example

    Filtered array can be encoded as:
    f1=01101
    f2=11110

    then

    ~f1=10010
    ~f2=00001

    then answer to “unique values in Filtered Array #1 that do not appear in #2 (Item 5)” would be:

    f1 & ~f2
    01101
    00001
    ——-
    00001.

  10. @Alex – nice! I’m going to use that bytearray method and put together another side-by-side test. My tests indicate that it’s about the same speed as the others on the 10k and 100k tests, but when you get into the 1 million items test it shaves off a solid second.

    The test you posted doesn’t calculate the processing time in the same way my tests do though. Your test doesn’t account for the filtering time in the total duration, and since a big chunk of the work required for the algorithm to run (calling byteArray.writeByte) happens in the filtering function, that needs to be included in the duration benchmark as well. Once that gets thrown in the numbers look more like the other tests (but like I said, when you get into millions of records the bytearray method pulls ahead)

  11. @Alex – I updated the test app to include a test that uses ByteArray. I also modified your code to use writeBoolean() and readBoolean() instead of writeByte() and readByte(). I did brief testing and I think that squeezed a little more speed out of it (but I could be wrong). I also didn’t create the comparison ByteArray like you had in your code, since I don’t think that was really needed.

  12. I wanted to imitate your BitmapData method thus the three ByteArrays. So yes, I’m sure it could be optimized ;).

    Anyway, glad to have added to the pot. See you at MAX!

  13. Seems like a sparse matrix could be useful in this type of solution. But, I wonder 1) if there’s an Actionscript implementation anywhere, and 2) how it’d fare in this test regarding speed 3) assume just a couple of on/off state of what you did with the bitmapdata. Maybe not, though.

  14. mzx says:

    nice catch. There are some “optimized” pieces in flash player that works better than other. Images/filters likely uses some parallel calculations

  15. this is great i was looking exactly for this kind of benchmark before continuing with my project,
    i only need comparison to tell me if 2 bytearray vars are the same, no reunion intersection.
    do you think bytearray.compress before compare would speed up ? i have flash player 10
    and all tests were faster for me using bytearr and not bitmapdata

Comments are closed.