Some flexcoders stats

I’ve recently compiled a fairly complete database of all the messages ever sent to the flexcoders mailing list. I’ll be posting the sqlite database file that you can load into your own AIR applications to start playing with this data. But before anything else I thought I’d post a few fun tidbits:

Top 10 Posters

| Poster | # Messages | # Unique threads |
|---|---|---|
| Alex Harui | 3259 | 2172 |
| Matt Chotin | 2793 | 2153 |
| Tracy Spratt | 2522 | 1886 |
| Tom Chiverton | 2368 | 1559 |
| Manish Jethani | 1296 | 1004 |
| Gordon Smith | 1371 | 978 |
| JesterXL | 1216 | 702 |
| Abdul Qabiz | 833 | 614 |
| Michael Scmalle | 904 | 513 |
| Tim Hoff | 798 | 470 |

Longest Threads

| Subject | # Messages | # Unique posters |
|---|---|---|
| Splitting FlexCoders into smaller, focused groups | 130 | 28 |
| Will Microsoft’s Silverlight Player Kill our beloved Flex? | 127 | 48 |
| Flex 1.5 price | 102 | 36 |

[DISCLAIMER: I have not verified this data at all. For all I know I fucked something up in the scraping process and it’s all whack. Who knows.]
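
For anyone who wants to reproduce numbers like the two tables above once they have a database of messages, here is a minimal sketch in Python. The `messages` table and its `poster` and `subject` columns are assumptions made up for illustration, since the real schema isn't described here, and treating identical subjects as one thread is a simplification (real threading would at least need to strip "Re:" prefixes).

```python
import sqlite3

# Hypothetical schema and sample rows; the real database layout is not known.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, poster TEXT, subject TEXT)"
)
rows = [
    ("alice", "Thread A"),
    ("alice", "Thread A"),
    ("alice", "Thread B"),
    ("bob", "Thread A"),
    ("carol", "Thread B"),
    ("bob", "Thread C"),
]
conn.executemany("INSERT INTO messages (poster, subject) VALUES (?, ?)", rows)

# Top posters: total messages and number of distinct threads per poster,
# ranked by distinct threads (ties broken by message count).
top_posters = conn.execute(
    """
    SELECT poster, COUNT(*) AS n_messages, COUNT(DISTINCT subject) AS n_threads
    FROM messages
    GROUP BY poster
    ORDER BY n_threads DESC, n_messages DESC
    LIMIT 10
    """
).fetchall()

# Longest threads: total messages and number of distinct posters per subject.
longest_threads = conn.execute(
    """
    SELECT subject, COUNT(*) AS n_messages, COUNT(DISTINCT poster) AS n_posters
    FROM messages
    GROUP BY subject
    ORDER BY n_messages DESC, subject
    LIMIT 3
    """
).fetchall()

for row in top_posters:
    print(row)
for row in longest_threads:
    print(row)
```

Both queries are plain `GROUP BY` aggregates, so they should run the same way against a SQLite file opened from an AIR app or any other client.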

Method for gathering data
I wrote an AIR application that scrapes the mail archive site for a specific group, and used it to pull all the messages that are available in the flexcoders archive there. (To get the listing, I ran a search covering the entire date range I wanted, which gave me paged results for all the messages.) Unfortunately I can only get pages of 10 messages at a time from the mail archive, and there are over 96,000 messages in the flexcoders archive. I didn’t want to hammer the site to total death, and I wanted to respect the request in the mail archive FAQ not to request more than one page a second. So how long does downloading 96,579 messages take if you download 10 at a time, once a second? That’s 9,658 pages, or roughly 2.7 hours, so about 3 hours more or less.
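
The arithmetic and the one-page-per-second politeness rule can be sketched like this. This is Python rather than the ActionScript the AIR app would have used, and `estimate_seconds`, `throttled`, and the `fetch` callback are hypothetical names for illustration, not the actual scraper.

```python
import math
import time


def estimate_seconds(total_messages, page_size=10, seconds_per_page=1.0):
    """Lower bound on scrape time: one request per page, spaced seconds_per_page apart."""
    pages = math.ceil(total_messages / page_size)
    return pages * seconds_per_page


def throttled(pages, fetch, min_interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Yield fetch(page) for each page, starting requests at least min_interval apart."""
    last_start = None
    for page in pages:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        yield fetch(page)


seconds = estimate_seconds(96_579)
hours = seconds / 3600
print(f"{seconds:.0f} seconds, about {hours:.1f} hours")
```

The estimate ignores the time each request itself takes, so a real run would come in somewhat longer than the computed figure.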

Oh, and it turns out that the HTML in the mail archive pages isn’t valid XHTML, so parsing it can be a bit of a bitch. I got about 20,000 messages in before I ran into a parsing error that halted the whole process. After about 3 different tries I finally got all the way through.
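
One way to avoid a single bad page killing the whole run is to use a parser that tolerates broken markup and to record failures per page instead of stopping. Here is a rough sketch using Python's standard `html.parser`, which does not require well-formed XHTML. The `<a class="subject">` markup and the `SubjectExtractor`, `parse_page`, and `scrape` names are invented for illustration; the real archive pages will look different.

```python
from html.parser import HTMLParser


class SubjectExtractor(HTMLParser):
    """Collect the text of every <a class="subject"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.subjects = []
        self._in_subject = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and ("class", "subject") in attrs:
            self._in_subject = True
            self.subjects.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_subject = False

    def handle_data(self, data):
        if self._in_subject:
            self.subjects[-1] += data


def parse_page(html):
    parser = SubjectExtractor()
    parser.feed(html)
    parser.close()
    return [s.strip() for s in parser.subjects]


def scrape(pages):
    """Parse (page_number, html) pairs; remember failed pages instead of halting."""
    results, failed = [], []
    for number, html in pages:
        try:
            results.extend(parse_page(html))
        except Exception:
            failed.append(number)  # retry these later rather than restarting
    return results, failed


# Unclosed <li>, bare <br>, unquoted attribute, unescaped "&": not valid XHTML.
sample = (
    '<ul><li><a class="subject" href="/msg1.html">Flex 1.5 price</a><br>'
    '<li><a class=subject href="/msg2.html">Re: Flex 1.5 price & licensing</a></ul>'
)
subjects, failed = scrape([(1, sample)])
print(subjects, failed)
```

Keeping a list of failed page numbers means a rerun only has to revisit those pages, not the 20,000 messages that already came through.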

[DISCLAIMER 2: The mail archive site lists something like 96,000 messages for flexcoders, but the Yahoo group seems to have quite a bit more, more like 116,000. Why the roughly 20k difference? I don’t know. Something’s whack, but I’m guessing that what I got is not the full 100% complete version of the messages. But hey, it’s the best I got right now.]

Why the hell?

We recently had a long discussion on flexcoders about whether or not the list should be split into multiple smaller lists. Notice the longest thread of all time in all of flexcoders history? Yeah, it’s that thread. So we debated back and forth and everyone seemed to have their own opinion. There were some assertions made about the stats of the list in terms of the number of people posting and the list losing previous subscribers. So I decided to get a database of all messages so I could try to figure some of that stuff out. Gathering the data was step 1. Now step 2 is using the data to try to figure out if people are in fact dropping off the list, and whatever other interesting tidbits I can glean out of it. I figure it’s a good way to play with some data visualization techniques.

What’s next with the data?
I’m going to post the sqlite DB file for anyone to download. The complete DB file is 90 megs. I’m also thinking about removing the “excerpt” column (which contains the first part of the complete message) because I assume that will drop the size considerably. I’m going to figure out whether I just want to post the 90 meg thing on my website or if I want to try to offload that somewhere, although I imagine there are only a few people who would be interested in downloading this thing anyway 😛
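
As a rough sketch of how dropping the excerpt column could shrink the file, the Python snippet below builds a throwaway SQLite database, copies everything except `excerpt` into a new table, and runs `VACUUM` so the freed pages are actually returned to the filesystem. The table layout and sample data are assumptions for illustration only, so the size ratio here says nothing about the real 90 meg file.

```python
import os
import sqlite3
import tempfile

# Throwaway database with a hypothetical schema and filler excerpts.
path = os.path.join(tempfile.mkdtemp(), "flexcoders_demo.sqlite")
conn = sqlite3.connect(path)
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, poster TEXT, subject TEXT, excerpt TEXT)"
)
conn.executemany(
    "INSERT INTO messages (poster, subject, excerpt) VALUES (?, ?, ?)",
    [("poster%d" % (i % 50), "subject %d" % (i % 200), "x" * 500) for i in range(2000)],
)
conn.commit()
size_before = os.path.getsize(path)

# Copy every column except excerpt, then swap the tables.
# Note: CREATE TABLE ... AS SELECT does not carry over the PRIMARY KEY constraint.
conn.execute("CREATE TABLE slim AS SELECT id, poster, subject FROM messages")
conn.execute("DROP TABLE messages")
conn.execute("ALTER TABLE slim RENAME TO messages")
conn.commit()

# Without VACUUM the file keeps its old size; the dropped pages are only marked free.
conn.execute("VACUUM")
row_count = conn.execute("SELECT COUNT(*) FROM messages").fetchall()[0][0]
conn.close()
size_after = os.path.getsize(path)

print(size_before, size_after, row_count)
```

The copy-and-swap approach is used here because it works on older SQLite versions that lack `ALTER TABLE ... DROP COLUMN`.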

I’m also going to play with doing some visualizations of this data now that I have it in an AIR app. I’ll probably just take screenshots of what I come up with, seeing as to actually run this stuff you’d have to download a frickin 90 meg AIR file, which seems a little excessive. Or maybe I’ll give people a 90 meg AIR app, fuck it right?

I hope to post the sqlite database file over the weekend. Then as I have time I might start playing with the data and posting my results.


13 thoughts on “Some flexcoders stats”

  1. Stats are jacked. I loved that list in 2004 and left about early 2006; I think I’ve posted like 3 times since. I still vote for an advanced list OFF OF F#$ING Yahoo. Flex Components was kind of cool, but too focused. Something like Flash Tiger would be dope.

  2. @Jesse – these are numbers for the entire life of flexcoders, so 2004-current. So you may have completely stopped posting in 2006, but your barrage of posts from 2004-2006 still leaves you in the top 10.

    @Michael – I’d love to put the data somewhere like that, but Many Eyes limits all uploads to 5 megs. I don’t actually know what the total size of a CSV file would be once I export it (I’ll try it and find out soon) but I have a feeling that 96k records is going to be over 5 meg, but maybe not. I was also looking at Dabble DB and hoping to use that, but they restrict to a certain number of rows.

  3. I love what you are wanting to do with this.

    You can host it on my site if you want, I have unlimited space/bandwidth.
    Let me know if you need any help with anything.


  4. Jay says:

    90 mega? Goodness, we’d be happy to download it even if it was 9 giga. Just give it to us. please!

  5. The difference in the number of messages between Yahoo! Groups and the Mail Archive is due to the fact that I wasn’t able to import all of the messages into the Mail Archive. If I remember correctly, importing the messages was a tedious process and I sat down one night and did as much as I could manage using a Perl script I had put together that day.

    Here’s the point where we switched over to the Mail Archive:


    Good to see how far things have come. I’m no longer active but am still lurking and do post now and then.

  6. @Manish I was about to mention the same, but you already made the point. Archiving to mail-archive was a really good thing you did, it really helps.

