{"id":306,"date":"2008-06-27T17:11:20","date_gmt":"2008-06-28T01:11:20","guid":{"rendered":"http:\/\/dougmccune.com\/blog\/2008\/06\/27\/some-flexcoders-stats\/"},"modified":"2013-06-09T19:42:54","modified_gmt":"2013-06-10T03:42:54","slug":"some-flexcoders-stats","status":"publish","type":"post","link":"https:\/\/dougmccune.com\/blog\/2008\/06\/27\/some-flexcoders-stats\/","title":{"rendered":"Some flexcoders stats"},"content":{"rendered":"<p>I&#8217;ve recently compiled a fairly complete database of all the messages ever sent to the <a href=\"http:\/\/tech.groups.yahoo.com\/group\/flexcoders\/\">flexcoders mailing list<\/a>. I&#8217;ll be posting the sqlite database file that you can load into your own AIR applications to start playing with this data. But before anything else I thought I&#8217;d post a few fun tidbits:<\/p>\n<p><strong>Top 10 Posters<\/strong><\/p>\n<table>\n<tr>\n<th>Poster<\/th>\n<th># Messages<\/th>\n<th># unique threads<\/th>\n<\/tr>\n<tr>\n<td>Alex Harui<\/td>\n<td>3259<\/td>\n<td>2172<\/td>\n<\/tr>\n<tr>\n<td>Matt Chotin<\/td>\n<td>2793<\/td>\n<td>2153<\/td>\n<\/tr>\n<tr>\n<td>Tracy Spratt<\/td>\n<td>2522<\/td>\n<td>1886<\/td>\n<\/tr>\n<tr>\n<td>Tom Chiverton<\/td>\n<td>2368<\/td>\n<td>1559<\/td>\n<\/tr>\n<tr>\n<td>Manish Jethani<\/td>\n<td>1296<\/td>\n<td>1004<\/td>\n<\/tr>\n<tr>\n<td>Gordon Smith<\/td>\n<td>1371<\/td>\n<td>978<\/td>\n<\/tr>\n<tr>\n<td>JesterXL<\/td>\n<td>1216<\/td>\n<td>702<\/td>\n<\/tr>\n<tr>\n<td>Abdul Qabiz<\/td>\n<td>833<\/td>\n<td>614<\/td>\n<\/tr>\n<tr>\n<td>Michael Schmalle<\/td>\n<td>904<\/td>\n<td>513<\/td>\n<\/tr>\n<tr>\n<td>Tim Hoff<\/td>\n<td>798<\/td>\n<td>470<\/td>\n<\/tr>\n<\/table>\n<p><strong>Longest Threads<\/strong><\/p>\n<table>\n<tr>\n<th>Subject<\/th>\n<th># Messages<\/th>\n<th># unique posters<\/th>\n<\/tr>\n<tr>\n<td>\nSplitting FlexCoders into smaller, focused groups\n<\/td>\n<td>130<\/td>\n<td>28<\/td>\n<\/tr>\n<tr>\n<td>\nWill Microsoft&#8217;s Silverlight Player Kill our beloved 
Flex?\n<\/td>\n<td>127<\/td>\n<td>48<\/td>\n<\/tr>\n<tr>\n<td>\nFlex 1.5 price\n<\/td>\n<td>102<\/td>\n<td>36<\/td>\n<\/tr>\n<\/table>\n<p>[<strong>DISCLAIMER: <\/strong>I have not verified this data at all. For all I know I fucked something up in the scraping process and it&#8217;s all whack. Who knows.]<\/p>\n<p><strong>Method for gathering data<\/strong><br \/>\nI wrote an AIR application that scrapes the <a href=\"http:\/\/www.mail-archive.com\/\">mail archive site<\/a> for a specific group. This pulled all the messages that are available in the <a href=\"http:\/\/www.mail-archive.com\/flexcoders%40yahoogroups.com\/\">flexcoders archive<\/a> on mail archive. (To get this listing I did a search that <a href=\"http:\/\/www.mail-archive.com\/search?l=flexcoders%40yahoogroups.com&#038;q=date%3A[1900+TO+3000]\">encompasses the entire date range<\/a>, which gave me paged results for all the messages.) Unfortunately I can only get pages of 10 messages at a time from the mail archive. And there are over 96,000 messages in the flexcoders archive. And I didn&#8217;t want to hammer the site to total death, so I wanted to respect the request in the <a href=\"http:\/\/www.mail-archive.com\/faq.html#download\">mail archive FAQ<\/a> to not fetch more than one page a second. So how long does downloading 96,579 messages take if you download 10 at a time once a second? That&#8217;s about 9,658 page requests at one per second, so about 3 hours more or less.<\/p>\n<p>Oh, and it turns out that the HTML in the mail archive pages isn&#8217;t valid XHTML, so parsing it can be a bit of a bitch. I had gotten about 20,000 messages in when I ran into a parsing error that halted the whole process. After about 3 different tries I finally got all the way through.<\/p>\n<p>[<strong>DISCLAIMER 2: <\/strong>The mail archive site lists something like 96,000 messages for flexcoders, but the Yahoo group seems to have quite a bit more, more like 116,000. Why the 20k difference? I don&#8217;t know. 
Something&#8217;s whack, but I&#8217;m guessing that what I got is not the full 100% complete version of the messages. But hey, it&#8217;s the best I got right now.]<\/p>\n<p><strong>Why the hell?<\/strong><br \/>\nWe recently had a long discussion on flexcoders about whether or not the list should be split into multiple smaller lists. Notice the longest thread of all time in all of flexcoders history? Yeah, it&#8217;s that thread. So we debated back and forth and everyone seemed to have their own opinion. There were some assertions made about the stats of the list in terms of the # of people posting and the list losing previous subscribers. So I decided to get a database of all messages so I could try to figure some of that stuff out. So gathering the data was step 1. Now step 2 is using the data to try to figure out if people are in fact dropping off the list, and whatever other interesting tidbits I can glean out of it. I figure it&#8217;s a good way to play with some data visualization techniques.<\/p>\n<p><strong>What&#8217;s next with the data?<\/strong><br \/>\nI&#8217;m going to post the sqlite DB file for anyone to download. The complete DB file is 90 megs. I&#8217;m also thinking about removing the &#8220;excerpt&#8221; column (which contains the first part of the complete message) because I assume that will drop the size considerably. I&#8217;m going to figure out whether I just want to post the 90 meg thing on my website or if I want to try to offload that somewhere, although I imagine there are only a few people who would be interested in downloading this thing anyway \ud83d\ude1b<\/p>\n<p>I&#8217;m also going to play with doing some visualizations of this data now that I have it in an AIR app. I&#8217;ll probably just take screenshots of what I come up with, seeing as to actually run this stuff you&#8217;d have to download a frickin 90 meg AIR file, which seems a little excessive. 
Or maybe I&#8217;ll give people a 90 meg AIR app, fuck it right?<\/p>\n<p>I hope to post the sqlite database file over the weekend. Then as I have time I might start playing with the data and posting my results. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve recently compiled a fairly complete database of all the messages ever sent to the flexcoders mailing list. I&#8217;ll be posting the sqlite database file that you can load into your own AIR applications to start playing with this data. But before anything else I thought I&#8217;d post a few fun tidbits: Top 10 Posters [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[2],"tags":[43],"class_list":["post-306","post","type-post","status-publish","format-standard","hentry","category-flex","tag-flexcoders"],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/dougmccune.com\/blog\/wp-json\/wp\/v2\/posts\/306","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dougmccune.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dougmccune.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dougmccune.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dougmccune.com\/blog\/wp-json\/wp\/v2\/comments?post=306"}],"version-history":[{"count":1,"href":"https:\/\/dougmccune.com\/blog\/wp-json\/wp\/v2\/posts\/306\/revisions"}],"predecessor-version":[{"id":1783,"href":"https:\/\/dougmccune.com\/blog\/wp-json\/wp\/v2\/posts\/306\/revisions\/1783"}],"wp:attachment":[{"href":"https:\/\/dougmccune.com\/blog\/wp-json\/wp\/v2\/media?parent=306"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dougmccune.com\/blog\/wp-json\/wp\/v2\/cat
egories?post=306"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dougmccune.com\/blog\/wp-json\/wp\/v2\/tags?post=306"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}