
Stalking Someone with Data

Data can often tell you far more about people than you originally think. In my previous post I presented some of the data from the history of the FlexCoders mailing list. I showed some of the details of the individual usage patterns for different people. One of those people was the Flex product manager, Matt Chotin. Matt’s involvement with FlexCoders is pretty interesting if you start to dig into the data. In this post I’ll try to identify some changing trends in his usage patterns and we’ll see if we can do some detective work to figure out why his behavior changed.

A little background: Matt has been involved in Flex since basically forever. He was an engineer at Macromedia and is now the product manager for Flex. Matt has been quite prolific on flexcoders over the years (in the overall ranking he’s #3). So to start I was interested in his overall post volume on the list. Take a look at the timeline showing his posts per month and you’ll notice there’s a distinct drop-off:


Here’s a closeup of the months around that drop:


See that big drop from April to May of 2006? Well, in May Matt changed jobs to become the product manager of Flex. On his blog he noted:

So if you notice the number of flexcoders posts going down it’s because my brain will be slowly atrophying as I move away from the details of our vast offering.

And that’s exactly what happened.
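If you want to try this kind of count yourself, here’s a minimal sketch in Python of bucketing posts by month. It isn’t necessarily how the timeline above was produced; the function name `posts_per_month` and the sample timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def posts_per_month(timestamps):
    """Count posts in each (year, month) bucket."""
    return Counter((ts.year, ts.month) for ts in timestamps)


# Made-up timestamps for illustration only; not taken from the real dataset.
sample = [
    datetime(2006, 4, 3, 21, 15),
    datetime(2006, 4, 18, 9, 40),
    datetime(2006, 4, 27, 22, 5),
    datetime(2006, 5, 9, 20, 30),
]

counts = posts_per_month(sample)
for (year, month), n in sorted(counts.items()):
    print(f"{year}-{month:02d}: {n}")
```

A month-over-month drop like the one in the chart would show up as a sharp fall between two consecutive buckets.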

Daily routines

Seeing the correlation between a change in professional life and a drop in activity is cool, but we can dig deeper. Not only does this data tell us when Matt’s behavior changed over the years, it can also tell us something about his daily routines and how those changed as well. I started looking at when (as in what time of day) Matt was posting to the list.

Here’s a chart that shows the distribution of posts by hour of day and day of week. It groups the posts by the combination of what day and what hour they occur on.
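The grouping behind a chart like this can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `day_hour_counts` and `as_grid` are hypothetical, the sample timestamps are invented, and the timestamps are assumed to already be in the poster’s local time zone.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def day_hour_counts(timestamps):
    """Count posts per (weekday, hour) cell; weekday 0 is Monday."""
    return Counter((ts.weekday(), ts.hour) for ts in timestamps)


def as_grid(counts):
    """Expand sparse counts into a 7 x 24 grid (rows are days, columns are hours)."""
    return [[counts.get((d, h), 0) for h in range(24)] for d in range(7)]


# Invented timestamps, assumed to be in the poster's local time.
sample = [
    datetime(2006, 3, 6, 9, 12),   # a Monday morning
    datetime(2006, 3, 6, 21, 47),  # the same Monday evening
    datetime(2006, 3, 13, 21, 5),  # the following Monday evening
    datetime(2006, 3, 9, 10, 30),  # a Thursday morning
]

grid = as_grid(day_hour_counts(sample))
for day, row in zip(DAYS, grid):
    busy = {hour: n for hour, n in enumerate(row) if n}
    print(day, busy)
```

Each cell of the grid is what a heat map would color: the more posts that land on a given day and hour, the hotter the cell.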


So you can see that Matt posted the most on weekday mornings (around 9-11am on Monday-Friday) and weekday evenings (around 8-10pm Monday-Thursday, note that he rarely posts on Friday nights).

This pattern is actually very similar to Alex Harui’s activity as well, although Alex’s activity is weighted more toward work hours than nights (except for Sunday night!).

I found the evening hotspots interesting (in both Matt’s and Alex’s cases). Clearly Matt was answering a lot of people’s questions from home after work hours.

I dug a bit further into Matt’s trends. Here’s the graph of his activity by hour of day for 2005:


We can see in 2005 he actually answered more questions in the evening than in the morning. In 2006 this became even more pronounced: almost all his activity was at night (I wasn’t the only one who noticed this, see Ryan Stewart’s post about Matt posting at 9pm):


And then there was a change in 2007. The graph for 2007 shows that he started answering more questions during the workday. And that shift continued into 2008 and 2009, by which time almost all of Matt’s activity was during work hours.


If you dig even deeper into the data you can find out that the transition from mainly evening activity to work-day activity happened mostly during April through June 2007. After about July 2007 Matt posted almost exclusively during the day. Taking a look at the release history of Flex, we see that the beta of Flex 3 came out in June 2007. So my guess is that Matt changed to a management role in May of 2006, but had far too much work to do to get Flex 3 ready and out the door between then and June 2007 (meaning his devotion to flexcoders had to be relegated to the evening hours). Finally, once the Flex 3 beta was out the door, he could devote some actual work hours to being involved in the community, instead of having to do it all from home.
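One way to pin down a shift like this is to track, month by month, what fraction of posts happen in the evening. Here’s a rough sketch in Python. The 6pm cutoff, the function name `evening_share_by_month`, and the sample data are all my own assumptions for illustration, not values taken from the real dataset.

```python
from collections import defaultdict
from datetime import datetime


def evening_share_by_month(timestamps, evening_start=18):
    """Return {(year, month): fraction of posts made at or after evening_start}."""
    totals = defaultdict(int)
    evenings = defaultdict(int)
    for ts in timestamps:
        key = (ts.year, ts.month)
        totals[key] += 1
        if ts.hour >= evening_start:  # assumed cutoff: 6pm local time
            evenings[key] += 1
    return {key: evenings[key] / totals[key] for key in totals}


# Invented timestamps: mostly evening posts in March, mostly daytime in July.
sample = [
    datetime(2007, 3, 5, 21, 0),
    datetime(2007, 3, 6, 22, 30),
    datetime(2007, 3, 7, 20, 15),
    datetime(2007, 3, 8, 10, 0),
    datetime(2007, 7, 9, 9, 30),
    datetime(2007, 7, 10, 11, 0),
    datetime(2007, 7, 11, 14, 45),
    datetime(2007, 7, 12, 21, 10),
]

shares = evening_share_by_month(sample)
for (year, month), share in sorted(shares.items()):
    print(f"{year}-{month:02d}: {share:.0%} of posts in the evening")
```

A transition from evening to daytime posting would show up as this share falling over a few consecutive months.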

Vacation Time

As if knowing the intimate details of Matt’s daily routine isn’t enough, we can learn something about his vacation history as well. Matt’s impressive in that he’s never gone a whole month without posting. If you go even more granular there are actually very few weeks that he missed (although this became more common as his overall activity declined in 2009). So if we look at Matt’s activity around the holidays something interesting pops out (well, it’s only interesting if you’re a total stalker, but if you’ve read this far then you probably are). Here are a few timelines of different years, showing columns grouped by week. In 2004 we see Matt was posting pretty regularly through the holidays. There actually was a 5 day stretch with no posts, but that was it (due to the way the weeks are grouped that gap doesn’t show in this chart).

2005 is similar:

But then 2006 has a big gap:

And being the stalker that I am, I noticed that and then went to investigate further. Turns out Matt wrote about taking a vacation that year.

I’ll be on vacation until mid-January so emails to me will go unanswered as will responses to various forums and blog comments 🙂 Happy Holidays to all!

The data never lies.
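Weekly columns can hide short gaps, so a more direct check is to sort the timestamps and look for the longest silence between consecutive posts. Here’s a minimal sketch in Python. The function name `longest_gap` is hypothetical and the dates below are invented for illustration; they are not Matt’s actual posting dates.

```python
from datetime import datetime


def longest_gap(timestamps):
    """Return (gap, start, end) for the longest silence between consecutive posts."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    # Pair each post with the next one and keep the pair that is furthest apart.
    return max(
        (later - earlier, earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
    )


# Invented dates for illustration only.
sample = [
    datetime(2006, 12, 15),
    datetime(2006, 12, 20),
    datetime(2007, 1, 16),
    datetime(2007, 1, 18),
]

gap, start, end = longest_gap(sample)
print(f"Longest silence: {gap.days} days, from {start:%Y-%m-%d} to {end:%Y-%m-%d}")
```

Because this works on raw timestamps rather than weekly buckets, it also catches gaps that straddle a week boundary.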

Looks like a long vacation over the holidays didn’t turn into a regular thing though, since he was right back at it the following year:

I’m not a total nut job

I know it seems like I’m obsessed with Matt Chotin. And regardless of whether that’s true or not, I do want to assure people I’m not totally off my rocker. This little experiment in data mining and analysis isn’t really about Matt. It’s about the stories data tells about all of us. There are mountains of public information out there about us all, and the tiny little bits that we put out there, even if those are just little Facebook or Twitter status messages, can say a lot about us. Sure, a single Facebook status message doesn’t tell anyone much, but when you look at all of them over a multi-year period you can start learning a lot about a person. And often what the aggregate data says about us isn’t something we’re aware of. From this data experiment I know when Matt eats dinner (pretty typical range of 6-8pm), when he goes to bed (around midnight), and when he gets to work (again pretty normal, between 8 and 9am). And this is all from only about 4,000 data points. With social networking and microblogging sites we’re starting to create thousands of little data points like this all the time.

Thanks to Matt Chotin

I ran this post by Matt first, since I know it’s a bit creepy. He was cool with me posting it, so thanks Matt! And thanks for all the years of hard work answering questions on flexcoders; we’re a stronger community because of it.

The Data

Read more about the data here. This is 5 and a half years of mailing list activity, comprising 148,826 individual email messages. Matt himself posted about 4,000 messages. You can download the full CSV dataset here.


10 thoughts on “Stalking Someone with Data”

  1. Asa Williams says:

    Spatial Key is looking awesome! Good to see that both sides of law can use it (police and stalkers)

  2. What I find really interesting about this analysis in relationship to Spatial Key is that you have achieved all of this insight with a pretty simple column chart, and one specialized heat map. No need for specialized visualizations or even geo-coding. What I think goes unsaid here is how easy the tooling makes it to load and interrogate data – the real power in this platform is the ease with which users can flexibly analyze data, probably more so than the visual output.

  3. I used to work in Business Intelligence for a large telco. Digging through mounds of data to find some meaningful tidbits is always interesting. Being able to take the data, crunch it, make it look nice, and then pull out the tidbits with a little conjecture reminds me of just how powerful ‘meaningless’ data viewed with the right tool(s) can become.

  4. Chris says:

    Oddball reference: The new SyFY Channel show “Caprica”, which had an early 1-episode peek last year, works off the idea that an Artificial Intelligence (of a deceased daughter) is created out of her entire lifetime of posting/browsing/status updates/etc found across the internet. The daughter is then “resurrected”, and the AI is ported into a robot… to eventually become the first Cylon….

    Back to reality: Stalking based on data is one thing (a creepy thing)… but what about the idea of taking it in a different direction: assembling a simple A.I. that might be able to tell marketers about consumer likes/dislikes. Imagine a virtual Focus Group, made up of a series of A.I.s based on tracking data of web users… While not a complete substitute for real, live, humans in a focus group – the A.I. Focus Group could be a cheap, first reaction test to product features, or product changes.

    Question: Do Matt Chotin’s posting habits tell you whether he likes his spaghetti sauce thick, or watery?

  5. I think Doug would have to scan a different forum for the spaghetti preferences 🙂 What I think could be an interesting analysis is the correlation between subjects and participants. Who are you bound to see responding to a thread if the subject contains X? It can be a clue into the proper way to format your subject if you want a response from someone specific.

  6. Gregor Kiddie says:

    I don’t think you need to wade through the data for that…

    You respond to anything that asks questions about process.

    Alex responds to anything to do with modules, garbage collection, the profiler or the Marshall Plan.

    Tracy responds to anything about XML

    Ben (used to) respond to anything about MDI

    That little list covers pretty much anyone you need to get information from 😉

  7. devo says:

    Matt Chotin’s calendar goes straight from March 31st to April 2nd; no one fools Matt Chotin.

    Matt Chotin once won a game of Connect Four in 3 moves.

    Matt Chotin runs Windows Vista on his Etch-a-Sketch.

    Matt Chotin does not sleep. He waits.

    Matt Chotin can do a wheelie on a unicycle.

    When the Boogeyman goes to sleep every night he checks his closet for Matt Chotin.
