Uncategorized

Hackers and Depression: Inform Yourselves About CBT

My wife is a clinical psychologist. Over the past week we’ve had long discussions about Cognitive Behavioral Therapy (CBT), a type of therapy focused on evidence-based methods (read: there have been studies showing effectiveness), with a particular emphasis on rational reasoning and pragmatic ways to tackle issues like depression and anxiety. The overlap with the way programmers think is astounding.

As a community, we rarely talk about mental illness. It takes high profile cases like Aaron Swartz’s suicide to get us to even bring up the subject, but more than likely we’ll revert back to our isolation and pretend like depression isn’t a serious issue in the tech world. We need to face depression, not sweep it under the rug.

If you or someone you care about is dealing with depression, please take a look at CBT. This article is a joint effort between a programmer (me) and a psychologist (my wife). I bet it’s the first article about therapy you’ve seen that uses code snippets to illustrate points.

CBT is for Hackers:

The tragedy to me is this: one of the most effective and scientifically-backed treatments for depression appears to be a stunning fit for hackers, and yet few people know about it. It’s called Cognitive Behavioral Therapy (CBT), and it has some of its origins in computer science.

A key idea within cognitive psychology, which was born out of the cognitive revolution of the 1950s, is that by studying successful functions in computer science, it becomes possible to make testable inferences about human psychological processes. Cognitive behavioral therapists mirror hackers in how they see the world and approach problems. They share the same core values: an emphasis on problem solving as efficiently and effectively as possible, using logic to debug a system, gathering data to test out what works and what doesn’t, and implementing transparent methods that others can understand and replicate as opposed to simply putting your faith in a “magic black box”. CBT and hackers are long lost kindred spirits, yearning to be reunited.

Read the full CBT is for Hackers article.

Data Visualization

Drug Side-Effect Warnings as Word Clouds

I’m always amazed at how happy the voice on TV commercials sounds when describing a litany of horrible-sounding side-effects from a prescription medication. I was just watching the nice cartoon lady telling me about Abilify when I overheard this:

<cue birds chirping> <nice music playing> Contact your doctor if you have uncontrollable muscle movements, as these could become permanent. Other risks include decreases in white blood cells, which can be serious, dizziness upon standing, seizures —

Errr, wait, what the fuck? Can we not have the nice guitar strumming along with a quaint melody in the background while you tell me that if I take this pill I might not be able to control my muscles and might start having seizures? Your soothing, monotonous tone is both putting me to sleep and freaking me out at the same time.

We’ve all heard these side-effect warnings in commercials, or seen them on packaging in the tiniest tiny print. It’s not uncommon to hear some soothing voice say something like “Side effects include headache, drowsiness, sore throat, and death.” Uhhhh. I’m ok with most of those, but one of these things is not like the other.

And yet the commercials or fine print don’t really tell you what’s likely to be a side effect versus what’s unlikely. Turns out, though, that if you do some research, the significance of the side effects of various prescription pills is available online. You just have to dig. For example, here’s the product sheet for Zoloft. And it has a section about side effects that looks like this:

Now we’re getting some real numbers. If only there was a way to quickly see what the most common side effects of various drugs were at a glance.

Here’s my take on redesigning the information presentation. We’ll start off with a fun one, which is the popular anti-depressant Zoloft:

Ain’t that a bitch? I guess the good news is if you’re nauseous, then ejaculation failure might not be that big a concern. The side-effects are sized by computing the difference in the percentages between the placebo group and the group taking the medication. In this case 14% of patients taking Zoloft experienced ejaculation failure, versus only 1% in the control group.
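To make the sizing rule concrete, here is a minimal sketch in Python. The subtraction is the rule described above; the linear font scaling, the point sizes, and the two "hypothetical effect" rows are my own placeholders for illustration. Only the 14% vs 1% figures come from the product sheet.

```python
def excess_rates(drug_pct, placebo_pct):
    """Return {side effect: drug % minus placebo %}, largest difference first."""
    diffs = {
        effect: drug_pct[effect] - placebo_pct.get(effect, 0)
        for effect in drug_pct
    }
    return dict(sorted(diffs.items(), key=lambda kv: kv[1], reverse=True))


def font_sizes(diffs, min_pt=10, max_pt=72):
    """Scale each difference linearly between min_pt and max_pt (an assumed scaling)."""
    largest = max(diffs.values(), default=0)
    sizes = {}
    for effect, diff in diffs.items():
        # Effects no more common than placebo get the smallest size.
        share = max(diff, 0) / largest if largest > 0 else 0
        sizes[effect] = min_pt + (max_pt - min_pt) * share
    return sizes


# Only the first row is real data from the post; the others are placeholders.
drug = {"ejaculation failure": 14, "hypothetical effect A": 10, "hypothetical effect B": 5}
placebo = {"ejaculation failure": 1, "hypothetical effect A": 6, "hypothetical effect B": 5}

diffs = excess_rates(drug, placebo)
sizes = font_sizes(diffs)
```

Subtracting the placebo rate means an effect that shows up just as often without the drug shrinks to the minimum size instead of dominating the cloud.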

Here’s another anti-depressant, Abilify (source data):

And now of course we have other drugs to counteract some of these side-effects, so why not try to counteract the negative Zoloft effects by popping a Viagra? Here are the new side effects you get to enjoy (source data):

And once that Viagra’s worn off you might be looking for a cigarette. But try Nicotrol (details) instead and you’ll get to take your chances with the following side effects:

Now at a glance you can see what you need to worry about and what you don’t. I imagine these beautiful labels on the side of the prescription boxes 🙂 Well, at least I can dream.

Uncategorized

RIP Aaron Swartz

I’m in academic publishing. My grandparents founded a publishing company. My father ran it for a decade. I sit on the board of directors. You could say academic publishing is in my blood.

Today I am nauseous. Aaron Swartz is dead. I don’t know whether or not he would be alive today if he hadn’t been prosecuted so aggressively for “stealing” academic journal articles. But what I do know is that this is a dark day in our history. It is a stain on the entire academic publishing industry.

I fiercely believe that as academic publishers we make the world a better place. We do good. I also believe there is a place for publishers in the Internet age. We’re working hard to figure out how to navigate these times. But everyone involved in this industry should be ashamed today.

We lost a genius. We lost a rebel.

I’m proud to be in publishing. But today I am nauseous. Today I am deeply sad. Today I am ashamed.

Flex/Flash/Actionscript

My Apache Flex Logo Contest Submissions

Now that Flex is being moved to the Apache Software Foundation, it’s time for a new logo. A logo contest is currently underway (ends today I think). Here are my two submissions. Each one has more detailed variations and explanations of the thought process if you view the full submission.

Logo 1

Main Themes: Cross-platform, progress, advancing forward, new beginnings

This logo is meant to combine the symbols of an arrow and an X. The arrow means “moving forward”, which has a number of connotations (moving forward with Apache, a fresh start for the project, advancing the state of the art in web/desktop/mobile development). The X means “cross platform”, which should be pretty self-explanatory to anyone who uses Flex. The combination of the two symbols means “Advancing cross-platform development.”

See the full treatment with explanation.

Logo 2

Main Themes: Stability, strength, enterprise

The second logo tries to capture the enterprise story. Flex is the foundation of many enterprise applications. It provides a core set of components and tools, on top of which we build stable, powerful, robust applications that drive real businesses. This logo has Flex as a strong base. Built on top we have a symbolic chart, but this symbol is also meant to represent a skyline of skyscrapers. Our apps power large enterprises and drive business. Flex is the foundation of enterprise development.

See the full treatment with explanation.

Data Visualization

What would you call this chart?

I’m working on a visualization of people logging into SpatialKey, and I’ve come up with the following table/chart.

(larger version)

Each row represents one customer, and each cell is one day. If the cell is blue that means the customer logged in that day. Otherwise the cell is gray (lighter gray for weekends to give some context to the timeline). So cells are either on or off; I’m not trying to show how much usage there was on a particular day, just that there was some usage.

The only other thing beyond the on/off blue/gray state is I’m trying to highlight customers who have stopped logging in. So if a customer has previously been logging in regularly and then they stop for a long time, I highlight the row in red to show how long it’s been since we last saw the customer.
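The logic behind the grid is simple enough to sketch in a few lines of Python. This is not the d3.js code behind the chart, just an illustration of the on/off cells and the red highlight. The function names, the 14-day threshold for "a long time", and the simplification that any past login counts as "previously logging in" are all assumptions of mine.

```python
from datetime import date, timedelta


def login_row(login_days, start, num_days):
    """One row of the grid: True for each day the customer logged in."""
    return [(start + timedelta(days=i)) in login_days for i in range(num_days)]


def cell_state(day, login_days):
    """Blue for a login, otherwise gray (lighter gray on weekends)."""
    if day in login_days:
        return "login"
    return "weekend" if day.weekday() >= 5 else "weekday"


def days_since_last_login(row):
    """Count of trailing 'off' cells, or None if the customer never logged in."""
    if True not in row:
        return None
    return row[::-1].index(True)


def is_lapsed(row, threshold_days=14):
    """Highlight the row in red once the gap reaches the (assumed) threshold."""
    gap = days_since_last_login(row)
    return gap is not None and gap >= threshold_days


start = date(2012, 1, 2)  # a Monday
logins = {date(2012, 1, 2), date(2012, 1, 3)}
row = login_row(logins, start, 21)
```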

What kind of chart is this? There’s got to be a name, but I don’t even know what to google to figure it out. I’ve created these before in Excel using conditional formatting of cell background color. This one is being created with d3.js using SVG (more posts on d3 will likely be coming).

The first thing I was reminded of was DNA sequencing using gel electrophoresis.

It also reminded me of the great Lite Brite that I used to play with as a kid.

So I’m currently going back and forth between “DNA Sequence chart” and “Lite Brite chart”, but there’s got to be a better term…

Art, Data Visualization, Maps

Printing Hurricanes as Gifts

We had a very busy month of August at SpatialKey as Hurricane Irene tore through the east coast. Our insurance customers were constantly watching Irene as it built up and approached land, then as it swept through parts of North Carolina, Vermont, New York, etc, and then as it died out quietly. We were writing software to visualize hurricane forecasts in real-time, as the storm was approaching, and getting immediate feedback from our customers. It was all a bit stressful, but exhilarating.

I wanted to have some kind of gift of thanks to give to our most helpful customers, who worked closely with us, helping us develop our hurricane product. To be honest, it felt like we all weathered a storm together during that hectic week in August. We brainstormed on sending out shirts, or bags, or some other standard corporate gear, but none of it really felt like “us”. So I came up with a more unique gift that I think captures our culture.

This is a 3D model of Hurricane Irene. The height of the model represents the wind speed at that location. You can see there are 3 bands of different wind speeds. The outer band represents where wind speeds hit 39 mph, the next band represents 58 mph, and the third band represents 74 mph (hurricane force winds). Then running through the middle we have the path of the eye of the storm, and the height of that track represents the exact speed at that point in time (Irene got up to 120 mph).
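As a rough illustration of how the three bands relate to wind speed, here is a small Python sketch. The 39, 58 and 74 mph thresholds are the ones described above; the function name and the idea of counting thresholds are just my shorthand, not how the model was actually built.

```python
# Band thresholds in mph, from the outer band to the hurricane-force band.
WIND_BANDS_MPH = [39, 58, 74]


def wind_band(speed_mph):
    """Number of thresholds reached: 0 = outside the model, 3 = hurricane force."""
    return sum(speed_mph >= threshold for threshold in WIND_BANDS_MPH)
```

A location that saw 60 mph winds sits on the second band, while the 120 mph peak along the track is well inside the third.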

I created the model by taking the GIS data straight from NOAA and using that to build up the 3D model by hand. Then I sent the 3D model off to Shapeways for printing. The printed version you see in the photos is made out of alumide, which is sort of a composite aluminum material.

For our customers who were working with us while Irene passed through, we hope this will be a nice reminder of the work we did. It’s just a little paperweight to sit on your desk, but for those who were watching Irene as it developed and keeping a very close eye on the footprint of the storm, I think it’s a nice memento.

A hurricane can be a difficult concept to understand. For those affected in its path, it’s an incredibly tangible, visceral thing. But for those watching from afar (like me, sitting in California), it’s less “real”. We hear the overly-dramatic news reports and the doom-and-gloom predictions, but it’s a purely theoretical experience. Having a little paperweight of the storm on my desk doesn’t really help me understand the true impact Irene had on all those folks along the east coast, but at least I can touch it.

Data Visualization

Visualizing Time with the Infinity Hour Chart

This is another experiment in visualizing 24-hour cyclical data. My last post explored a method of linear representation (the Double Time Bar Chart). Linear representations have problems when it comes to showing the cyclical nature of time data (i.e. there is no start or end of a 24-hour cycle).

Inspiration

When trying to think of visual representations of never-ending cycles I was inspired by the infinity symbol. It’s a great symbol to show a continuous cycle, while at the same time being more visually interesting than a simple circle (fun fact: the infinity symbol dates back to 1655). The other iconography that came to mind when thinking about infinity is the hour glass. An hour glass not only represents time, but it also looks similar to a vertical infinity symbol.

My thought was that maybe I could combine the two to create a vertical infinity symbol that evokes the metaphor of an hour glass.

Back of the napkin

My original sketch of this concept was done on the back of a napkin. This is the first sketch, which shows how I was originally working with a horizontal infinity symbol.

I experimented with a few different options for how to show the data using fills. One of the sketches (if turned vertically) looks like an hour glass filling up with water on the bottom, reminiscent of the Wikileaks logo.

Drawing Infinity

The mathematical name for the infinity symbol is lemniscate, and more specifically the lemniscate of Bernoulli. With some good Googling you can find algorithms to draw the lemniscate of Bernoulli, which is what I did.

To start I divided the lemniscate into 24 segments, one for each of the hours of the day. My initial plot of the lemniscate in 24 parts looked like this:

I mapped the hours of the day onto this form, with 12pm noon at the very top and the infinity symbol crossing itself at 6pm/6am.

You follow the time by working your way around the infinity. If you start at the top of the symbol at noon, you would start moving around clockwise to 1pm, then 2pm, etc. You’ll reach the center at 6pm, at which point the symbol crosses itself and you then read it counter-clockwise around the bottom.

What you end up with is a way of dividing up the times of day into quadrants. The top-left quarter of the image is the morning, from 6am-12pm. Then the top-right is the afternoon, from 12pm-6pm. Then you have the evening in the bottom-left (6pm-midnight) and then late-night is in the bottom-right (12am-6am). These quarters match well with how I mentally categorize times of day.

Because the form crosses over itself you can actually read the chart almost in a left-to-right way for both the day (top) and night (bottom).
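For anyone who wants to play with the shape, here is a minimal Python sketch of the hour-to-position mapping described above. It uses a standard parametric form of the lemniscate of Bernoulli turned on its side so it stands vertically; it is not necessarily the algorithm I ended up with from Googling. The function names and the `a` parameter (the half-height of the figure) are assumptions for illustration.

```python
import math


def lemniscate_point(hour, a=1.0):
    """(x, y) on a vertical lemniscate for an hour of day (0-24, may be fractional).

    Noon is at the top, midnight at the bottom, and the curve crosses
    itself at the origin at 6am and 6pm. y points up.
    """
    # t = 0 at noon; one full trip around the figure takes 24 hours.
    t = 2 * math.pi * ((hour - 12) % 24) / 24
    denom = 1 + math.sin(t) ** 2
    x = a * math.sin(t) * math.cos(t) / denom
    y = a * math.cos(t) / denom
    return x, y


def hour_segments(a=1.0):
    """The 24 segments, one per hour, as (start point, end point) pairs."""
    return [(lemniscate_point(h, a), lemniscate_point(h + 1, a)) for h in range(24)]


def quadrant(hour):
    """Which quarter of the image the middle of this hour falls in."""
    x, y = lemniscate_point(hour + 0.5)
    vertical = "top" if y > 0 else "bottom"
    horizontal = "right" if x > 0 else "left"
    return f"{vertical}-{horizontal}"
```

With this mapping, `quadrant(9)` lands in the top-left (morning) and `quadrant(21)` in the bottom-left (evening), matching the quarters described above.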

Drawing Data

The next step is to try to use this form to represent real data. Here’s an example that shows the distribution of driving under the influence arrests in San Francisco:

We can see that this particular crime is primarily a night-time activity that surges around midnight and starts falling off after about 2am. I’ve colored the range of 6am-6pm in orange to show day-time and the range of 6pm-6am in blue to indicate night-time.

For comparison here’s apartment burglary, which is mostly a day-time activity:

Once the viewer understands how to read the chart we can remove the labels and simply show the pattern. Here’s a comparison of a few night-time crimes:

Here’s a comparison of different types of burglary, some of which occur mostly during the daytime (residential burglary) and some of which occur in the afternoon and late at night (burglary of a store).

Small Multiples

Here’s a final example of many different crime types represented side by side to try to see how this chart works for comparisons.

Discussion

I’m not very happy with this chart in terms of the viewer’s ability to accurately read the chart. I also don’t think it highlights changes between hours enough. Often there are changes and trends that are easy to spot in the linear charts of my last post, but that are very difficult to see in these charts. Each hour is at quite a different angle than the hours on either side, which makes it difficult to compare two hours. You still get the big picture trends, like if a crime is a night-time or day-time crime, but the smaller trends are much harder to spot.

On the flip side, I really like the metaphors of the infinity symbol and the hourglass. On an artistic and philosophical level I think those metaphors make this a really beautiful visualization. Too bad it’s not also effective 🙂

Data Visualization

Visualizing Time with the Double-Time Bar Chart

In my last post I described some of the issues with visualizing cyclical data by hour of day and covered a few examples of different visualization methods that are typically used. This post is more a visualization experiment.

The Context

To start with a little context, for my day job I create a software product called SpatialKey, which is a business intelligence/data visualization tool. We can visualize all sorts of data all sorts of ways, but one of the things we do is show you a histogram of the occurrence of your data by the hour of day. The chart looks something like this:

The Problem

That’s about as simple as you can get, with a single series of data displayed as a bar chart. The section on line charts in my previous post covered some of the problems with these visualizations. I have two issues with this chart:

the break in the data between 11pm and midnight
the difficulty understanding the context of the time

To summarize, the first problem has to do with being able to understand the trends that occur around midnight (where this chart breaks the data). In this example we can see that data in the evening peaks at 9pm and then declines, but it’s difficult to accurately assess that declining pattern because you have to try to follow the data as it ends on the right edge of the chart and then continues all the way over on the left edge. This is only problematic when something interesting is happening around midnight (or whenever you choose to have your chart begin/end).

The second point about context has to do with the fact that I don’t think about my days as starting at midnight and ending at 11:59pm. A more accurate representation of how I think of my days is that they start sometime when I wake up, usually around 7am, and they are broken up into “day-time” and “night-time”, and they end more or less when I go to sleep. Within “day-time” my day is broken up into other categorizations, like “working hours”, “afternoon”, “lunch-time”, etc. And depending on the data in question, these contextual relationships might be incredibly important. For this post I’ll be looking at crime data. When you’re investigating crime data, the contextual relationship to the time of day can be incredibly relevant. I don’t just want to know about when people are assaulted, I want to know the rate of assaults on the street when I’m going to be walking on the street (typically right after work on my way home, or later at night going out to dinner, bars, etc).

The simple bar chart doesn’t solve these problems well. It presents a hard break in the data, forcing the viewer to mentally connect the end of the chart with the beginning. And it also forces the viewer to think about the days in the context of midnight – 11pm, which is not the natural categorization system we have for the hours of the day.

The Double-Time Bar Chart

My first attempt to address some of these problems is something I’m tentatively calling the Double-Time Bar Chart. The goal is to put the time in context a bit more for the viewer, and to always show a relevant, continuous visualization of all times of the day.

The chart still uses simple bars in a linear chart. But the data is actually shown twice in the chart. The top part of the chart is the exact same histogram chart with 24 bars that we had before, going from midnight to 11pm. The bottom part is the same data (upside down), but it starts instead at noon and goes to 11am. It’s shifted by 12 hours compared to the top chart. Imagine taking the top chart, flipping it upside down, then shifting it over to the right by 12 bars.

There’s a single x-axis for both the top and bottom charts, which is labelled with the hours of the day. But the hours are either AM for the top chart or PM for the bottom chart.

The highlighted regions represent 6am-5pm on the top and 6pm-5am on the bottom. That means there are 24 highlighted bars, so the highlighted bars represent one unduplicated set of 24 hours of data. The highlight is used to draw attention to day-time and night-time activities. A very rough color categorization is used to color 6am-5pm in a lighter yellow, representing day-time, and 6pm-5am in a darker color, representing night. I realize this doesn’t match up with actual sunlight/darkness times in most cities, but I think the 6am-6pm time range is close enough to how many people think about “day” vs “night” that it works.
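The layout is easier to see as data than as prose. Here is a minimal Python sketch, assuming the input is a list of 24 values indexed by hour with 0 as midnight; the function name and the tuple layout are mine, not part of SpatialKey.

```python
def double_time_rows(counts):
    """Build the top and bottom series of a Double-Time Bar Chart.

    counts[h] is the value for hour h (0 = midnight). Each returned list
    holds 24 (hour, value, highlighted) tuples in left-to-right order.
    """
    if len(counts) != 24:
        raise ValueError("expected one value per hour of the day")

    # Top: midnight to 11pm, with 6am-5pm highlighted as day-time.
    top = [(h, counts[h], 6 <= h <= 17) for h in range(24)]

    # Bottom: the same data shifted by 12 hours (noon to 11am),
    # with 6pm-5am highlighted as night-time.
    bottom = []
    for position in range(24):
        hour = (position + 12) % 24
        bottom.append((hour, counts[hour], hour >= 18 or hour <= 5))
    return top, bottom


top, bottom = double_time_rows(list(range(24)))
```

Both highlighted ranges end up in the same 12 columns in the middle of the chart, which is why the two colored blocks stack on top of each other.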

The duplicated (but shifted) data in the top and bottom allows me to see a continuous, unbroken series of data that can show day-time activity (top) or night-time activity (bottom). There is no hour of the day that forces me to read the chart to the end and then continue on by moving my attention back to the beginning. If I’m interested in the trends during the day (say around lunchtime, so 11am-1pm) then I can read the top chart. But if I’m interested in night-time activity (say 11pm-1am) then I can read the bottom chart. In both cases I get a continuous chart that shows the full context of all the data around the range in which I’m interested.

The highlighted regions serve to draw attention to daytime versus nighttime, but we still keep the rest of the 24 hours visible in each chart (the unhighlighted bars) so you can always get the full context of the data. This allows you to follow the data from 4pm-8pm without forcing your eyes to jump from the top to the bottom.

Examples

For these examples I’ll be visualizing crime data from the city of San Francisco. I’m using two full years of crime, 2009 and 2010. You can download the crime data yourself if you want to play with it.

One note about these charts: there are no y-axis labels and each chart is relative to itself. I was interested in exploring the problem of visualizing the hourly patterns, not necessarily being able to know exactly how many crimes occurred at a certain hour. The highest bar in each chart does not always mean the same value. It simply means that’s the hour with the most crimes for that particular crime type.
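In code terms, "relative to itself" just means dividing by that chart's own peak. A minimal sketch in Python (the function name is made up for illustration):

```python
def relative_heights(counts):
    """Scale a chart's hourly counts so its own busiest hour has height 1.0."""
    peak = max(counts)
    if peak == 0:
        return [0.0] * len(counts)
    return [count / peak for count in counts]
```

Two crime types with very different totals will both peak at 1.0, so the charts compare the shape of the day, not the volume.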

Here’s an example of a crime that has an interesting day-time pattern, burglary. Notice the nice peak right when everyone leaves their homes unguarded as they go off to work.

And here’s a contrasting example of a crime that’s primarily a night-time activity, public intoxication.

Notice the nice nearly-linear build up all the way from about 9am up to the peak at midnight, then the dropoff after 2am (when the bars close in San Francisco).

There are a few crimes that are even more polarized. Arrests for driving under the influence have a nice distribution curve that peaks at midnight.

And prostitution is also primarily a night-time activity in San Francisco. There are two peaks, one just after work around 6-7pm, and then another a bit later in the evening at 11pm.

Small Multiples for Comparison

One way to compare different kinds of data is to use small multiples, which rely on small charts all laid out together to make it easy for your eyes to scan. These Double-Time charts work well in small multiples because you can quickly scan to see the difference between predominantly daytime crimes (large yellow areas in the top half) versus night-time crimes (blue areas in the bottom half). For instance, to get a better view of burglary, we can look at the sub-categorizations.

We can see that residential burglaries occur in the morning when people leave for work, whereas burglaries of a store are either late-afternoon or evening crimes.

The same approach can be used to compare many different types of crimes:

Or we can remove the x-axis and strip down the extra whitespace in the charts to get an even more compact view:

Summary/Revisiting the Goals

Now to circle back around to what I was trying to accomplish with this type of chart. There were two main goals: preserving the continuity of the data and putting the data into the context of your day.

To preserve continuity I’ve duplicated the data, which allows for a nice continuous linear chart that covers any important time range. If you’re interested in day-time trends you can look at the top chart. If you’re interested in night-time trends you can look at the bottom chart. But in either case you get a full, continuous range to put the trend in context.

To further put the data in context I’ve added some simple coloring to highlight the day-time vs night-time ranges. The x-axis labels (showing 6am, noon, 6pm, etc) give you some further context that helps you categorize the data. If you split the chart in quadrants you get rough categories for morning (top-left), afternoon (top-right), evening (bottom-left) and night (bottom-right).

The Big Caveat

This is just a simple design experiment. I’m making no claims about the efficacy of this chart. I have not run any studies to validate that this chart is clear to viewers or is any better (or worse) than any other visualization. I don’t even know if I myself think this is an effective chart. I’m just trying to spur a bit of a discussion and some experimentation around the problem of visualizing cyclical hourly data. So what do you think?

I’ll be posting another (even more experimental) take on this same topic shortly.

Data Visualization

Visualizing Cyclical Time – Hour of Day Charts

I’ve been mulling over a problem in my head for the past few days. How do you represent the hourly trends in data? More specifically, I’m talking about taking something like a whole year of crime records and showing the hourly trends of each type of crime, so you can see which crimes happen mostly in the mornings and which ones happen when the bars close. I haven’t yet found a solution I’m happy with.
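The raw material for every chart in this series is the same: 24 numbers, one per hour of the day. As a minimal sketch, assuming each record has already been parsed into a Python `datetime` (the function name is hypothetical), the binning looks like this:

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count records per hour of day; index 0 is midnight, 23 is 11pm."""
    by_hour = Counter(ts.hour for ts in timestamps)
    return [by_hour[h] for h in range(24)]


# Three made-up records, purely for illustration.
records = [
    datetime(2010, 3, 5, 23, 40),
    datetime(2010, 7, 19, 23, 5),
    datetime(2009, 11, 2, 8, 15),
]
counts = hourly_counts(records)
```

Everything that follows is about how to draw those 24 numbers.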

A Multi-Part Mini-Series All About Time

This is the first part of a few blog posts on this topic. Apologies ahead of time if you don’t find the topic of visualizing the 24 hours of the day as fascinating as I do, but I’m going to take the time to fully geek out and focus in on this very specific problem in depth.

This is Part 1: Explaining the Challenge and Reviewing the Status Quo. This is sort of like a lit review; it’s my attempt to consolidate everything I can find about how people are currently representing 24-hour cyclical data.

Challenge: Continuity

The 24 hours of the day are a continuous cycle. The “day” doesn’t end at any arbitrary time. A normal single day ends at 11:59pm, but that’s just an imaginary line. When you’re talking about a generalized 24 hours, 11:59pm leads straight into midnight, which continues on to 1am. And sometimes the most interesting trends are in those hours on either side of midnight.

Many line or bar charts that deal with the 24-hour cycle simply pick a point at which the chart starts and ends. Sometimes the charts go from 12am-12am, sometimes they use ranges like 4am-4am (which puts the break during a time when most people are sleeping). For specific data this is often acceptable, but in general I find it to be a big limitation.
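Moving the break is just a rotation of the 24 values. Here is a small Python sketch of that, plus one possible heuristic for choosing the break automatically. The heuristic (break between the two quietest adjacent hours) is my own assumption and not something these charts necessarily do; both function names are made up.

```python
def rotate_to_start(counts, start_hour):
    """Reorder 24 hourly values so the chart begins at start_hour.

    Returns (hour, value) pairs in left-to-right order.
    """
    return [(h % 24, counts[h % 24]) for h in range(start_hour, start_hour + 24)]


def quietest_break(counts):
    """Hour to start the chart at, so the break falls between the two
    adjacent hours with the smallest combined count (a heuristic)."""
    # counts[h - 1] wraps around to 11pm when h is 0.
    return min(range(24), key=lambda h: counts[h - 1] + counts[h])


counts = [10] * 24
counts[3], counts[4] = 1, 0
start = quietest_break(counts)
axis = rotate_to_start(counts, start)
```

This only hides the problem where the data happens to be quiet. If the interesting trend is spread around the clock, no choice of break avoids cutting through it.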

Challenge: Personal Context

We all have our daily routines. I wake up around 7:15, start work at 8, walk outside to get lunch around 12:30. For most people there’s a period in the morning and early evening when you’re coming and going from your home on the way to work. These moments bracket our experiences with time. We think about the day split into these chunks. Did something happen during the work day (roughly 8-5), or was it in the evening (roughly 5-8), or was it at night (9-12)? I think it’s safe to say that most people have a common notion of what morning, afternoon, evening, and night mean (even if those definitions vary a bit person to person). And each of us makes a mental organization of those time periods based on our own habits. When visualizing hourly trends I want to give people something they can relate to in the context of their daily schedules.

These issues aren’t easy to solve, and I’m reminded of a fantastic quote from the movie Closer:

Time, what a tricky little fucker.

Tricky, indeed.

Of course when it comes to visualizing the 24-hour cycle there are many more challenges than the few I’ve listed, but most of the other challenges fall into the general bucket of data visualization (as opposed to being very specific to the 24-hour cycle). Things like being able to accurately compare values against each other or being able to understand the chart without getting lost in confusion are things that are critical in all forms of data visualization.

Line Charts

One of the easiest methods of displaying data by hour of day is in a simple line series or bar chart. Typically these charts begin at a certain hour (often midnight) and show 24 unique bars or data points, ending at the same time they started. An example of this type of line chart is the following figure, taken from The 24 Hour Society.

This chart shows the percentage of people shopping at any given time of the day:

That chart goes from midnight to 11pm, which works great for showing the trend for an activity like shopping (which occurs during the day). But the choice of the x-axis range isn’t as nice when looking at an activity where the important trend period includes midnight.

Here’s another chart from the same publication that shows when people are sleeping:

In that case the interesting time period that we want to focus on is around 9pm – 7am, when people are going to sleep and waking up. But with an x-axis that starts at midnight we end up breaking the data directly in the middle of the interesting period.

Another example from the New York Times analyzes similar data about the typical activities that people perform throughout the day.

The NYT chart starts at 4am instead of midnight, which does a bit of a better job showing the trend of when people go to sleep than an x-axis that starts at midnight. The 4am-4am axis does a good job at showing activities during the day (which is what it was designed to do). Luckily there aren’t many activities that people do that cross over that 4am break (other than sleeping). But what if there was an interesting trend we wanted to highlight that did span 4am?

The biggest problem I have with these charts is continuity. As a general visualization tool, how can you pick an arbitrary time to break the data? How are you sure the most interesting part of the data doesn’t overlap when the chart begins and ends?

Circular Charts

The problem of continuity that line charts have can often be overcome by using some form of circular chart. The cyclical nature of the 24-hour day lends itself well to a circular representation. There are a few different methods typically used to visualize the 24 hour cycle. Some charts use a 12 hour circle, which mimics the display of an analog clock. Others display a full 24 hour circle.

12 Hour Clocks

It’s often tempting to use analogies to the real world, especially when visualizing time. Everyone is used to reading the hands of an analog clock. At a glance we all know what each of the 12 numbers on the clock face means, and we already have built-in associations with the spatial layout. The big problem with using the metaphor of a clock is that a clock face is only broken up into 12 hours, which means that you can only show half your data at a time (or you need to somehow layer two series on top of each other).

2 Clocks

One attempt at solving the problem of the 12 hour clock is to use two of them, since with two clocks you now have enough space to show all 24 hours. Here’s an example by Purna Duggirala that is essentially a bubble chart that uses two clocks side by side.

The biggest problem with the chart is the incorrect continuity. A single clock on its own isn’t a continuous range, it’s really only half a range. So the clock on the left is showing 12am – 12pm, but when you reach the end of the circle the data doesn’t continue on like the representation shows. Instead you need to jump over to the second clock and continue on around. It’s difficult to see the ranges right around both 12pm and 12am, since you lose context in one direction or another (and worse, you get the incorrect context from the bordering bubbles).

Polar Spiral Clock

In a great show of Internet collaboration, the double clock chart spurred some other experimentation. Jorge Camoes came up with a polar chart that plots the data on a single “clock face,” showing two 12-hour charts overlaid on top of each other.

This then led to another iteration that makes the continuation of the data series clearer. Jon Peltier modified the polar chart to show a line series connecting the hours.

Without experimenting more with this kind of polar chart, I can’t make up my mind about whether it’s effective or not. But I do really like the ingenuity.

24 Hour Circles

Given that the 12-hour clock is a difficult metaphor to use for a visualization, many people choose to use a 24-hour circle. 24-hour circular charts typically start with midnight at the top of the chart and then proceed clockwise, showing all 24 hours in one 360-degree range. The benefit of a 24-hour circular chart is that the cyclical nature of the data is represented and the viewer can easily read the continuity at any point on the chart.
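To make that layout concrete, here’s a minimal sketch in Python of how an hour of the day could be mapped onto a 24-hour circle with midnight at the top and time running clockwise. The function names (hour_to_angle, hour_to_xy) are my own for illustration and don’t come from any particular charting library.

```python
import math


def hour_to_angle(hour):
    """Clockwise angle in degrees from the top of the circle (midnight = 0)."""
    return (hour % 24) / 24 * 360


def hour_to_xy(hour, radius=1.0):
    """Point on the circle for a given hour, with y pointing up."""
    theta = math.radians(hour_to_angle(hour))
    # Using sin for x and cos for y puts 0 degrees at the top of the
    # circle and makes the angle increase in the clockwise direction.
    return (radius * math.sin(theta), radius * math.cos(theta))
```

Because the hour is taken modulo 24, 11pm and 1am land right next to each other on either side of midnight, which is exactly the continuity that a line chart loses.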

A simple example of a 24-hour circle comes from Stamen Design’s Crimespotting. This isn’t a data-heavy visualization chart, since it doesn’t actually show any data other than sunrise and sunset times (instead its main purpose is as a filtering control). But it’s a good example of the general layout of 24-hour charts, and it’s very clean and well-labeled. You can read about the thinking that went into designing this “time of pie” on Stamen’s blog.

The inspiration for this time selector, which is documented in Tom Carden’s blog post, was the real-life timer control used for automating lights in your house.

If you’ve decided to use a 24-hour circular chart then you’ll need a way to visualize your data. The main methods I’ve found for visualizing data around a circle include sized wedges, colored arcs/wedges, sized bubbles positioned around the chart, or sized spokes. I’ll cover just a few of these (sized wedges and colored arcs).

Wedges

Perhaps the most well-known radial chart that uses differently sized wedges is Florence Nightingale’s coxcomb chart from 1858 that shows the causes of death during the Crimean War.

Nightingale’s chart shows 12 months of data, with each wedge corresponding to a single month. The same technique can easily be applied to hourly data as well. Instead of 12 wedges an hourly chart would have 24, but the same general principle applies. Wedges always use the same angles (as opposed to typical pie charts) and modify the radius to size the wedge based on the data.
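Here’s a rough Python sketch of the wedge sizing described above, assuming 24 hourly counts. The helper names (wedge_angles, wedge_radii) and the area_proportional flag are hypothetical. The flag is there because a sector’s area grows with the square of its radius, so you have to decide whether the radius or the area should track the value. I’m not claiming either option is what any particular historical chart used.

```python
import math


def wedge_angles(n_wedges=24):
    """Start/end angles in degrees for equal-angle wedges, clockwise from the top."""
    step = 360 / n_wedges
    return [(i * step, (i + 1) * step) for i in range(n_wedges)]


def wedge_radii(counts, max_radius=1.0, area_proportional=True):
    """One radius per wedge, scaled so the largest count reaches max_radius."""
    peak = max(counts)
    if peak == 0:
        return [0.0 for _ in counts]
    radii = []
    for count in counts:
        share = count / peak
        # Sector area is proportional to radius squared, so take the square
        # root when the wedge's area (not its radius) should match the value.
        scale = math.sqrt(share) if area_proportional else share
        radii.append(max_radius * scale)
    return radii
```

With area_proportional=False, a count that is a quarter of the peak gets a quarter of the radius and therefore only a sixteenth of the area, which can make small values look much smaller than they are.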

If you’re thinking about using this type of chart I’d highly recommend reading over this critique that highlights some common problems with the approach.

An alternative take on using wedge size around a circular chart can be found in Antonio Gabaglio’s Storia e Teoria Generale Della Statistica, which was written in 1880. Gabaglio created a few charts that also plotted data by month in a circular fashion. He used a few different wedge orientations:

I haven’t seen this method used to represent data by hour of day, but the same technique could easily apply.

Colored Arcs/Wedges

Another way to represent data around a circle is to use bands of color. An interesting article, Activities, ringmaps and geovisualization of large human movement fields, by Jinfeng Zhao, Pip Forer, and Andrew Harvey explains a data viz approach they call a “Ringmap.” The article itself isn’t freely available, but you can access this condensed version to get the gist, or take a look at this presentation for more good illustrations of the technique.

Here are a few more complex examples of using ringmaps:

Ringmaps can also represent multiple series of data. Comparing multiple series was actually the main intention of the authors. With multiple series the data forms multiple rings, one around the other. That allows for comparison of the same cycle periods between data series. Comparing multiple data series is a bit outside the scope of this article (and god knows this is getting long enough), so I won’t go into detail.

Spirals

Spirals are similar to circular charts, but are used with slightly different data. They don’t really apply directly to the task I’m concerned with in this article, but they’re interesting in their own right and deserve a mention. Like circle charts, spirals are used to represent cyclical data, such as data that occurs over a 24-hour period. But spirals typically show many iterations of the data (i.e. many series) in a spiral layout. So instead of showing aggregate numbers (i.e. a single set of numbers showing the total number of occurrences in each hour), a spiral chart might show many individual days’ worth of data, all in a connected sequence.
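As a small illustration of the difference, here’s a Python sketch that places a (day, hour) pair on a simple spiral where each full day adds one more turn. This is just one possible layout under my own assumptions (an evenly spaced spiral, midnight at the top, clockwise), and the names spiral_point, inner_radius, and ring_spacing are made up for the example. It isn’t taken from any of the papers mentioned here.

```python
import math


def spiral_point(day, hour, inner_radius=1.0, ring_spacing=0.5):
    """Position of a given day and hour on an evenly spaced spiral.

    day is a zero-based day index and hour is the hour within that day.
    """
    hour_fraction = (hour % 24) / 24
    # The radius grows steadily with elapsed time: one ring_spacing per day.
    radius = inner_radius + ring_spacing * (day + hour_fraction)
    # The angle depends only on the hour, so the same hour on different
    # days lines up along the same spoke of the spiral.
    theta = 2 * math.pi * hour_fraction
    return (radius * math.sin(theta), radius * math.cos(theta))
```

Since the angle only depends on the hour, 8am on every day falls along the same direction from the center, which is what lets daily patterns line up visually from one turn to the next.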

Here’s a spiral chart of data tracking sunshine intensity, from the paper Visualizing Time-Series on Spirals by Marc Alexa. By visualizing many individual days’ worth of data at the same time you can start to identify patterns and find abnormalities.

Here’s one last spiral example, which was published by a fellow Flex developer, Michael VanDaniker. In his paper, Leveraging the Spiral Graph for Transportation System Data Visualization, VanDaniker provides the following example of using a spiral representation to show the pattern of traffic collisions on the different days of the week (and the hours of the day during those days).

Spirals are a bit outside the scope of the problem I’m trying to solve, but since they are used for identifying patterns in periodic data they fall within the same ballpark.

Continuous Circles vs Terminal Lines

One final point I want to make before closing has to do with weighing the benefits of circular visualizations (continuous) over line or bar charts (terminal, since they have a beginning and end). The continuity offered by circular charts makes them seem like a good choice, but it might be worth taking a critical stance when it comes to choosing circular charts versus line or bar charts.

I haven’t been able to find many studies that have examined how effective circular data visualization techniques are compared with linear ones, but I did come across a dissertation from 2008 by a Stanford Psychology student, Angela Kessell. Kessell’s dissertation, titled Cognitive Methods for Information Visualization: Linear and Cyclical Events, examines how people choose to represent cycles, and finds that in many cases the majority of people draw cyclical data (things like the water cycle, the four seasons, etc.) in a linear fashion and not as circles as we might expect.

When we choose to represent cyclical data it might seem like using a circular representation is the most intuitive and obvious choice, but Kessell’s research makes you question whether people’s brains intuitively think of cyclical data in a circular fashion or if they instead think of cyclical data in a linear way. Without jumping to any hard conclusions, I just want to point out the idea that maybe our brains more effectively conceptualize linear representations.

Whew

So that’s my attempt at an exhaustive run-down on the current state of the industry when it comes to visualizing cyclical 24-hour data. If you made it this far, I salute you.

The next parts in this mini-series about time charts will be some experiments that I’ve been working on to try some new ideas for visualizing time data. As a teaser for the upcoming posts, here are some images that I’ll be explaining soon in subsequent posts.

These are coming soon:

Data Visualization, SpatialKey

Ethics and the use of DUI data

I do a lot of work with San Francisco crime data, and one of the things that I’ve been struggling with is one particular dataset: the locations of all the driving under the influence (DUI) arrests in the city. Just yesterday there was an article about US Senators asking Apple to remove DUI checkpoint applications from the app store.

San Francisco publishes a huge amount of crime data, going all the way back to 2003. You can grab a single CSV file with all the data. Over a million crimes. It’s beautiful.

If you look at just the DUI records you start seeing patterns. Here’s about a thousand DUIs over the past 2 years (2009-2010).

If we look at a density map, individual streets start lighting up. Specific intersections stand out.

Here’s a representation that assigns the number of DUIs to the street segment they occurred on and colors the data like a typical traffic map.

And finally just for fun, here’s a 3D rendering of the same 2 years of data:

It’s compelling data, and it’s fairly easy to tell an interesting story with it. But is there an ethical issue around visualizing or using this data? There’s a lot that you can do with the data, and visualizations like this are obviously just scratching the surface.

An idea that crosses the line

Following one train of thought to its logical conclusion leads me to a mobile app idea. It’s a simple app, essentially just a routing application. You type in where you’re going and you can get directions from your current location, just like any other mapping or GPS routing application. Except we can give you directions that avoid known DUI hotspots. In a very simplified sense, routing algorithms basically give streets a score, usually determined based on factors like speed limit, road size, distance, etc. The path with the lowest score wins, and that’s what you end up getting for your directions. All you’d have to do to route around common DUI locations is make the number of historical DUIs along a street segment count in the routing algorithm’s calculation. Streets with lots of historical DUIs would be avoided in favor of side streets with fewer arrests. You’d avoid Geary Blvd and intersections like 16th St and Mission St.

It’s an easy app and the data is there for the taking. I’ll leave aside the question of whether the idea would work in terms of being effective at making drunk drivers avoid actual arrest. For argument’s sake, let’s assume that it would work, or that some other similar type of app could. It’s not an app I’d build, and I assume pretty much everyone understands the moral objection.

I don’t have any big moral takeaway or conclusion. On the one hand there are arguments that data and knowledge can never inherently be bad. Then there are arguments that this particular data (or at least specifically a DUI-avoiding directions app) would only be used to encourage drunk driving. I’m not going to make the DUI-avoiding mobile app, because that goes way too far down the path of encouraging bad behavior. But it brings up a lot of interesting questions we need to think about as we’re working with data like this.