I’ve been mulling over a problem in my head for the past few days. How do you represent the hourly trends in data? More specifically, I’m talking about taking something like a whole year of crime records and showing the hourly trends of each type of crime, so you can see which crimes happen mostly in the mornings and which ones happen when the bars close. I haven’t yet found a solution I’m happy with.
A Multi-Part Mini-Series All About Time
This is the first part of a few blog posts on this topic. Apologies ahead of time if you don’t find the topic of visualizing the 24 hours of the day as fascinating as I do, but I’m going to take the time to fully geek out and focus in on this very specific problem in depth.
This is Part 1: Explaining the Challenge and Reviewing the Status Quo. This is sort of like a lit review; it’s my attempt to consolidate everything I can find about how people are currently representing 24-hour cyclical data.
The 24 hours of the day are a continuous cycle. The “day” doesn’t end at any arbitrary time. A normal single day ends at 11:59pm, but that’s just an imaginary line. When you’re talking about a generalized 24 hours, 11:59pm leads straight into midnight, which continues on to 1am. And sometimes the most interesting trends are in those hours on either side of midnight.
Many line or bar charts that deal with the 24-hour cycle simply pick a point at which the chart starts and ends. Sometimes the charts go from 12am-12am, sometimes they use ranges like 4am-4am (which puts the break during a time when most people are sleeping). For specific data this is often acceptable, but in general I find it to be a big limitation.
Challenge: Personal Context
We all have our daily routines. I wake up around 7:15, start work at 8, walk outside to get lunch around 12:30. For most people there’s a period in the morning and early evening when you’re coming and going from your home on the way to work. These moments bracket our experiences with time. We think about the day split into these chunks. Did something happen during the work day (roughly 8-5), or was it in the evening (roughly 5-8), or was it at night (9-12)? I think it’s safe to say that most people have a common notion of what morning, afternoon, evening, and night mean (even if those definitions vary a bit person to person). And each of us makes a mental organization of those time periods based on our own habits. When visualizing hourly trends I want to give people something they can relate to in the context of their daily schedules.
These issues aren’t easy to solve, and I’m reminded of a fantastic quote from the movie Closer:
Time, what a tricky little fucker.
Of course when it comes to visualizing the 24-hour cycle there are many more challenges than the few I’ve listed, but most of the other challenges fall into the general bucket of data visualization (as opposed to being very specific to the 24-hour cycle). Being able to accurately compare values against each other, or to understand the chart without getting lost in confusion, is critical in all forms of data visualization.
One of the easiest methods of displaying data by hour of day is in a simple line series or bar chart. Typically these charts begin at a certain hour (often midnight) and show 24 unique bars or data points, ending at the same time they started. An example of this type of line chart is the following figure, taken from The 24 Hour Society.
This chart shows the percentage of people shopping at any given time of the day:
That chart goes from midnight to 11pm, which works great for showing the trend for an activity like shopping (which occurs during the day). But the choice of the x-axis range isn’t as nice when looking at an activity where the important trend period includes midnight.
Here’s another chart from the same publication that shows when people are sleeping:
In that case the interesting time period that we want to focus on is around 9pm – 7am, when people are going to sleep and waking up. But with an x-axis that starts at midnight we end up breaking the data directly in the middle of the interesting period.
Another example from the New York Times analyzes similar data about the typical activities that people perform throughout the day.
The NYT chart starts at 4am instead of midnight, which does a somewhat better job of showing the trend of when people go to sleep than an x-axis that starts at midnight. The 4am-4am axis does a good job of showing activities during the day (which is what it was designed to do). Luckily there aren’t many activities that people do that cross over that 4am break (other than sleeping). But what if there were an interesting trend we wanted to highlight that did span 4am?
The biggest problem I have with these charts is continuity. As a general visualization tool, how can you pick an arbitrary time to break the data? How are you sure the most interesting part of the data doesn’t overlap when the chart begins and ends?
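To make the break-point choice concrete, here is a minimal Python sketch of what a linear chart is really doing when it picks a start hour: it rotates the 24 hourly values so the axis begins somewhere other than midnight. The function names (`rotate_hours`, `quietest_break`) and the sample counts are made up for illustration, and choosing the break at the quietest pair of hours is just one heuristic I'm assuming here, not an established rule.

```python
def rotate_hours(counts, start_hour):
    """Reorder 24 hourly values so the x-axis begins at start_hour."""
    if len(counts) != 24:
        raise ValueError("expected 24 hourly values")
    start = start_hour % 24
    hours = list(range(start, 24)) + list(range(0, start))
    return hours, [counts[h] for h in hours]


def quietest_break(counts):
    """Return the hour boundary with the least activity on both sides.

    Boundary h sits between hour h-1 and hour h; counts[-1] wraps to 11pm.
    """
    return min(range(24), key=lambda h: counts[h - 1] + counts[h])


# Made-up hourly counts for a late-night activity (index = hour of day).
late_night = [9, 8, 6, 3, 1, 0, 0, 1, 1, 1, 1, 1,
              1, 1, 1, 1, 1, 2, 2, 3, 4, 6, 8, 9]

start = quietest_break(late_night)            # 6 for this sample
hours, values = rotate_hours(late_night, start)
```

For this sample the axis would run 6am-6am, which keeps the midnight peak in one piece. The catch is the same one described above: the "right" break depends entirely on the data, so no single start hour works as a general tool.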
The problem of continuity that line charts have can often be overcome by using some form of circular chart. The cyclical nature of the 24-hour day lends itself well to a circular representation. There are a few different methods typically used to visualize the 24-hour cycle. Some charts use a 12-hour circle, which mimics the display of an analog clock. Others display a full 24-hour circle.
12 Hour Clocks
It’s often tempting to use analogies to the real world, especially when visualizing time. Everyone is used to reading the hands of an analog clock. At a glance we all know what each of the 12 numbers on the clock face means, and we already have built-in associations with the spatial layout. The big problem with using the metaphor of a clock is that a clock face is only broken up into 12 hours, which means that you can only show half your data at a time (or you need to somehow layer two series on top of each other).
One attempt at solving the problem of the 12-hour clock is to use two of them, since with two clocks you now have enough space to show all 24 hours. Here’s an example by Purna Duggirala that is essentially a bubble chart that uses two clocks side by side.
The biggest problem with the chart is the incorrect continuity. A single clock on its own isn’t a continuous range, it’s really only half a range. So the clock on the left is showing 12am – 12pm, but when you reach the end of the circle the data doesn’t continue on like the representation shows. Instead you need to jump over to the second clock and continue on around. It’s difficult to see the ranges right around both 12pm and 12am, since you lose context in one direction or another (and worse, you get the incorrect context from the bordering bubbles).
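The continuity problem is easy to see if you write down which hour each bubble sits next to. This is a small Python sketch under my own assumptions (the left clock holds hours 0-11, the right clock holds 12-23, and the function names are hypothetical); it isn't taken from the chart's actual implementation.

```python
def two_clock_position(hour):
    """Map an hour of the day (0-23) to (clock, position on the 12-hour face)."""
    hour %= 24
    return ("AM" if hour < 12 else "PM", hour % 12)


def face_neighbor(hour):
    """Hour that is drawn next (clockwise) on the same clock face."""
    hour %= 24
    clock_start = 12 * (hour // 12)          # 0 for the AM clock, 12 for the PM clock
    return clock_start + (hour % 12 + 1) % 12


def true_neighbor(hour):
    """Hour that actually follows in the 24-hour cycle."""
    return (hour + 1) % 24


for hour in (5, 11, 23):
    print(hour, two_clock_position(hour), face_neighbor(hour), true_neighbor(hour))
```

For most hours the two neighbors agree, but at 11am the face wraps back to midnight while the real next hour is noon on the other clock, and 11pm wraps back to noon instead of midnight. Those two seams are exactly where the chart gives you the wrong context.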
Polar Spiral Clock
In a great show of Internet collaboration, the double clock chart spurred some other experimentation. Jorge Camoes came up with a polar chart that plots the data on a single “clock face,” showing two 12-hour charts overlaid on top of each other.
This then led to another iteration that makes the continuation of the data series clearer. Jon Peltier modified the polar chart to show a line series connecting the hours.
Without experimenting more with this kind of polar chart, I can’t make up my mind about whether it’s effective or not. But I do really like the ingenuity.
24 Hour Circles
Given that the 12-hour clock is a difficult metaphor to use for a visualization, many people choose to use a 24-hour circle. 24-hour circular charts typically start with midnight at the top of the chart and then proceed clockwise, showing all 24 hours in one 360-degree range. The benefit of a 24-hour circular chart is that the cyclical nature of the data is represented and the viewer can easily read the continuity at any point on the chart.
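The layout described above (midnight at the top, hours proceeding clockwise) boils down to a simple angle mapping. Here is a minimal Python sketch of it; `hour_to_xy` is a name I made up, and I'm assuming ordinary x/y coordinates with y pointing up.

```python
import math


def hour_to_xy(hour, radius=1.0):
    """Place an hour of the day (0-24, fractions allowed) on a 24-hour circle.

    Midnight is at the top and time runs clockwise, so each hour spans 15 degrees.
    """
    theta = 2 * math.pi * (hour % 24) / 24   # clockwise angle from 12 o'clock
    return radius * math.sin(theta), radius * math.cos(theta)


midnight = hour_to_xy(0)          # top of the circle
six_am = hour_to_xy(6)            # right-hand side
noon = hour_to_xy(12)             # bottom
just_before = hour_to_xy(23 + 59 / 60)

# 11:59pm and midnight land almost on top of each other: no break in the cycle.
gap = math.dist(just_before, midnight)
```

The last line is the whole point of the circular layout: the distance between 11:59pm and midnight is tiny, whereas on a midnight-to-midnight line chart those two moments sit at opposite ends of the axis.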
A simple example of a 24-hour circle comes from Stamen Design’s Crimespotting. This isn’t a data-heavy chart, since it doesn’t actually show any data other than sunrise and sunset times (instead its main purpose is as a filtering control). But it’s a good example of the general layout of 24-hour charts, and it’s very clean and well-labeled. You can read about the thinking that went into designing this “time of pie” on Stamen’s blog.
The inspiration for this time selector, which is documented in Tom Carden’s blog post, was the real-life timer control used for automating lights in your house.
If you’ve decided to use a 24-hour circular chart then you’ll need a way to visualize your data. The main methods I’ve found for visualizing data around a circle include sized wedges, colored arcs/wedges, sized bubbles positioned around the chart, or sized spokes. I’ll cover just a few of these (sized wedges and colored arcs).
Perhaps the most well-known radial chart that uses differently sized wedges is Florence Nightingale’s coxcomb chart from 1858 that shows the causes of death during the Crimean War.
Nightingale’s chart shows 12 months of data, with each wedge corresponding to a single month. The same technique can easily be applied to hourly data as well. Instead of 12 wedges an hourly chart would have 24, but the same general principle applies. Every wedge spans the same angle (unlike a typical pie chart), and the radius varies to size the wedge based on the data.
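One detail worth pinning down is how a value turns into a radius. Here is a minimal Python sketch of two options, with function names and sample numbers that are purely my own: scaling the radius directly with the value, or scaling it with the square root so that the wedge's area tracks the value. Which one to use is a design decision, and I'm not claiming either is what any particular chart does.

```python
import math


def wedge_radii(values, max_radius=1.0, area_proportional=True):
    """Radius for each equal-angle wedge, given non-negative values."""
    peak = max(values)
    if peak <= 0:
        return [0.0 for _ in values]
    if area_proportional:
        # Sector area grows with radius squared, so take the square root.
        return [max_radius * math.sqrt(v / peak) for v in values]
    return [max_radius * v / peak for v in values]


def wedge_area(radius, n_wedges=24):
    """Area of one wedge when the circle is split into n_wedges equal angles."""
    return 0.5 * radius ** 2 * (2 * math.pi / n_wedges)


# Made-up hourly values: 1 event at midnight, 4 events at 1am, nothing else.
hourly = [1, 4] + [0] * 22

by_area = wedge_radii(hourly, area_proportional=True)
by_radius = wedge_radii(hourly, area_proportional=False)

area_ratio = wedge_area(by_area[1]) / wedge_area(by_area[0])        # 4x the value, 4x the area
radius_ratio = wedge_area(by_radius[1]) / wedge_area(by_radius[0])  # 4x the value, 16x the area
```

With radius-proportional sizing, a value four times larger gets a wedge with sixteen times the area, which is worth keeping in mind when readers compare wedges by eye.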
If you’re thinking about using this type of chart I’d highly recommend reading over this critique that highlights some common problems with the approach.
An alternative take on using wedge size around a circular chart can be found in Antonio Gabaglio’s Storia e Teoria Generale Della Statistica, which was written in 1880. Gabaglio created a few charts that also plotted data by month in a circular fashion. He used a few different wedge orientations:
I haven’t seen this method used to represent data by hour of day, but the same technique could easily apply.
Another way to represent data around a circle is to use bands of color. An interesting article, Activities, ringmaps and geovisualization of large human movement fields, by Jinfeng Zhao, Pip Forer, and Andrew Harvey explains a data viz approach they call a “Ringmap.” The article itself isn’t freely available, but you can access this condensed version to get the gist, or take a look at this presentation for more good illustrations of the technique.
Here are a few more complex examples of using ringmaps:
Ringmaps can also represent multiple series of data. Comparing multiple series was actually the main intention of the authors. With multiple series the data forms multiple rings, one around the other. That allows for comparison of the same cycle periods between data series. Comparing multiple data series is a bit outside the scope of this article (and god knows this is getting long enough), so I won’t go into detail.
Spirals are similar to circular charts, but are used with slightly different data. They don’t really apply directly to the task I’m concerned with in this article, but they’re interesting in their own right and deserve a mention. Like circle charts, spirals are used to represent cyclical data, such as data that occurs over a 24-hour period. But spirals typically show many iterations of the data (i.e. many series) in a spiral layout. So instead of showing aggregate numbers (i.e. a single set of numbers showing the total number of occurrences in each hour), a spiral chart might show many individual days’ worth of data, all in a connected sequence.
Here’s a spiral chart of data tracking sunshine intensity, from the paper Visualizing Time-Series on Spirals by Marc Alexa. By visualizing many individual days’ worth of data at the same time you can start to identify patterns and find abnormalities.
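As a rough illustration of the layout, here is a minimal Python sketch that places a timestamp on a spiral with one revolution per day. I'm assuming a simple Archimedean spiral (the radius grows by a fixed amount per day) with midnight at the top and time running clockwise; the function name and parameters are hypothetical and not taken from any of the papers mentioned here.

```python
import math


def spiral_xy(elapsed_hours, inner_radius=1.0, ring_spacing=1.0):
    """Position on a spiral that makes one full turn every 24 hours.

    elapsed_hours is measured from midnight of the first day.
    """
    turns = elapsed_hours / 24.0
    theta = 2 * math.pi * turns                  # clockwise angle from 12 o'clock
    radius = inner_radius + ring_spacing * turns
    return radius * math.sin(theta), radius * math.cos(theta)


day1_midnight = spiral_xy(0)
day2_midnight = spiral_xy(24)      # same direction, one ring further out
day1_six_am = spiral_xy(6)
day3_six_am = spiral_xy(48 + 6)    # lines up with 6am on day 1
```

The same hour on different days always falls along the same direction from the center, which is what lets you scan across the rings and spot a daily pattern or an odd day out.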
Here’s one last spiral example, which was published by a fellow Flex developer, Michael VanDaniker. In his paper, Leveraging the Spiral Graph for Transportation System Data Visualization, VanDaniker provides the following example of using a spiral representation to show the pattern of traffic collisions on the different days of the week (and the hours of the day during those days).
Spirals are a bit outside the scope of the problem I’m trying to solve, but since they are used for identifying patterns in periodic data they fall within the same ballpark.
Continuous Circles vs Terminal Lines
One final point I want to make before closing has to do with weighing the benefits of circular visualizations (continuous) over line or bar charts (terminal, since they have a beginning and end). The continuity offered by circular charts makes them seem like a good choice, but it might be worth taking a critical stance when it comes to choosing circular charts versus line or bar charts.
I haven’t been able to find many studies that have examined how effective circular data visualization techniques are compared with linear ones, but I did come across a dissertation from 2008 by a Stanford Psychology student, Angela Kessell. Kessell’s dissertation, titled Cognitive Methods for Information Visualization: Linear and Cyclical Events, examines how people choose to represent cycles, and finds that in many cases the majority of people draw cyclical data (things like the water cycle, the four seasons, etc.) in a linear fashion and not as circles as we might expect.
When we choose to represent cyclical data it might seem like using a circular representation is the most intuitive and obvious choice, but Kessell’s research makes you question whether people’s brains intuitively think of cyclical data in a circular fashion or if they instead think of cyclical data in a linear way. Without jumping to any hard conclusions, I just want to point out the idea that maybe our brains more effectively conceptualize linear representations.
So that’s my attempt at an exhaustive run-down on the current state of the industry when it comes to visualizing cyclical 24-hour data. If you made it this far, I salute you.
The next parts in this mini-series about time charts will be some experiments that I’ve been working on to try some new ideas for visualizing time data. As a teaser, here are some images that I’ll be explaining in subsequent posts.
These are coming soon: