In my last post I described some of the issues with visualizing cyclical data by hour of day and covered a few examples of visualization methods that are typically used. This post is more of a visualization experiment.
For a little context, at my day job I create a software product called SpatialKey, which is a business intelligence and data visualization tool. We can visualize all sorts of data in all sorts of ways, and one of the things we do is show you a histogram of the occurrence of your data by hour of day. The chart looks something like this:
That’s about as simple as you can get, with a single series of data displayed as a bar chart. The section on line charts in my previous post covered some of the problems with these visualizations. I have two issues with this chart:
- the break in the data between 11pm and midnight
- the difficulty of understanding the time in context
The first problem has to do with understanding the trends that occur around midnight, where this chart breaks the data. In this example we can see that activity in the evening peaks at 9pm and then declines. However, that decline is hard to assess accurately, because you have to follow the data to the right edge of the chart and then pick it up again all the way over on the left edge. This is only a problem when something interesting is happening around midnight (or wherever you choose to have your chart begin and end).
The second point, about context, comes from the fact that I don’t think of my days as starting at midnight and ending at 11:59pm. My days start when I wake up, usually around 7am, and end more or less when I go to sleep. In between, they are broken up into “day-time” and “night-time”, and “day-time” is further divided into categories like “working hours”, “lunch-time”, and “afternoon”. Depending on the data in question, these contextual relationships can be very important. For this post I’ll be looking at crime data, where the relationship to the time of day is especially relevant. I don’t just want to know when people are assaulted. I want to know the rate of assaults on the street at the times I’m likely to be walking on the street (typically right after work on my way home, or later at night going out to dinner, bars, etc.).
The simple bar chart doesn’t solve these problems well. It presents a hard break in the data, forcing the viewer to mentally connect the end of the chart with the beginning. It also forces the viewer to think of the day as running from midnight to 11pm, which is not the natural way we categorize the hours of the day.
The Double-Time Bar Chart
My first attempt to address some of these problems is something I’m tentatively calling the Double-Time Bar Chart. The goal is to put the time in context a bit more for the viewer, and to always show a relevant, continuous visualization of all times of the day.
The chart still uses simple bars on a linear axis, but the data is shown twice. The top part of the chart is the exact same histogram we had before, with 24 bars going from midnight to 11pm. The bottom part shows the same data upside down, but it starts at noon and goes to 11am, so it’s shifted by 12 hours relative to the top chart. Imagine taking the top chart, flipping it upside down, then shifting it 12 bars to the right, with the bars that fall off the right edge wrapping around to the left.
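There’s a single x-axis for both the top and bottom charts, labelled with the hours of the day. Because the two charts are offset by 12 hours, each position on the axis is AM on one chart and PM on the other. The position that is 6am on the top chart, for example, is 6pm on the bottom chart.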
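The layout described above boils down to a 12-bar rotation of the hourly counts. Here is a minimal Python sketch of that idea, assuming the data is already a list of 24 counts indexed from midnight. The function names are hypothetical and this is not how SpatialKey actually implements the chart. Flipping the bottom series upside down is left to whatever draws the bars.

```python
def double_time_series(counts):
    """Return (top, bottom) series for a Double-Time Bar Chart.

    Assumes `counts` holds 24 hourly counts, index 0 = midnight ... 23 = 11pm.
    top[i] is the count for hour i (midnight..11pm).
    bottom[i] is the count for hour (i + 12) % 24 (noon..11am), i.e. the
    top series rotated by 12 bars. Mirroring it downward is a drawing step.
    """
    if len(counts) != 24:
        raise ValueError("expected 24 hourly counts")
    top = list(counts)
    bottom = top[12:] + top[:12]  # rotate by 12 so the bottom starts at noon
    return top, bottom


def hours_at_position(i):
    """Hour of day shown at bar position i on the (top, bottom) charts."""
    return i, (i + 12) % 24


top, bottom = double_time_series(list(range(24)))
print(bottom[:3])             # first three bottom bars are noon, 1pm, 2pm
print(hours_at_position(6))   # 6am on top, 6pm on the bottom
```

Because the bottom series is just a rotation, both charts always hold the same 24 values. Only where the "seam" falls changes: at midnight on the top chart and at noon on the bottom.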
The highlighted regions represent 6am-5pm on the top and 6pm-5am on the bottom. That makes 24 highlighted bars in total, which together cover each hour of the day exactly once. The highlight draws attention to day-time and night-time activity. A rough color scheme marks 6am-5pm in a lighter yellow for day-time and 6pm-5am in a darker blue for night-time. I realize this doesn’t match actual sunrise and sunset times in most cities, but I think the 6am-6pm range is close enough to how many people think about “day” versus “night” that it works.
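To check the claim that the highlighted bars cover each hour exactly once, here is a small Python sketch. It assumes the top chart runs midnight to 11pm and the bottom chart runs noon to 11am, with day-time taken as the 6am-5pm bars and night-time as the 6pm-5am bars. `DAY_HOURS`, `NIGHT_HOURS` and `highlighted_bars` are illustrative names only.

```python
# Assumed highlight ranges: day = 6am..5pm bars, night = 6pm..5am bars.
DAY_HOURS = set(range(6, 18))
NIGHT_HOURS = set(range(18, 24)) | set(range(0, 6))


def highlighted_bars():
    """Return (chart, position, hour) for every highlighted bar.

    Top chart: position p shows hour p.
    Bottom chart: position p shows hour (p + 12) % 24.
    """
    bars = []
    for pos in range(24):
        top_hour, bottom_hour = pos, (pos + 12) % 24
        if top_hour in DAY_HOURS:
            bars.append(("top", pos, top_hour))
        if bottom_hour in NIGHT_HOURS:
            bars.append(("bottom", pos, bottom_hour))
    return bars


bars = highlighted_bars()
print(len(bars))  # one highlighted bar per hour of the day
```

Under these assumptions, both highlighted regions fall on positions 6 through 17. That is why the yellow and blue blocks line up in the middle of the chart, one above the axis and one below it.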
The duplicated (but shifted) data in the top and bottom allows me to see a continuous, unbroken series of data that can show day-time activity (top) or night-time activity (bottom). There is no hour of the day that forces me to read the chart to the end and then continue on by moving my attention back to the beginning. If I’m interested in the trends during the day (say around lunchtime, so 11am-1pm) then I can read the top chart. But if I’m interested in night-time activity (say 11pm-1am) then I can read the bottom chart. In both cases I get a continuous chart that shows the full context of all the data around the range in which I’m interested.
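The reading rule in the previous paragraph can be made concrete: for any time window, pick the chart on which those hours sit on consecutive bars. This Python sketch assumes the same layout as before (top = midnight to 11pm, bottom = noon to 11am). The function `window_positions` is a hypothetical helper, not part of any tool.

```python
def window_positions(start_hour, end_hour):
    """Find a chart where hours start..end (inclusive) sit on consecutive bars.

    The window may wrap past midnight (e.g. 23 -> 1). Returns
    (chart, first_position, last_position), preferring the top chart.
    Any window of up to 13 hours fits on at least one of the two charts.
    """
    length = (end_hour - start_hour) % 24  # steps from start to end, wrapping
    for chart, offset in (("top", 0), ("bottom", 12)):
        first = (start_hour - offset) % 24  # bar position of start_hour
        if first + length <= 23:            # window doesn't run off the edge
            return chart, first, first + length
    raise ValueError("window does not fit on either chart without wrapping")


print(window_positions(11, 13))  # lunchtime, 11am-1pm
print(window_positions(23, 1))   # 11pm-1am, across midnight
```

The lunchtime window lands on the top chart and the 11pm-1am window lands on the bottom chart. This matches the reading rule described above.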
The highlighted regions serve to draw attention to daytime versus nighttime, but we still keep the rest of the 24 hours visible in each chart (the unhighlighted bars) so you can always get the full context of the data. This allows you to follow the data from 4pm-8pm without forcing your eyes to jump from the top to the bottom.
For these examples I’ll be visualizing crime data from the city of San Francisco. I’m using two full years of crime, 2009 and 2010. You can download the crime data yourself if you want to play with it.
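Turning raw incident records into the 24-bar histograms these charts need is a simple grouping step. The sketch below is in Python and assumes each record is a `(category, "HH:MM")` pair on a 24-hour clock. That record format is hypothetical and is not the actual layout of the San Francisco download, so the parsing would need to be adapted to the real columns.

```python
from collections import Counter, defaultdict


def hourly_counts(records):
    """Group incident records into a 24-bin hour-of-day histogram per category.

    Assumption: each record is a (category, time) pair where time is an
    "HH:MM" string on a 24-hour clock (a hypothetical format).
    """
    per_category = defaultdict(Counter)
    for category, time_str in records:
        hour = int(time_str.split(":")[0])
        if not 0 <= hour <= 23:
            raise ValueError(f"bad hour in {time_str!r}")
        per_category[category][hour] += 1
    # Expand each Counter into a dense 24-element list (missing hours -> 0).
    return {cat: [c[h] for h in range(24)] for cat, c in per_category.items()}


sample = [("BURGLARY", "08:15"), ("BURGLARY", "08:45"), ("DUI", "00:30")]
print(hourly_counts(sample)["BURGLARY"][8])  # two burglaries in the 8am bin
```

One histogram per category is exactly what the small multiples later in this post need.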
One note about these charts: there are no y-axis labels, and each chart is scaled relative to itself. I was interested in exploring how to visualize the hourly patterns, not in showing exactly how many crimes occurred at a given hour. The highest bar in one chart does not necessarily represent the same value as the highest bar in another. It simply marks the hour with the most crimes for that particular crime type.
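One simple way to get this "relative to itself" behavior is to scale each chart so its tallest bar has height 1. The Python sketch below shows that, assuming a plain list of hourly counts. I don't know exactly how the original charts were scaled, so treat this as one plausible choice rather than the method used here.

```python
def normalize_to_max(counts):
    """Scale counts so the tallest bar is 1.0 (each chart relative to itself).

    A chart with no data (all zeros) stays all zeros instead of dividing by zero.
    """
    peak = max(counts)
    if peak == 0:
        return [0.0] * len(counts)
    return [c / peak for c in counts]


print(normalize_to_max([0, 5, 10]))
```

The trade-off is the one described above. Shapes are easy to compare across crime types, but absolute volumes are not.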
Here’s an example of a crime that has an interesting day-time pattern: burglary. Notice the nice peak right when everyone leaves their homes unguarded as they go off to work.
And here’s a contrasting example of a crime that’s primarily a night-time activity, public intoxication.
Notice the nice, nearly linear build-up all the way from about 9am to the peak at midnight, then the drop-off after 2am (when the bars close in San Francisco).
There are a few crimes that are even more polarized. Arrests for driving under the influence have a nice distribution curve that peaks at midnight.
And prostitution is also primarily a night-time activity in San Francisco. There are two peaks, one just after work around 6-7pm, and then another a bit later in the evening at 11pm.
Small Multiples for Comparison
One way to compare different kinds of data is small multiples: many small charts laid out together so your eyes can scan across them easily. These Double-Time charts work well as small multiples because you can quickly spot the difference between predominantly day-time crimes (large yellow areas in the top half) and night-time crimes (blue areas in the bottom half). For instance, to get a better view of burglary, we can look at its subcategories.
We can see that residential burglaries occur in the morning when people leave for work, whereas burglaries of a store are either late-afternoon or evening crimes.
The same approach can be used to compare many different types of crimes:
Or we can remove the x-axis and strip down the extra whitespace in the charts to get an even more compact view:
Summary/Revisiting the Goals
To circle back to what I was trying to accomplish with this type of chart: there were two main goals, preserving the continuity of the data and putting the data into the context of your day.
To preserve continuity I’ve duplicated the data, which allows for a nice continuous linear chart that covers any important time range. If you’re interested in day-time trends you can look at the top chart. If you’re interested in night-time trends you can look at the bottom chart. But in either case you get a full, continuous range to put the trend in context.
To further put the data in context I’ve added some simple coloring to highlight the day-time vs night-time ranges. The x-axis labels (showing 6am, noon, 6pm, etc) give you some further context that helps you categorize the data. If you split the chart in quadrants you get rough categories for morning (top-left), afternoon (top-right), evening (bottom-left) and night (bottom-right).
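The quadrant reading can be checked against the layout. Each hour's highlighted bar lands in exactly one quadrant: 6-11am in morning, noon-5pm in afternoon, 6-11pm in evening, and midnight-5am in night. The Python sketch below assumes the layout described earlier (top = midnight to 11pm, bottom = noon to 11am) and treats positions 0-11 as the left half. `highlighted_quadrant` is a hypothetical name used only for illustration.

```python
QUADRANT_NAMES = {
    ("top", "left"): "morning",
    ("top", "right"): "afternoon",
    ("bottom", "left"): "evening",
    ("bottom", "right"): "night",
}


def highlighted_quadrant(hour):
    """Return (chart, position, quadrant) for the highlighted bar of `hour`.

    Assumes day-time (6am-5pm) is highlighted on the top chart and
    night-time (6pm-5am) on the bottom chart, whose position p shows
    hour (p + 12) % 24. Left half = positions 0-11, right = 12-23.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be 0..23")
    if 6 <= hour <= 17:
        chart, position = "top", hour
    else:
        chart, position = "bottom", (hour - 12) % 24
    side = "left" if position < 12 else "right"
    return chart, position, QUADRANT_NAMES[(chart, side)]


for h in (8, 14, 20, 2):
    print(h, highlighted_quadrant(h))
```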
The Big Caveat
This is just a simple design experiment. I’m making no claims about the efficacy of this chart. I have not run any studies to validate that this chart is clear to viewers or is any better (or worse) than any other visualization. I don’t even know if I myself think this is an effective chart. I’m just trying to spur a bit of a discussion and some experimentation around the problem of visualizing cyclical hourly data. So what do you think?
I’ll be posting another (even more experimental) take on this same topic shortly.