Archive for the 'Data Visualization' Category

I’ve been playing with different ways of representing data (see my previous night lights example) and I decided to venture into 3D representations. I’ve used a full year of crime data for San Francisco from 2009 to create these maps. The full dataset can be downloaded from the city’s DataSF website.

A view from above
This view shows different types of crime in San Francisco viewed directly from above. The sun is shining from the east, as it would during sunrise.

top_500

I love how some of the features in these maps are pretty consistent across all the crime types, like the mountain ridge along Mission St., while others only crop up in one or two of the maps. The most distinctive map by far is the one for prostitution (more on that further down).
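The post doesn't say how these renders were lit, but the “sun shining from the east” effect is essentially hillshading. Here’s a minimal sketch, assuming the crime counts have already been turned into a grid of heights (rows running north to south, columns west to east). The function name and the 30° default sun altitude are illustrative assumptions. It only shades each cell by how directly it faces the sun; the long shadows a ridge casts across the city would also need a ray-march toward the sun, which this leaves out.

```python
import math


def hillshade(heights, sun_altitude_deg=30.0):
    """Shade a height grid lit by a sun due east (a sunrise-style light).

    heights[row][col]: rows run north to south, columns run west to east.
    Returns brightness values in [0, 1] using simple Lambertian shading.
    Hypothetical helper for illustration; it does not compute cast shadows.
    """
    rows, cols = len(heights), len(heights[0])
    alt = math.radians(sun_altitude_deg)
    # Unit vector pointing toward the sun: due east (+x), raised by `alt`.
    sun = (math.cos(alt), 0.0, math.sin(alt))
    shade = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Central differences, falling back to one-sided ones at the edges.
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            dz_east = (heights[r][c1] - heights[r][c0]) / max(c1 - c0, 1)
            # Row index grows southward, so "north" is the r0 side.
            dz_north = (heights[r0][c] - heights[r1][c]) / max(r1 - r0, 1)
            # Surface normal of z = f(x, y) is (-df/dx, -df/dy, 1), normalized.
            nx, ny, nz = -dz_east, -dz_north, 1.0
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            dot = (nx * sun[0] + ny * sun[1] + nz * sun[2]) / length
            # Faces turned away from the sun get no direct light.
            shade[r][c] = max(dot, 0.0)
    return shade
```

With this setup, east-facing slopes come out bright, west-facing slopes go dark, and flat ground gets sin(altitude), which is 0.5 at the default 30°.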

An alternate view
Here’s the same data but from a different angle, which helps show some of the differences.

UPDATE: Whoops, I screwed up originally and had a duplicate image. The original graphic showed the same map for Vandalism and Assault (both were the Vandalism map). This updated graphic has the correct map for Assault.

right_500_2

Many of the maps have peaks in the Tenderloin, the high area in the north-eastern part of the city. Some crime types are extremely concentrated (narcotics) and some are far more spread out (vehicle theft).

My favorite map is the one for prostitution (maybe “favorite” is the wrong choice of word there). Nearly all the arrests for prostitution in San Francisco occur along what I’m calling the “Mission Mountain Ridge”, which runs up Mission St between 24th and 16th.

EDIT: I’ve been corrected. On closer inspection, the prostitution arrests peak on Shotwell St. at the intersections with 19th and 17th. I’m sure the number of colorful euphemisms you can come up with that include the words “shot” and “well” is endless.

I love the way the mountain range casts a shadow over much of the city. There’s also a second peak in the Tenderloin (which I’m dubbing Mt. Loin).

prostitution_500

Drug crimes are also interesting to look at, since so much of the drug activity in San Francisco is centered in a few distinct areas. We can see Mt. Loin rising high above all the other small peaks. The second highest peak is the 16th St. BART peak.

drugs_500

There are other consistent features in these maps, in addition to Mt. Loin and the Mission Range. There’s a valley that separates the peaks in the Mission and the peaks in the Tenderloin, which is where the freeway runs (Valley 101). You’ll also notice a division in many of the maps that separates the southeast corner. That’s the Hunter’s Point Riverbed (aka the 280 freeway).

Disclaimer
These maps were generated from real data, but please don’t take them as being accurate. The data was aggregated geographically and artistically rendered. This is meant more as an art piece than an informative visualization.
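“Aggregated geographically” could mean several things, and the post doesn't specify. One plausible reading, sketched here purely as an assumption, is counting incidents into a rectangular latitude/longitude grid and then blurring the counts so isolated spikes become rolling hills. Both function names are hypothetical, and the grid size and blur radius are knobs you'd tune by eye.

```python
def bin_points(points, south, west, north, east, rows, cols):
    """Count (lat, lon) points into a rows x cols grid covering a bounding box.

    Row 0 is the northern edge; points outside the box are ignored.
    """
    grid = [[0] * cols for _ in range(rows)]
    for lat, lon in points:
        if not (south <= lat <= north and west <= lon <= east):
            continue
        # Map latitude to a row (north at the top) and longitude to a column.
        r = int((north - lat) / (north - south) * rows)
        c = int((lon - west) / (east - west) * cols)
        # Points exactly on the south or east edge would land one past the end.
        grid[min(r, rows - 1)][min(c, cols - 1)] += 1
    return grid


def box_blur(grid, radius=1):
    """Average each cell with its neighbors to turn spiky counts into hills."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, n = 0, 0
            # Clip the averaging window at the grid edges.
            for rr in range(max(r - radius, 0), min(r + radius + 1, rows)):
                for cc in range(max(c - radius, 0), min(c + radius + 1, cols)):
                    total += grid[rr][cc]
                    n += 1
            out[r][c] = total / n
    return out
```

The blurred grid can then be treated directly as a height map for a 3D render.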

I’m in love with the New York Times data visualization/infographics division. They consistently put out some of the most amazing visualization pieces (both in print and online) that I’ve ever seen. Their recent geographic analysis of Netflix ratings was absolutely superb. And we all probably saw their election maps (either for 2008 or 2004). They produce stunning displays that convey amazing amounts of information in a way that only interactive graphics can. And they’re all done in Flash.

nyt_netflix
A Peek Into Netflix Queues


nyt_thanksgiving_recipes
What’s Cooking For Thanksgiving
nyt_swineflu
Swine Flu Cases Map


nyt_unemployment_explorer
The Jobless Rate for People Like You
nyt_vtech
Virginia Tech Shooting


nyt_parkingtickets
Map of Parking Tickets in New York City
nyt_how_people_spend_their_day
How Different Groups Spend Their Day


And for even more check out the NYT’s selected infographics list or simply do a Google search for “interactive graphic” on the New York Times website.

So when you see images showing the missing plugin icon on the New York Times website on the iPad or iPhone, that’s not just an annoying ad or a streaming video that won’t play. That’s some of the most cutting-edge visualization work being produced today. And without Flash it simply doesn’t exist.

Sure, you might be able to recreate some of these without using Flash (I’d argue that many you simply would never be able to do, but that’s for another debate). But the point isn’t whether or not you could eventually do it without Flash. The point is that the New York Times does them all with Flash. So we need to ask why. It’s not an accident or an arbitrary technology choice. Newspapers operate on a schedule and a budget (and one that is getting tighter and tighter). The simple truth is, creating amazing visualizations like you see on the NYT website is possible and easy with Flash. They use the tools that get the job done most efficiently and produce the best end result. This isn’t an argument about whether it’s theoretically possible to create these types of visualizations without Flash; it’s about whether it’s actually being done. And save for a handful of examples, it’s not (for every good JavaScript visualization you show me, I’ll show you ten good Flash ones). Taking away the New York Times’ ability to use Flash would set their data visualization department back 5 or 10 years. And it would mean that we, as readers and citizens, would miss out on some of the most important journalism being produced today.

The New York Times (like all newspapers) is in crisis. They are trying to reinvent themselves in an online form. And as a news organization they are one of the most progressive and experimental out there. They are embracing the new medium by doing some of the best damn interactive graphic work I’ve ever seen. They make things that convey news and information in ways that draw people in and keep them coming back for more.

But without Flash they’re just a newspaper. And we all know newspapers are dying.

I’ve been reading William Playfair’s Commercial and Political Atlas, in which he invented the line chart. In the book, Playfair examines the imports and exports between Britain and various countries. To illustrate these trade relationships, Playfair created the first ever line charts that show the change in trade over time.

The Inspiration
Each section of the book covered a different country, and each one contained a chart that showed the imports and exports like this:

playfair_north_america_trade2
playfair_ireland
Two line series are shown, one for imports and one for exports, and shading is used to show when there was a “balance in favor of England” (when there were more exports than imports).
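Finding the stretches Playfair shaded is a small run-length problem: walk the years in order and start a new run whenever the balance flips. A sketch, assuming parallel lists of years, exports, and imports (the function name is hypothetical, and a year where exports exactly equal imports is treated as not in England's favor):

```python
def balance_periods(years, exports, imports):
    """Return (start_year, end_year, in_favor) runs of consecutive years.

    in_favor is True where exports exceed imports, the condition Playfair
    shaded as a "balance in favor of England".
    """
    runs = []
    for year, exp, imp in zip(years, exports, imports):
        in_favor = exp > imp
        if runs and runs[-1][2] == in_favor:
            # Same side of the balance as the previous year: extend the run.
            runs[-1] = (runs[-1][0], year, in_favor)
        else:
            # The balance flipped (or this is the first year): start a new run.
            runs.append((year, year, in_favor))
    return runs
```

Each `True` run is one shaded region between the two lines; each `False` run is left unshaded.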

My Recreation
I’ve been captivated by these charts and wanted to recreate them, but with modern data. You can find tons of US trade data at the US Census Bureau’s website, including a spreadsheet that has all the data in one place. I downloaded that data and put together a little application to create Playfair-esque charts.

Click this screenshot to play with the app yourself:
playfair_app_screenshot
View source is enabled.

The app displays all the countries that the US has trade data for, month by month going back as far as 1985. Each country is displayed in the list on the left with a sparkline chart of the trade data. A red fill indicates we are importing from a given country more than we are exporting, and a light green fill indicates we are exporting more than we are importing.
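The app's own source is available (view source is enabled), so this isn't its code. It's a Python sketch of the two steps a sparkline like this needs: scaling a series into a small pixel box and choosing a fill per month. The function names are hypothetical, and it assumes the import and export series share one value range so they're drawn on the same scale.

```python
def sparkline_points(values, width, height, lo, hi):
    """Map a series onto (x, y) pixel coordinates inside a width x height box.

    lo and hi are the shared value range, so the import and export lines of
    one country are drawn on the same scale. y grows downward, as on screen.
    """
    n = len(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a flat series
    step = width / (n - 1) if n > 1 else 0.0
    return [(i * step, height - (v - lo) / span * height) for i, v in enumerate(values)]


def fill_colors(imports, exports):
    """Pick a fill per month: red for a trade deficit, light green for a surplus."""
    colors = []
    for imp, exp in zip(imports, exports):
        if imp > exp:
            colors.append("red")
        elif exp > imp:
            colors.append("lightgreen")
        else:
            colors.append(None)  # balanced month: nothing to shade
    return colors
```

For a given country you'd compute `lo` and `hi` over both series combined, draw both point lists, and fill the area between them month by month using the chosen colors.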

Exploring the data
The charts tell some really interesting stories. Some of the charts show a nearly identical relationship of imports to exports, both growing at the same rates, like these charts of the UK and Guatemala.
united_kingdom
guatemala

Other charts show different relationships. Notice how exports to Hong Kong have been steadily increasing, while imports from Hong Kong have been declining.
hong_kong

Or we can see what imposing sanctions on a country looks like, as illustrated by sanctions on Burma that were put into place in 2003:
burma

Or what a coup in Haiti looks like:
haiti

Or what a massive tsunami can do to a place like the Maldives:
maldives

We can see the massive growth of China (and notice how interestingly seasonal each year is, peaking in October):
china
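A seasonal peak like China's can be checked numerically as well as by eye. One approach, assumed here rather than taken from the app, is to divide each month by its own year's average so the long-term growth doesn't swamp the pattern, then average those ratios by calendar month. The function names are hypothetical, and the input is a list of ((year, month), value) pairs.

```python
from collections import defaultdict


def seasonal_profile(monthly):
    """Average each calendar month's value relative to its year's mean.

    monthly: iterable of ((year, month), value) pairs, month in 1..12.
    Returns {month: mean ratio}; 1.0 means a typical month for that year.
    """
    by_year = defaultdict(dict)
    for (year, month), value in monthly:
        by_year[year][month] = value
    sums, counts = defaultdict(float), defaultdict(int)
    for months in by_year.values():
        mean = sum(months.values()) / len(months)
        if mean == 0:
            continue  # a year with no trade says nothing about seasonality
        for month, value in months.items():
            sums[month] += value / mean
            counts[month] += 1
    return {m: sums[m] / counts[m] for m in sums}


def peak_month(monthly):
    """Return the calendar month with the highest average relative value."""
    profile = seasonal_profile(monthly)
    return max(profile, key=profile.get)
```

Normalizing per year matters for a fast-growing partner: without it, the most recent, largest years would dominate the averages.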

One final chart that I find very interesting isn’t a country at all, but the import and export of what is classified as “Advanced Technology Products”, which includes things like biotech and advanced electronics. Notice how until the early 2000s we were exporting more of these products than we were importing, but by 2002 that balance had shifted, and the gap has continued to grow since:
advance_tech_products

I had fun creating this app, but one thing I didn’t expect was how much fun researching the charts was going to be. The charts that stuck out with trends that were abnormal all had interesting stories to tell about the history of the country.

In closing, I’ll end with a quote from Playfair in which he describes the concept of displaying numeric values in a line chart (remember, he was the first person to actually do this):

As the eye is the best judge of proportion, being able to estimate it with more quickness and accuracy than any other of our organs, it follows, that wherever relative quantities are in question … this mode of representing it is peculiarly applicable; it gives a simple, accurate, and permanent idea, by giving form and shape to a number of separate ideas, which are otherwise abstract and unconnected.

Well said, Mr. Playfair, well said. Your charts are just as effective more than 200 years later.

At the RIAdventure conference I gave a presentation about the past, present, and future of data visualization as I see it (fun side note: RIAdventure is the only conference I can say I “went on”). Luckily, the organizers filmed the entire thing, so the video of the whole presentation is available to watch. The presentation covers a brief history of the field of data visualization, with a focus on the relatively recent invention of many data visualization techniques we take for granted. The point of the historical exercise was to show that the new data we have before us presents new opportunities for invention. I talked about new trends I see emerging in the data itself (massive datasets, city data, your life data, stream data) and what those trends mean for us as data visualization software engineers (I also argue that everyone will be a “data viz” engineer to some degree in the future).

I hope you enjoy the presentation; it was a lot of fun to create and to present. I learned a ton from the research, and it was exciting thinking about the future of the field. Below is the full video (low resolution streaming from Vimeo, or you can find higher resolution streaming from screencast here, or you can even download the full video file). Also embedded below are the slides that go along with the presentation, and you can always download the slides as a PDF.

Also check out some of the other presentations from RIAdventure.


One of the first time-series line charts ever drawn was a visualization of the great American credit crisis (but probably not the credit crisis that comes immediately to mind). If you were to look at this chart today you might even mistake it for the charts of the housing credit crisis of the past few years.

playfair_north_america_trade2
(see image credits below for image details)

This chart was created by William Playfair and published in 1786 in The Commercial and Political Atlas. In that work Playfair literally invented the line chart. This particular chart shows the imports and exports between Britain and America from 1700 to 1800. The red line shows exports from Britain to America, and the lighter yellow line shows imports from America. You can see that the relationship of imports to exports stays relatively constant for the first 50 years (1700-1750), and then exports start shooting up dramatically, at a rate much greater than the increase in imports.

Compare that with this chart of housing prices, created by the New York Times.
nyt_chart_cropped
(image from the New York Times)

I guess we know what a credit-driven catastrophe looks like. And it’s not only the image that looks similar; at times Playfair’s words sound as if he were writing today about our current financial mess.

Between 1750 and 1772 there was a rapid increase in exports from Britain to America. These exports were the result of many new merchants hoping to strike it big by shipping goods to the new settlers. But the reason things got out of control has to do with credit. Merchants started lending and borrowing on credit to finance their get-rich-quick schemes of selling stuff to America. Playfair writes (all emphasis added is mine),

Ever since the invention of paper credit, trade has had a latitude it did not before enjoy, and its progress being less natural, has become more intricate. That bound set and preserved by the nature of things was removed, when paper credit was first invented; previous to which, nothing represented wealth that was not wealth itself, or that was not physically worth the sum it represented; and in order to give credit in business, it was absolutely necessary either to possess, or to have borrowed capital.

And because of this new credit, people started making business decisions that were insane. They started shipping products to America before they knew they could sell them. Since the money was free they took irrational risks. And if your business venture failed miserably you could always just hide from your creditors in that new land of opportunity.

Of the eventual crash, Playfair writes,

For the first fifty years, we observe the simple and regular growth, from poverty to wealth, of a new country; during the succeeding twenty years, we are astonished at the extent and operation of a mad mercantile speculation carried on by our own country; and the period which succeeds, shews the catastrophe that so airy and so ill-founded a project was likely, sooner or later, to experience. There is not any branch of trade, which, from the nature of its progress, affords so much instruction as this. It merits equally the attention of the philosopher, the politician, and the merchant; for it throws light upon all the three different objects of their pursuits.

Isn’t that beautiful? Almost the same words could apply to the current financial crisis. And one final quote that I like, which also made me think of our current crisis:

Upon the manner in which business is conducted, depends something more than merely the gaining or losing a little money. The happiness of numbers of innocent individuals is frequently depending upon the success of projects, with the formation of which they had no concern. What numbers have been ruined, and how many more deprived of fortune, by our ill-conducted trade with America?

What numbers have been ruined indeed.

I’ve been reading the works of Playfair to understand the history of data visualization (in this same work he also invented the bar chart, and in a subsequent work he invented the pie chart). I wanted to make sure I understood the history of statistical charts, since, as they say, those who cannot remember the past are condemned to repeat it. I didn’t realize that phrase would also apply so perfectly to the text accompanying the images.

* Image Credits
The first image above is from William Playfair’s Commercial and Political Atlas, 3rd edition, published in 1801. The scan is of a copy contained in the University of Pennsylvania’s Annenberg Rare Book and Manuscript Library. It was reproduced in a publication by Cambridge University Press entitled The Commercial and Political Atlas and Statistical Breviary, published in 2005, which was compiled by Howard Wainer and Ian Spence (and if you want to be even more technical the image above is a reproduction from a Google scan of the Cambridge University scan). As was decided in Bridgeman vs Corel Corp (full text), a reproduction of a work of art in the public domain is not protected by copyright. As was stated in that verdict: “While it may be assumed that this required both skill and effort, there was no spark of originality — indeed, the point of the exercise was to reproduce the underlying works with absolute fidelity. Copyright is not available in these circumstances.” I am reproducing the image here with that legal precedent in mind, and with the best of intentions. I would highly recommend that if you are interested in Playfair’s work you buy the reprint by Cambridge University Press. It contains full-color reproductions of the charts, and the introduction contains great biographic information about Playfair.


Images courtesy of the Image Science & Analysis Laboratory, NASA Johnson Space Center

As I was flying back home into San Francisco airport I was watching the city lights out the window and was struck by a bit of inspiration. I find cities beautiful, from the graffiti to the neon signs to the line of headlights on the highway. A city viewed from above at night is captivating. I wanted to try to recreate that same look, but by visualizing data (in one sense you can say that the real view of a city from above is already a visualization of population data).

I started searching for images of cities at night, and found these amazing images from NASA. All those images were taken from a space shuttle orbiting the earth. These images tell you a lot about the city, the layout, urban density, planning (or lack thereof). I wanted to take other meaningful data and create similar images.

All the visualizations below have been created with SpatialKey. However, this is some experimental work I’ve been playing with to generate the “night light” images, so it’s not released (and might not ever be). Basically this is a peek behind some of the R&D work I do for fun (yes, for a dataviz dork like me, making fake “cities at night” images is my idea of fun).

Crime in San Francisco
This image shows all crime in San Francisco for a 3-month period. You can see some of the same features that you can see in the NASA space image, such as Golden Gate Park and the Presidio (the area on the north-west edge of the city). All in all it’s interesting how similar the crime image looks compared to the NASA image. Downtown is the brightest spot in both images, which means that it’s literally the brightest area of the city (the most streetlights), and also has the most crime.

SF_crime

And here are breakdowns for a few different crime types. Notice how different the distributions are. Narcotics crimes are heavily clustered and can be found downtown (in the Tenderloin), in the Mission (near the 16th St BART station), and along Haight Street near Golden Gate Park, whereas vehicle theft is scattered fairly evenly throughout the city.

Narcotics
SF_narcotics
Theft
SF_theft
Vehicle Theft
SF_vehicle_theft
Burglary
SF_burglary
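The night-light renderer was unreleased experimental work in SpatialKey, so this isn't it. As a rough illustration of one ingredient, here is a sketch that turns a grid of per-cell crime counts into 0-255 brightness values. It assumes the incidents have already been counted into a grid, and it uses log scaling (an assumption, not something the post states) so that downtown and tightly clustered crimes like narcotics don't wash out the sparser neighborhoods. The function name and the `gamma` parameter are hypothetical.

```python
import math


def night_light_brightness(counts, gamma=1.0):
    """Turn a grid of event counts into 0-255 brightness values.

    Log scaling compresses the densest cells so sparse areas stay visible,
    roughly like a long exposure of city lights. gamma < 1 lifts dim cells,
    gamma > 1 darkens them.
    """
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0] * len(row) for row in counts]  # nothing to light up
    top = math.log1p(peak)
    # log1p(0) == 0, so empty cells stay black and the busiest cell maps to 255.
    return [[round(255 * (math.log1p(v) / top) ** gamma) for v in row] for row in counts]
```

Rendering the result as a grayscale or warm-tinted image on a black background gives the basic “city at night” look; any glow or blur would be a separate step.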

Graffiti Reports in San Francisco and New York
Both San Francisco and New York publish their 311 data, the record of calls citizens make to request city services. One category of 311 calls is to report graffiti. Graffiti is interesting in that it often follows specific city streets. When we look at the graffiti data for both cities we see specific streets that have far more graffiti than others. I love these images (particularly the one of SF) because they really look like a view of street lights from a plane.

NYC_graffiti
SF_311_graffiti
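Seeing which streets dominate doesn't require a map at all; a simple tally works. Here is a sketch assuming each 311 report has already been parsed into a dict with a street name under a `"street"` key. That field name is hypothetical, since the real SF and NYC exports lay out their address columns differently, and real street names would need more cleanup than the upper-casing done here.

```python
from collections import Counter


def top_streets(reports, n=5):
    """Rank streets by number of graffiti reports.

    reports: iterable of dicts with a "street" key (hypothetical field name).
    Reports with a missing or empty street are skipped.
    """
    counts = Counter(
        r["street"].strip().upper() for r in reports if r.get("street")
    )
    return counts.most_common(n)
```

`Counter.most_common` returns `(street, count)` pairs sorted from most to fewest reports.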

Trees planted in San Francisco
Another one of my favorites of this set is data for all the trees that the city of San Francisco has planted since 1990 (all this SF data is available at datasf.org). You can see the heavy planting along Market St (which cuts diagonally through downtown), as well as along streets like Sunset Blvd (the street running north/south on the western side of the city).

SF_trees

Street lights (or SF as a giant Lite-Brite)
One final image of San Francisco shows the location of every street light in the city. I liked this image because it reminded me of playing with a Lite-Brite when I was a kid. It almost makes city planning feel like a grown-up version of playing with little plastic lights.
SF_traffic_lights