Others may hate, but I’m a big fan of using bubbles to display data. When implemented correctly (i.e. scaled in terms of area instead of diameter), bubbles can be an aesthetically appealing and concise way to represent the value of data points in an inherently visual format. Bubbles are even more useful when they include interactivity, with events like mouseover and zoom allowing users to drill down and compare similar-sized bubbles more easily than they can in static graphics. So, when I was recently working on a class project on autism diagnoses in New York City, I decided to use bubbles to represent the percentage of students with individualized education plans at all 1250 or so K-8 New York City schools. Continue reading
A quick refresher from my data visualization professor here at Columbia a couple of weeks ago reminded me why I was forced to spend all those grueling hours calculating standard deviation back in high school.
See, when you’re using a data set to tell a story, the first step is to understand what that data says. And to do that, you’ve got to have a good idea of the range and variation of the values at hand. Not only can figuring that information out help you determine whether there’s any statistical significance to your data set, but it can also pinpoint outliers and possible errors that may exist within the data before you begin the work of visualizing it.
Thanks to powerful processing programs like Excel, we can figure out the variability of data sets pretty easily using the program’s built-in standard deviation function (remember this intimidating-looking equation from calculus class?). Still, it always helps to know how to calculate the information out by hand, if only to get a conceptual idea of why numbers such as the standard deviation (the variability of a data-set) and the z-value (the number of standard deviations a given value is away from the mean) even matter in the first place when it comes to data visualization.
So, to brush up on my formulas and also better understand the numbers behind an actual story assignment for one of my classes, I recently hand-calculated the standard deviation and z-values for a set of data on state-by-state obesity rates. From my calculations, I was able to use the standard deviation (3.24) to determine that, on average, most states fell within the middle of the bell-curve for the average national obesity rate (27.1 percent) . In addition, the z-values helped me understand which states stood out from the pack as possible outliers (Mississippi is by far the most obese with a 2.13 z-value, Colorado the least obese with a -1.9 z-value). To get an idea of how those formulas look hand-calculated in Excel, check out my spreadsheet here. And keep these formulas in mind while working on your next data story. They can potentially save you time and effort by helping you figure out what your data set says before you have to go through the often-lengthy process of visualizing it.
With the Pulitzer Price announcements coming up later this afternoon, you’d think I’d be writing about whose up for the “Best Deadline Reporting” or “Best Public Service Journalism” prizes. But instead I want to talk about a different media award doled out during the past week: BostonGlobe.com’s designation as the “world’s best designed website” by the Society for News Design. Put simply, I can’t say I disagree. Continue reading
What’s making the South –– the region CDC Dr. William Dietz has dubbed “the heart disease and stroke belt” –– more chubby than the rest of the country? Here are five possible explanations.Continue reading