Using Data Science to: Create animated charts.

Introduction

When I started this Data Science journey, someone told me that the key to success would be learning to learn very quickly. Now that I’m further along, I think that they were completely correct.

The truth is that there are a ton of complex tools, concepts, and technologies to learn and understand. But by the time you’ve learned them, there are already new tools coming out that are faster, better, and easier.

Now that I feel good about my foundation, anytime I come across an exciting new tool or application, I try to learn it as quickly as possible. So when I started to notice a trend of animated charts popping up, I knew it was time to jump in and figure out how they are made.

I knew from past experience that some of this could be done manually through a JavaScript library called D3.js, but that it would be extremely time-consuming. I did a little more research and found out that many people were using a tool called Flourish.

After checking it out, I have to say, Flourish blew me away. It’s fast, flexible, super easy to use, and free (as long as you don’t mind making your data public). I decided to spend this week exploring its animated options. Below is a breakdown of some of my favorite charts they had to offer.

Bar Chart Race

Military Spending by Nation

Overview

First and foremost I wanted to build an animated bar chart end to end, meaning that I wanted to use novel data that I would have to extract, clean, and upload. I thought that military spending could be an interesting topic, and luckily I was able to find some data on just that. I’ve made the cleaned dataset available here.

What I learned in this process is that once the data is properly formatted, building the animation is extremely easy. Of course, data cleaning is a huge task on its own, but with Flourish, that’s basically 95% of the work.

Overall I think that this is a super powerful visualization. It captures time, scale, and really helps tell a story.

What it’s good for

This is a very flexible chart type and is great for any examples with 1 dynamic value measured at regular time intervals. Here are some potential examples:

Average MPG of cars by manufacturer per year.
Greenhouse gas emissions by country per month.
Market Cap of companies per day.

Data Structure

The data here is pretty straightforward, and this format works great for two-dimensional data where one of the fields is time.

One thing to call out here is that each country is a row and each year is a column. Most of the data I found that worked well for this type of chart was arranged the other way around, so prepping the data required transposing it and consolidating by country.
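Here is a minimal sketch of that reshaping step in pandas. The numbers, the column names, and the alias map are all made up for illustration, and I am assuming that "consolidating" means merging rows that refer to the same country under different names. It is not the exact cleaning script I used.

```python
import pandas as pd

# Made-up numbers in the "reversed" layout: one row per year,
# one column per country (with one country listed under two names).
by_year = pd.DataFrame({
    "year": [2018, 2019],
    "United States": [650.0, float("nan")],
    "USA": [float("nan"), 700.0],
    "China": [250.0, 260.0],
})

# Hypothetical alias map used to consolidate duplicate country names.
aliases = {"USA": "United States"}

# Transpose so that each country is a row and each year is a column.
by_country = by_year.set_index("year").T

# Consolidate: rename aliases, then sum rows that now share a name.
# (sum skips NaN by default, so the two partial rows combine cleanly.)
by_country = by_country.rename(index=aliases).groupby(level=0).sum()

# Use plain string headers for the year columns and name the row label.
by_country.columns = [str(year) for year in by_country.columns]
by_country.index.name = "country"

print(by_country)
```

From there, something like by_country.to_csv("spending.csv") (the filename is arbitrary) gives you a file with one row per country and one column per year.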

Overall

I really like this chart type and it was incredibly easy to build once the data was clean. It’s fast and responsive, and tells a great story without relying on a bunch of “chart junk” like busy legends or lengthy descriptions.

There are two major downsides. First, you can really only display 2D data, and one of those fields must be time. Second, the chart does not convey any sort of “history”, so changes over time are lost.

I would use this chart if you care about telling a simple story of a rise to dominance in a specific category that can be represented by a single value.

Line Chart Race

Human Freedom Index: Top 20 GDP Countries, 2008-2017
*I recommend selecting “Scores”

Overview

I moved on to Line Chart Races next, and decided to also do this with a novel dataset. I came across a Human Freedom Index broken down by year for all the countries in the world. I’ve cleaned up the data and have made it available here.

The main benefit to this chart format is that it shows a “history” so we can look back and see change over time. However, the downside is that the scores have to be very closely bundled; otherwise the scale is destroyed. (You could use logarithmic scaling on the axes, but that is not intuitive for most viewers.)

What it’s good for

This is a great tool if you are dealing with rankings or scores on a fixed scale (ex: 1-10 or 1-100) that vary over time. Some potential examples could be:

Character popularity in a show across each episode.
World leader approval ratings by week during the pandemic.

Data Structure

Again, this is designed for 2D data where one of the dimensions is time. As mentioned above, the values here should be tightly clustered and measured at regular time intervals.

I originally built this with all countries that I could find data for, which was 170 in total. But this led to an extremely slow and choppy chart that was incomprehensible. This seems to work best with 10-25 entities that are being compared.
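Trimming the table down to a manageable number of entities is a quick filter in pandas. This is a minimal sketch with made-up scores and GDP figures; the function name keep_top_entities and the choice to rank by a separate GDP series are my own assumptions for illustration.

```python
import pandas as pd


def keep_top_entities(scores: pd.DataFrame, ranking: pd.Series, n: int = 20) -> pd.DataFrame:
    """Keep only the rows of `scores` for the n entities with the largest `ranking` value.

    `scores` is indexed by entity with one column per year;
    `ranking` is indexed by the same entity names.
    """
    top = ranking.nlargest(n).index
    return scores.loc[scores.index.isin(top)]


# Made-up scores for four placeholder countries.
scores = pd.DataFrame(
    {"2008": [8.1, 7.5, 6.0, 8.8], "2009": [8.0, 7.6, 6.1, 8.7]},
    index=["A", "B", "C", "D"],
)
# Made-up GDP figures used only to decide which countries to keep.
gdp = pd.Series({"A": 5.0, "B": 1.0, "C": 9.0, "D": 3.0})

top2 = keep_top_entities(scores, gdp, n=2)
print(top2)
```

Filtering before uploading keeps the chart itself small, rather than relying on the tool to hide lines after the fact.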

Overall

This format is great for the right kind of data. However, finding good clean data for it was much harder than I had anticipated. The format still tells a story, and makes it really easy to formulate questions (What happened with Brazil?).

The biggest downside is that this still only conveys information on one dynamic datapoint, and that those values have to be tightly clustered.

I would use this if you want to compare rankings or scores over time, especially if you want to tie the movement in metrics to key events that could explain it.

Hans Rosling Chart

GDP+Population+Life Expectancy by Nation.

Overview

First off, if this chart looks familiar, it’s because it was made famous by its creator, Hans Rosling, in this video. For this, I used a prebuilt dataset by Flourish comparing GDP, Life Expectancy, and Population.

This chart is very powerful in that it conveys many more dimensions of data at once than the previous two formats. Here we have 3 dynamic data points (GDP, Life Expectancy, Population) along with a static data point (Region) all animated against time. This allows us to visualize much more complex phenomena.

What it’s good for

This chart type is very powerful for data that has multiple dimensions that all change and presumably interact over time. Some potential examples are:

Sales, Profit, # Employees of different companies over time.
Education, Social Services, and Crime Rate of different states over time.

Data Structure

The data here is a bit more complex making it harder to get data in the right format. The key here is that there does still have to be a time component which is captured by the animation, but that still leaves space for up to 3 more dynamic data points, and 2 fixed data points (Ex: Region).

The version above contains all countries and is a little too crowded and slow in my opinion, so I would probably try to limit the total number of unique entities to 25-50 if I were using this on a novel dataset.
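To give a feel for the prep work on a novel dataset, here is a rough sketch of combining three separate metric tables and a region lookup into one table with a row per country and year. All names and numbers are made up, and I have not checked this against the exact column layout Flourish expects, so treat the output shape as an assumption.

```python
import pandas as pd

# Made-up inputs: one long table per metric, keyed by country and year.
gdp = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "gdp_per_capita": [30000.0, 31000.0, 2000.0, 2100.0],
})
life = pd.DataFrame({
    "country": ["A", "A", "B"],  # B is missing 2001 on purpose
    "year": [2000, 2001, 2000],
    "life_expectancy": [78.0, 78.2, 65.0],
})
pop = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "population": [60e6, 61e6, 150e6, 153e6],
})
# Fixed (non-animated) attribute: one row per country.
regions = pd.DataFrame({"country": ["A", "B"], "region": ["Europe", "Asia"]})

# Inner joins keep only country-years present in all three metric tables;
# `validate` raises an error if a key is unexpectedly duplicated.
combined = (
    gdp.merge(life, on=["country", "year"], how="inner", validate="one_to_one")
       .merge(pop, on=["country", "year"], how="inner", validate="one_to_one")
       .merge(regions, on="country", how="left", validate="many_to_one")
)

print(combined)
```

The inner joins silently drop the country-year that is missing a life expectancy value, which is usually what you want for a bubble that needs all three values to be drawn, but it is worth checking how many rows you lose.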

Overall

This chart type has huge potential for the right dataset. If you want to visualize something with multiple influences, this is a great choice, but don’t expect to knock it out in a half hour. The data prep process for this is definitely more involved, and it takes more than just a couple of vlookups in Excel.

This chart type can also be overwhelming on its own, so I think it would be great in a live presentation, but would be confusing if it was just included in a deck.

Geographical Point Map

Worldwide Earthquake Data

Overview

This final animated chart type is totally different from the others as it is designed to deal with geospatial data.

While this chart looks cool, I’m not personally the biggest fan as it is difficult to understand what is going on without significantly more context. This could be a great chart if you were dealing with very specific data and the audience was well versed on the subject matter, but for the casual observer, this chart on its own would need significantly more information.

You can also display different event types as different colors, so done properly this could show the interaction of multiple effects all on one chart.

What it’s good for

This chart type is good if you want to show global occurrences of some discrete event over time, particularly if there is a scale or severity to the event type. Some examples could be:

Volcanic eruptions
UFO sightings
Fast Food Chain openings

Data Structure

The core of the data is still pretty straightforward, but the key difference here is that you need longitude and latitude information for every event. By contrast, there are many other geospatial chart types (Projection Maps) that only require data such as a state or country.

Beyond what is shown, there are many other fields which can be used if you want to display color coded events by type.
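Since every event needs usable coordinates, it can help to flag bad rows before uploading. This is a minimal sketch with made-up events; the column names latitude and longitude and the helper valid_coordinates are assumptions for illustration, not anything the tool requires.

```python
import pandas as pd


def valid_coordinates(events: pd.DataFrame) -> pd.Series:
    """Return a boolean mask that is True where a row has a usable lat/long pair."""
    # Coerce to numbers; anything unparseable becomes NaN.
    lat = pd.to_numeric(events["latitude"], errors="coerce")
    lon = pd.to_numeric(events["longitude"], errors="coerce")
    # between() is inclusive, and NaN compares as False, so missing values fail.
    return lat.between(-90, 90) & lon.between(-180, 180)


# Made-up events: the second has an impossible latitude, the third has no longitude.
events = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "latitude": [35.0, 95.0, 10.0],
    "longitude": [139.0, 20.0, None],
    "magnitude": [6.1, 5.0, 4.8],
})

mask = valid_coordinates(events)
clean_events = events[mask]
print(clean_events)
```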

Overall

This could be a great format for a very specific use case. However, I personally think it would be the wrong choice for 99.999% of data sets. When building data visualizations, some people think that a complex chart is a better chart, but I disagree.

This is a complex chart, but in most cases I think the complexity here would actually be counterproductive to conveying information.

Feel free to add it to your arsenal, but if you find yourself forcing a dataset to fit into the right format to make this work, you may want to consider if there is a better chart type available for what you’re trying to convey.

Conclusion

In my opinion, the goal of a visualization should be to convey a large amount of information as clearly and concisely as possible. Back in the days of print, Edward Tufte, who literally wrote the book on visualization, famously developed the “Data-Ink Ratio” for evaluating the quality of a chart.

Today, we’re no longer concerned about how much ink we use, but we are very concerned with how much time and attention any activity consumes. I would argue that updating Tufte’s formula for today’s world would probably result in something like a “Data-Attention Ratio”, with the quality of the chart measured by the information conveyed divided by the amount of attention it consumes.
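Written side by side, the two ratios look like this. The first is Tufte's definition. The second is only my informal proposal: neither "information conveyed" nor "attention consumed" has an agreed-upon unit, so treat it as a way of thinking rather than something to compute.

```latex
\[
\text{Data-Ink Ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
\qquad
\text{Data-Attention Ratio} = \frac{\text{information conveyed}}{\text{attention consumed}}
\]
```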

That being said, I think that the charts displayed above could be extremely valuable, or extremely counterproductive. In most cases, you are asking for undivided attention for 15-20 seconds, which is a lot for a single visualization.

If you have data that makes sense for these formats, then by all means, go for it. But be very aware that the improper use of these charts could just end up wasting your audience’s time.

To wrap this up, the question I set out to answer this week was whether I could use Data Science to create animated charts. The answer here is definitely yes, and it was much easier than I had anticipated. However, because it is so easy, I can foresee a day when someone forwards out a deck with 40 slides of animated charts and expects you to watch and understand them all.

Please. Don’t be that someone.