Creating Data-Driven Content: the Step by Step Process



Data is the new content.

Don’t believe me?  The most popular political journalist in the United States is writing computer models, not spending time with campaigns.

And that’s just one example.  Using data to tell an interesting story is one of the most compelling ways to drive attention, links, and shares to your site.

For example, and OKCupid, both incredible examples of effective content marketing, both used a mixture of data analysis, data visualization, story telling, and promotion to gain significant mindshare in two very different markets – finance and dating.  Now pricing data up-and-comer Priceonomics is writing a highly popular blog featuring articles that analyze their own data.

But data journalism and visualization are tough – and this means they haven’t been overused yet.  There are still outsized returns available in data-based content marketing.  And today I’m going to show you how to do some data analysis to make your own data-based blog posts, visualizations, and more.

Step 1: Go Find a Data Set

To do data-driven content you’ll need data. Fortunately, there are many places you can get great data:

Your Own Data

If you work for a company that generates its own data, you’re in luck.  This is especially true for pure-play web and ecommerce companies, which by their very nature collect a lot of data.  Your own data is usually the best data, because only you have it, and any analysis you do will support your company’s core differentiation.

Even if you think you don’t have interesting data, you probably do.  Do Mac users prefer different articles than Windows users?  Do Chrome users buy different products than IE users?  You can get stats like this from your web analytics platform with a little bit of analysis.
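Here is a minimal sketch of that kind of segment comparison.  It assumes you can export visit-level rows from your analytics platform as CSV; the column names (`browser`, `purchased`) and the sample rows are made up for illustration, so adjust them to whatever your export actually contains.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: one row per visit. Column names and values are
# invented for illustration; your analytics export will look different.
SAMPLE_EXPORT = """browser,purchased
Chrome,1
Chrome,0
IE,0
IE,0
Chrome,1
IE,1
"""


def conversion_by_segment(csv_text, segment_col, flag_col):
    """Return {segment: share of rows where flag_col is 1}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[segment_col]] += 1
        hits[row[segment_col]] += int(row[flag_col])
    return {segment: hits[segment] / totals[segment] for segment in totals}


rates = conversion_by_segment(SAMPLE_EXPORT, "browser", "purchased")
for browser, rate in sorted(rates.items()):
    print(f"{browser}: {rate:.0%} of visits ended in a purchase")
```

With real data you would want far more rows per segment before drawing any conclusion from a gap like this.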

Public Data

If you don’t have data, there are numerous places you can find some to analyze and get down with OPD (other people’s data).  Look for “(KEYWORD) Data Sets” on Google, and start your search from there.  Alternatively, here are some of my favorite data sources:

You can either run your own Google Consumer Survey (which is really quite accurate compared to professional polling), or use some of the data they release.

This is one of my favorite sources, especially when looking for anything linguistic.  With the Google Ngram viewer, you can analyze the occurrence of words in the Google Books corpus over time.

For example, you can learn that “start-up” was predominantly hyphenated until 1998, when the one-word form “startup” became dominant, around the time of the first dot-com bubble.

Government Data

The US government also releases a great deal of data on its data portal. This includes everything from life insurance data to geographical data to food pyramid and earthquake data. This is a great source for mashing public data up with your data.  For example, if you wanted to compare spending by state in your ecommerce store to income per capita by state, you could grab the state income data here.
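A mash-up like that is just a join on the state code.  The sketch below assumes you have already loaded both sources into dictionaries keyed by state; every figure in it is invented for illustration, not real income or sales data.

```python
# Hypothetical figures for illustration only, not real data.
store_spend_per_customer = {"WA": 82.0, "TX": 64.5, "NY": 91.0}
income_per_capita = {"WA": 60000, "TX": 50000, "NY": 65000, "CA": 62000}


def join_on_state(left, right):
    """Inner join two {state: value} dicts, keeping states found in both."""
    shared = sorted(left.keys() & right.keys())
    return {state: (left[state], right[state]) for state in shared}


joined = join_on_state(store_spend_per_customer, income_per_capita)
for state, (spend, income) in joined.items():
    # Spend as a share of income makes states easier to compare.
    print(f"{state}: ${spend:.2f} per customer, {spend / income:.3%} of income")
```

States that appear in only one source (CA here) drop out, which is worth checking before you publish a "by state" chart.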

Other Data Resource Sites

Distilled’s Mark Johnstone wrote a fabulous post about finding link-worthy data on SEOmoz, which contains many excellent sources.  He also shares some useful tips on constructing your own web scrapers, which is an excellent and highly differentiating approach to gathering data.  There’s also a website called Gapminder that shares numerous data sets.


I’m going to walk you through the process of creating data-driven content out of a data set in this post.  In this case, I’ve found an awesome set of data about movie genre and profitability, called “Most Profitable Hollywood Stories” from Information is Beautiful.


Step 2: Look for Angles & Ask Some Questions

Notice how step 2 isn’t “Analyze your data” or “Go make some graphs and tell people how much like Nate Silver you are.”

The number one mistake people make with these efforts is they don’t start with questions, or an angle.  They just go ahead and visualize the data set, and wonder why people don’t find it more interesting.

I start by asking myself, “If I put together these fields, what interesting stories could I tell?” and “What other data could I mash these up with to show some fascinating conclusions?”

Sometimes I go ahead and write headlines based on these analyses – if it makes a great headline, chances are it will make a fascinating visualization.

(You might go down some blind alleys with these – so be thoughtful and come up with a few different ideas.  It’s very possible a fascinating headline or correlation will not be borne out by the data – don’t write that article. Not only will it damage your personal credibility and your client or company’s brand, it will not get traction and spread.)


Let’s take a look at the data.  It has fields like:

  • Film
  • Year
  • Studio
  • Rotten Tomatoes Score
  • Audience Score
  • Plot Type
  • Genre
  • # of Theatres in Opening Weekends
  • Box Office Average per US Cinema
  • Domestic Gross
  • Foreign Gross
  • Worldwide Gross
  • Budget
  • Market Profitability
  • Opening Weekend
  • Oscars Won

The data set covers a few hundred films from 2007 to 2011.

Now, there are a lot of interesting things you can do with just this data set:

–          Is there any relationship between how much money a film makes and how much people like it?

–          Are highly profitable films necessarily good?

–          What genre of film is the most profitable?

–          What ‘Story’ archetype is the most profitable?

–          If I had $1 million to make a movie with, what should I do to maximize my return?
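The first question on that list is a correlation question.  As a rough sketch, here is a Pearson correlation written out by hand; the audience scores and gross figures are made-up numbers, not values from the Hollywood data set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented values for illustration: audience score (0-100) and
# worldwide gross in millions of dollars for five imaginary films.
audience_score = [45, 60, 72, 81, 90]
worldwide_gross = [120.0, 40.0, 95.0, 210.0, 60.0]

r = pearson(audience_score, worldwide_gross)
print(f"correlation between audience score and gross: {r:.2f}")
```

A value near 0 would suggest that how much people like a film tells you little about how much it earns; a value near 1 or -1 would be a story worth a headline, once you have checked it against the full data.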

You can also add other data sets:

  • Director and Actor information from IMDB
  • Film Ratings (G, PG, etc)
  • ‘Morality Scores’ that a number of religious organizations give films

There are some really fun data analysis opportunities here, like:

–          Which directors are the most profitable?  Who are the Yankees (big, expensive, successful), and who are the Oakland A’s (lean and efficient)?

I’ll take a relatively simple approach: What story archetype is the most profitable?  If you find $10 million on the ground, what sort of film should you go shoot?

Step 3: Analyze Your Data

Now it’s time to analyze your data.  If you have a relatively small amount of data, you can use Excel or your favorite spreadsheet program.  If your data numbers into the millions of rows, you’ll start to need more robust tools like R, SPSS, or SAS.

Analyzing data to find trends and valuable insights is an entirely separate art and science from creating data-driven content.  I recommend the book Data Analysis with Open Source Tools by Philipp K. Janert or any introductory statistics textbook to learn elementary data analysis.


To start the analysis of my data set, I’m going to drag everything into a pivot table.  (This isn’t the best approach, but it will work for my purposes.)

Next, I’ll look at the profitability of all of the films in aggregate.  The most profitable films (in total gross as a percentage of budget) of the last few years are:

  1. Paranormal Activity – 1311200%
  2. Fireproof – 6693%
  3. Insidious – 6467%
  4. Paranormal Activity 2 – 5916%
  5. Paranormal Activity 3 – 4037%
  6. The Last Exorcism – 3692%
  7. Juno – 3082%
  8. The King’s Speech – 2849%
  9. Black Swan – 2533%
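If you would rather compute this ranking in code than in a spreadsheet, a minimal sketch looks like the following.  The film names and dollar figures are hypothetical placeholders, not rows from the data set; the only thing taken from the post is the metric itself, total gross as a percentage of budget.

```python
from typing import NamedTuple


class Film(NamedTuple):
    title: str
    budget: float           # millions of dollars (invented figures)
    worldwide_gross: float  # millions of dollars (invented figures)


def profitability_pct(film):
    """Total gross as a percentage of budget."""
    return film.worldwide_gross / film.budget * 100


films = [
    Film("Hypothetical Blockbuster", 200.0, 600.0),
    Film("Hypothetical Horror", 1.5, 90.0),
    Film("Hypothetical Drama", 15.0, 120.0),
]

# Sort from most to least profitable and print a ranked list.
ranked = sorted(films, key=profitability_pct, reverse=True)
for rank, film in enumerate(ranked, start=1):
    print(f"{rank}. {film.title} - {profitability_pct(film):.0f}%")
```

Note how the tiny-budget film tops the list even though the blockbuster grossed far more in absolute terms; that is the same effect that puts low-budget horror at the top of the real ranking.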

Just in this data alone, there are a number of interesting observations:

–          Horror movies are incredibly profitable

–          The Paranormal Activity franchise is the Oakland A’s of movies – in terms of return on capital, it is by far the most profitable of anything done in the last 5 years.

–          Making a film with strong niche appeal on a limited budget is a much stronger strategy for generating return on invested capital than creating an expensive blockbuster.

Next, I’ll drag all of my films together into a chart and try to understand profitability by story type:

(I’m going to exclude Paranormal Activity 1 from this calculation because it’s such an outlier. I’ve also removed some outliers on the low end, because their foreign gross data was unavailable, which would skew the data set.)
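Whatever you exclude, it helps to do it in a way you can explain later.  This is a small sketch of that cleaning step, assuming each film is a dictionary with hypothetical keys `film`, `profit_pct`, and `foreign_gross`; all rows are invented, and the excluded title is a stand-in for the outlier you pick by hand.

```python
# Titles to drop by hand. In the post's data set this would be the first
# Paranormal Activity; the name below is a hypothetical stand-in.
EXCLUDED_TITLES = {"Hypothetical Runaway Hit"}

# Invented rows for illustration. None marks unavailable foreign gross.
rows = [
    {"film": "Hypothetical Runaway Hit", "profit_pct": 900000.0, "foreign_gross": 80.0},
    {"film": "Film A", "profit_pct": 310.0, "foreign_gross": 45.0},
    {"film": "Film B", "profit_pct": 12.0, "foreign_gross": None},
    {"film": "Film C", "profit_pct": 540.0, "foreign_gross": 120.0},
]


def clean(rows, excluded_titles):
    """Split rows into kept rows and (title, reason) pairs for dropped ones."""
    kept, dropped = [], []
    for row in rows:
        if row["film"] in excluded_titles:
            dropped.append((row["film"], "listed as an extreme outlier"))
        elif row["foreign_gross"] is None:
            dropped.append((row["film"], "foreign gross unavailable"))
        else:
            kept.append(row)
    return kept, dropped


kept, dropped = clean(rows, EXCLUDED_TITLES)
print("kept:", [row["film"] for row in kept])
for film, reason in dropped:
    print(f"dropped {film}: {reason}")
```

Keeping the list of dropped films and reasons means you can disclose your exclusions in the finished piece, which protects your credibility.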

(In case you’re wondering, the Wretched Excess films in the dataset were Black Swan, Hesher, J. Edgar, Limitless, Solitary Man, and There Will Be Blood.  Monster Force covers both horror movies (Saw, Paranormal Activity) and some action movies (X-Men, Cowboys & Aliens, etc.).  All of the plot types are defined in various screenwriting articles.)

Wretched Excess is the most profitable on average, but it is also highly variable and a small set, with the average buoyed by Black Swan.  Monster Force is also very profitable, but the most variable.  By contrast, Fish Out of Water films (like The Lincoln Lawyer or Meet Dave) were consistently profitable, with much lower variation than the other story types.
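To compare story types on both average return and consistency, you can summarize each group with a mean, a median, and a standard deviation.  This is only a sketch using Python's standard `statistics` module; the profitability values are invented and merely borrow two plot type names from the data set.

```python
import statistics
from collections import defaultdict

# (plot type, profitability %) pairs. The numbers are invented for
# illustration and are not the real values from the data set.
films = [
    ("Wretched Excess", 2500.0),
    ("Wretched Excess", 150.0),
    ("Wretched Excess", 300.0),
    ("Fish Out of Water", 320.0),
    ("Fish Out of Water", 280.0),
    ("Fish Out of Water", 300.0),
]


def summarize_by_group(pairs):
    """Return {group: n, mean, median, stdev, cv} for (group, value) pairs."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    summary = {}
    for key, values in groups.items():
        mean = statistics.mean(values)
        # Sample standard deviation needs at least two values.
        stdev = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[key] = {
            "n": len(values),
            "mean": mean,
            "median": statistics.median(values),
            "stdev": stdev,
            # Coefficient of variation: spread relative to the mean.
            "cv": stdev / mean if mean else float("nan"),
        }
    return summary


summary = summarize_by_group(films)
for plot_type, stats in summary.items():
    print(
        f"{plot_type}: n={stats['n']}, mean={stats['mean']:.0f}%, "
        f"median={stats['median']:.0f}%, cv={stats['cv']:.2f}"
    )
```

A mean far above the median is the signature of an average buoyed by one hit, and a high coefficient of variation flags a story type whose returns are hard to count on.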

Step 4: Create Your Scaffolding

Once you have some idea of the direction you want to go with your content, you can start creating a scaffolding.

A scaffolding should answer questions like:

–          What is the content about?

–          What’s your angle?  Why will it appeal to people?

–          Will you need graphics or copy?

–          Will you need a designer or front-end developer to create the page the content will ‘live’ on? Or will it go on your blog or CMS?

A scaffolding can include wireframes, mockups, or outlines of content.


Using my profitability data, I’m going to create a piece called “Moneyball for Movies”.  It will focus on the films, genres, stories, and people that maximize profitability with limited budget.

I’ll help people understand what drives movie profitability – usually films made for extremely little money that become very popular with specific audiences (like religious or horror fan communities).

I’d include copy about the most profitable films, genres, and stories, as well as data visualization.

Step 5: Visualize Your Data and Write Your Copy

Now it’s time to write and make graphics.

Writing About Data

Most of the best instruction for this kind of work comes from the world of data journalism – here are some great resources:

–          The Data Journalism Handbook

–          Investigating Data Journalism on the O’Reilly Radar

–          5 Tips for Getting Started in Data Journalism

Data Visualization

While it’s always best to make your own visualizations by hand (using help from a designer or developer), you can create visualizations quickly with a number of different tools.

–          IBM’s ManyEyes

–          Tableau Public

–          The R Project

–          Google Fusion Tables

Jon Cooper has also put together a useful list of data visualization tools.


I’m not going to write any great copy or make any awesome visualizations here – sorry.  But lots of other folks have done great things with the Hollywood Stories data set – here’s some inspiration for you:

FilmStrips by Tom Evans

Hollywood Data Explorer by James Fisher

Confluence by Harshawardhan Nene and Kedar Vaidya

Step 6: Promote Your Content

If you look at some of the awesome visualizations in step 5, you might notice that many of them haven’t attracted a lot of links or tweets.  This is terrible, because a lot of them are great.

For example, Confluence has 4 linking root domains, 3 tweets, 5 Plus Ones, and 9 Facebook likes (via SharedCount and Open Site Explorer).  This is a beautiful, powerful visualization tool, but it didn’t attract attention all by itself.  If you want your data-driven content to get seen by the world, linked, and shared, you need to promote it.

Promotion Tips:

  • Make sure the originator of the data set knows you analyzed it.  Often these organizations will share your work.
  • Many great news outlets (like Fast Company, the Guardian, the Wall Street Journal, and the New York Times) have data visualization editors, data journalists, or columns featuring data-based work.  Reach out to them and let them know about your new project.
  • Reach out to bloggers that cover both data visualization and the topic of the data.
  • Consider using paid social media like Reddit Ads and StumbleUpon Paid Discovery.  These services are great for getting users with “see interesting things on the internet” intent, and often spread the content they find across their network.


In the “Moneyball for Movies” example, I can reach out to economics bloggers, movie bloggers, and Hollywood ‘inside baseball’-type bloggers.  I can also reach out to horror movie bloggers, who might like to know their preferred genre is the most effective use of a dollar of movie-making capital.

I’d also advertise on StumbleUpon Paid Discovery, targeting both movie-related and data-related channels, and try to get some social traction.

El Fin

There you have it – now go forth and analyze.



Matt works on customer acquisition at BuzzStream. Before BuzzStream, he worked as an SEO Strategist at Portent and a Marketing Manager at AppCentral (acquired by Good Technology). You can follow Matt on Twitter or Google Plus.


  • Another unexplored, under utilized frontier of the internet marketing world. Thank you for the ideas on how to implement data. We all have it, now we have to learn to use it.

  • Matt, this is a really useful post. I think the thought processes in coming up with ideas that you have described are particularly useful because if you have a process, then it can light things up when you have “writer’s block”.
