Making Sense of Big Data For Small Business: Avoid These 4 Rookie Marketing Mistakes

We’re living in a world somewhere between Minority Report and The Matrix.

All our buying behaviors and online activities are tracked. Your bank card purchases, online account activity, and loyalty card activity all allow marketers to evaluate when you purchase, what you purchase, how often you purchase it, and what related products and services you like.

Social graphs, social profiles, and data sharing allow marketers to understand your demographic and lifestyle attributes even when you don’t explicitly share them with a company. Cookies allow marketers to serve you ads based on other websites and pages you visit, essentially following you around the internet.

For savvy businesses and marketers, this data can help optimize marketing spend, giving a competitive edge. But all too often, this wealth of data is misused and misinterpreted, resulting in just the opposite—a false sense of confidence and wasted marketing investments.

Here are four all-too-common rookie marketing mistakes when it comes to analyzing and interpreting marketing data. Avoid these, and you’ll prevent 80% of the erroneous decisions that impede marketing effectiveness.

Mistake #1: The Tyranny of Aggregate Statistics

As marketers, we see studies almost daily with “insights” (more on insights later) like the following:

Over a third of email subscribers read their newsletters exclusively on mobile devices (Informz).

Influencer campaigns have proven effective for over 80% of marketers who have tried them (eMarketer).

When social media is part of their buyer’s journey, customers tend to convert at a 129% higher rate. They are also four times as likely to spend significantly more than those without a social component (Deloitte).

Marketers who prioritize blogging are 13x more likely to realize a positive ROI (Socialemedia).

Tuesday at 10am is the best day and time to send an email for highest open rates (multiple sources).

So what’s the problem with these? NONE of them are specific to your industry, geography, or customers. It’s what I call the tyranny of aggregate statistics, otherwise known as the statistical fallacy of overgeneralization (my name sounds more villainous).

I’ve always loved the famous quote popularized by Mark Twain:

“There are three kinds of lies: lies, damned lies, and statistics.”

Twain might have been referring to statements like this: “We analyzed 219 million emails to find out the best time of day to send email campaigns.” 219 million emails must be reliable, right?

Except that those 219 million emails comprise messages sent across dozens of industries, in hundreds of geographies, to potentially hundreds of micro-segments. If you are interested in reaching women between the ages of 35 and 55 in Bellevue, Washington, who have a household income above $100k/year and an interest in anti-aging products, how predictive do you think the 219 million email sample will be of your audience’s behavior?

Now suppose you break that 219 million into 100 micro-segments just like your target audience. You may very well find that as few as 5 or 10 of those segments actually exhibit their best open rates at 10am on Tuesdays. That means that for over 90% of the segments, the conclusion is false.
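The thought experiment above is easy to sketch in code. In this toy simulation (all numbers invented), each of 100 hypothetical micro-segments has its own true best send hour; the hour that wins in aggregate turns out to be the true best hour for only a small fraction of segments:

```python
import random

random.seed(7)

# Hypothetical illustration: 100 micro-segments, each with its own
# true best send hour. Open rates peak at that hour and fall off
# linearly the farther you get from it.
HOURS = list(range(8, 18))              # candidate send hours, 8am-5pm
segments = []
for _ in range(100):
    best = random.choice(HOURS)         # this segment's true best hour
    rates = {h: 0.20 - 0.01 * abs(h - best) for h in HOURS}
    segments.append(rates)

# Aggregate view: average open rate across all segments, per hour
aggregate = {h: sum(s[h] for s in segments) / len(segments) for h in HOURS}
winner = max(aggregate, key=aggregate.get)

# How many segments is the aggregate winner actually best for?
matches = sum(1 for s in segments if max(s, key=s.get) == winner)
print(f"Aggregate best hour: {winner}:00")
print(f"Segments for which that hour is truly best: {matches} of 100")
```

The aggregate "winner" is an artifact of averaging across segments, not a property of any one of them; a send time that looks universally best can be wrong for the vast majority of individual audiences.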

In a nutshell, the population used for most statistics you read about is rarely a representative sample of your target audience. Just remember this: What’s true in aggregate isn’t necessarily true for your customers.

Mistake #2: Confusing Correlation with Causation

If I had the proverbial nickel for every time this error was made, I’d be on a beach right now instead of writing this article. What’s particularly surprising is that I hear it from so-called experts and research organizations who should know better.

Back when I was working at one of the big four wireless providers, I read a study by a major research firm that claimed paperless billing initiatives result in increased customer tenure and lifetime value. The study compared customers with and without paperless billing, and concluded that those with paperless billing stayed longer and had higher average spend.

This is a classic example of the causality error, and involves another related error called selection bias (more on that in a bit). What the analysis actually showed is a correlation between spend and paperless billing, and between tenure and paperless billing. But it did NOT show that paperless billing was the causal factor in those correlations.

In fact, the causality could run in just the opposite direction: increased loyalty causes increased paperless billing uptake. Let’s say you are analyzing cruise control usage by motorists. You hypothesize that using cruise control causes people to drive more. So you look at two groups: those who have used cruise control in the past month, and those who haven’t. Lo and behold, you find that people who have used cruise control drive more miles!

Does this mean that cruise control caused the increased travel? Should oil companies start up a campaign to promote cruise control to boost gasoline sales? Of course not. If anything, causality likely flows in the opposite direction: people who drive longer distances tend to need and use cruise control more. Similarly in the case of paperless billing, it’s more likely that factors like pre-existing satisfaction, low cost sensitivity, or aversion to switching cause increased paperless billing uptake, not the other way around. Set it and forget it.

This doesn’t mean you should ignore correlation. After all, most smaller businesses don’t have the resources to run double-blind controlled studies. But be smart about how you interpret the data, and understand what your data is, and isn’t, saying.

In your marketing campaigns, use simple A/B tests that change a single variable. You can run multiple series of these tests in succession to find causal attributes that gradually improve the effectiveness of your programs.
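As a sketch of what evaluating a single-variable test looks like in practice, here is a minimal two-proportion z-test in pure Python. The subject-line scenario and all counts are hypothetical:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)        # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF, via math.erf
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical subject-line test: A = control, B = new subject line
z, p = two_proportion_z_test(conv_a=120, n_a=2000, conv_b=155, n_b=2000)
print(f"z = {z:.2f}, p = {p:.3f}")  # p < 0.05 suggests a real difference
```

Because only the subject line differs between the two arms, a significant result here can reasonably be attributed to that single variable rather than to a lurking correlation.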

Mistake #3: Selection Bias (Comparing Apples to Oranges)

When you evaluate the incremental value generated by a campaign, it’s important to measure results among homogeneous segments. In a nutshell, your goal is for the only difference between the comparison groups to be that one received the offer or campaign (the treatment group) and the other didn’t (the control group, or holdout). Big companies typically use statistical holdouts of around 10% of their target population, but the notion applies even to small businesses running scrappy campaigns. Compare apples to apples.
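A holdout split can be as simple as a seeded random shuffle. This sketch (the customer IDs are made up) reserves 10% of an audience as the control group:

```python
import random

def split_holdout(customers, holdout_rate=0.10, seed=42):
    """Randomly split a customer list into treatment and holdout groups.

    Random assignment is what keeps the two groups statistically
    comparable, so any difference in results can be credited to the
    campaign itself.
    """
    rng = random.Random(seed)           # seeded so the split is reproducible
    shuffled = customers[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_rate)
    return shuffled[cut:], shuffled[:cut]   # (treatment, holdout)

customers = [f"cust_{i}" for i in range(1000)]
treatment, holdout = split_holdout(customers)
print(len(treatment), len(holdout))  # 900 100
```

Resist the temptation to pick the holdout by hand (e.g., "our quietest customers"); hand-picking reintroduces exactly the selection bias a holdout is meant to eliminate.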


Years ago, while running email campaigns for a very large coffee company, I was reviewing past email performance to understand how much incremental revenue the email programs were generating. The senior analyst (a Ph.D. and former statistics professor) held out a control group from the email recipient population. Then she compared the revenue per customer of the treatment and control groups during the campaign period.

So far so good. Except, upon deeper digging into the reporting methodology, I discovered that the analyst measured treatment group revenue only among those who opened the email. People who didn’t open it, she argued, essentially were like people who didn’t receive it, so they were excluded in order to “not dilute” the results.

What’s wrong with this?

The groups are no longer homogeneous. By looking only at openers, the analyst removed 60% of the treatment group’s population. That 60%, unsurprisingly, turned out to have a significantly lower average spend than the 40% who remained, in turn skewing revenue per customer artificially high.
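The arithmetic of this bias is easy to reproduce. With invented but representative numbers, compare the openers-only view against the whole-group view:

```python
# Toy numbers (invented) showing how excluding non-openers inflates results.
treatment_openers     = {"n": 400, "avg_spend": 25.0}   # the 40% who opened
treatment_non_openers = {"n": 600, "avg_spend": 10.0}   # the 60% the analyst excluded
control               = {"n": 1000, "avg_spend": 15.5}  # the holdout group

# Correct view: revenue per customer over the WHOLE treatment group
whole = (treatment_openers["n"] * treatment_openers["avg_spend"]
         + treatment_non_openers["n"] * treatment_non_openers["avg_spend"]) / 1000

print(f"Openers only vs control: {treatment_openers['avg_spend']:.2f} vs {control['avg_spend']:.2f}")
print(f"Whole group vs control:  {whole:.2f} vs {control['avg_spend']:.2f}")  # 16.00 vs 15.50
```

With these numbers the openers-only comparison shows a dramatic $9.50 lift per customer, while the honest whole-group comparison shows a modest $0.50: same campaign, wildly different story.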

In the world of small business, the same principle applies, even without a controlled test. The name of the game is ALWAYS incrementality: how much incremental revenue did the campaign generate, and at what cost? To know, you need to compare behavior among groups that are as similar as possible.

Keep the following in mind to create homogeneous comparison groups:

  • Try to match the timing of results — a holdout allows the exact same timing. If a holdout isn’t feasible, you can look at adjacent periods, but be aware that normal variance is at play between your periods. Look at a long timeframe and understand your revenue variance.
  • Adjust for seasonality — when there is high seasonal effect, compare to the same period a year ago, and make any adjustments for YoY increases or decreases.
  • Isolate channels — if possible, exclude revenue coming in through other channels. In other words, if you are testing a Facebook campaign, base both your baseline and your campaign-period revenue on revenue generated through Facebook only.
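Once you have comparable groups, the incrementality math itself is simple. A minimal sketch, with hypothetical campaign figures:

```python
def incremental_results(treat_rev_per_cust, ctrl_rev_per_cust,
                        n_treated, campaign_cost):
    """Incremental revenue = per-customer lift x treated audience, net of cost."""
    lift = treat_rev_per_cust - ctrl_rev_per_cust
    incremental_revenue = lift * n_treated
    roi = (incremental_revenue - campaign_cost) / campaign_cost
    return incremental_revenue, roi

# Hypothetical campaign: $16.00 vs $15.50 per customer,
# 9,000 treated customers, $2,000 campaign cost
rev, roi = incremental_results(16.00, 15.50, 9000, 2000)
print(f"Incremental revenue: ${rev:,.0f}, ROI: {roi:.0%}")  # $4,500, 125%
```

Notice how sensitive the answer is to the per-customer lift: if selection bias inflates the treatment figure by even a dollar, the ROI estimate swings wildly.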

Mistake #4: Following Data Points Instead of Insights

A trendy mantra today is, “follow the data.” I say, “follow the insights.”

When you make decisions based on data points alone, you incur opportunity costs and increase your failure rate. Insights lead to opportunities and success.

Think about it. Data points usually come in the form of aggregate statistics, which, as we’ve already discussed, rarely represent your specific audience. But beyond that, even when you know how your unique audience behaves, what does that really tell you?

I encourage you to relentlessly ask “Why?” and “How?”:

  • “Why do women respond better than men to my Facebook ads?”
  • “Why are my conversion rates low?”
  • “Why do people not engage with my brand on Twitter as much as they do on Instagram?”
  • “How has the trend been changing?”

You know the game show Wheel of Fortune, where contestants spin the wheel and guess letters to reveal the phrase in the puzzle? Think of data points like the letters: once you have enough of them, the insight becomes clear.

Don’t forget, amidst all this data, that your customers are people with needs, desires, and pain points. Your marketing communications need to strike an emotional chord in order to influence action. Insights, not data points, help you create relevant messages, offers, and executions that will resonate, and help you avoid these marketing mistakes.

Questions? Leave a comment or contact us.

