
How to talk about correlation and causation in your survey data

There’s nothing quite like uncovering a significant finding in your survey data that seems to suggest you’re really on to something. Even we survey scientists get excited about the prospect of finding something significant. But remember: if you’re going to collect data like a scientist, you’ve got to report on it like a scientist, too.

Chances are you’re running a survey that’s about something that you—or your company—are really interested in or passionate about. That can make it particularly tempting to talk up your survey results, but remember that you can get yourself in trouble if you make or encourage statements you can’t back up.

To ensure we’re reporting our results fairly, survey scientists stick to a couple of easy-to-remember principles that keep us out of hot water: knowing that your data usually isn't causal and being transparent about flagging self-report data.

1. Always remember correlation does not imply causation

Under nearly all circumstances, you can’t say that your survey results cause, lead to, prove, or (insert verb) anything else—even when the evidence seems like a slam dunk. We can't say this enough times: Correlation does not imply causation. To make this point, we’ll start with a slightly obvious example.

Imagine you ran a survey that found that among Americans who say they have light eyes, 75% say they have blonde hair. Among Americans who say they have dark eyes, only 10% say they have blonde hair. The obvious conclusion from this data would NOT be that light eyes cause blonde hair.

              Light eyes   Dark eyes
Blonde hair   75%          10%
Dark hair     25%          90%
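
If you want to reproduce a table like this from raw responses, the sketch below shows one way to compute the percentages within each group. The response counts are invented so that the output matches the table above, and the function name `column_percentages` is our own, not part of any survey tool.

```python
from collections import Counter

# Hypothetical raw responses as (eye color, hair color) pairs.
# The counts are made up so the percentages match the example table.
responses = (
    [("light", "blonde")] * 150
    + [("light", "dark")] * 50
    + [("dark", "blonde")] * 40
    + [("dark", "dark")] * 360
)

def column_percentages(pairs):
    """Return {group: {answer: percent}}, with percentages computed within each group."""
    cell_counts = Counter(pairs)
    group_totals = Counter(group for group, _ in pairs)
    table = {}
    for (group, answer), count in cell_counts.items():
        table.setdefault(group, {})[answer] = 100 * count / group_totals[group]
    return table

crosstab = column_percentages(responses)
for group, row in crosstab.items():
    for answer, pct in row.items():
        print(f"{group} eyes, {answer} hair: {pct:.0f}%")
```

Note that the percentages are calculated within each eye-color group, which is why each column of the table adds up to 100%. Nothing in this calculation says anything about which variable, if either, causes the other.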

This might seem like a silly example, but it’s only because we know too much about human biology to make assumptions like that. What about when we don’t have as much context?

For example, imagine a marketer at Acme brand sends a survey to Americans and discovers that 75% of people who have used Acme brand are wildly successful, while only 10% of people who have not used Acme brand are similarly successful. In this case, the marketer’s worldview may lead him to say that Acme products cause or lead to greater success.

                        Have used Acme brand   Have NOT used Acme brand
High personal success   75%                    10%
Low personal success    25%                    90%

In reality, with the data alone he has no more grounds to claim that Acme products cause success than we have to claim that light eyes cause blonde hair. For all we know, a third thing causes people to purchase Acme brand AND achieve immense personal success. There could be thousands of ways the two relate, positively or negatively.
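
To see how a "third thing" can produce this pattern, here is a small simulation sketch. Everything in it is an assumption made for illustration: the hidden factor, the probabilities, and the function names are invented, and in this toy world using Acme has no effect on success at all.

```python
import random

def simulate(n=20000, seed=0):
    """Simulate respondents as (hidden_factor, uses_acme, successful) tuples."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        hidden_factor = rng.random() < 0.5  # hypothetical third factor
        # The hidden factor raises both probabilities.
        # Acme usage never feeds into success.
        uses_acme = rng.random() < (0.8 if hidden_factor else 0.2)
        successful = rng.random() < (0.8 if hidden_factor else 0.1)
        rows.append((hidden_factor, uses_acme, successful))
    return rows

def success_rate(rows):
    """Share of respondents in `rows` who are successful."""
    return sum(successful for _, _, successful in rows) / len(rows)

rows = simulate()
users = [r for r in rows if r[1]]
non_users = [r for r in rows if not r[1]]
print(f"Acme users:     {success_rate(users):.0%} successful")
print(f"Non-users:      {success_rate(non_users):.0%} successful")

# Compare users and non-users who share the same hidden factor.
for value in (True, False):
    same_users = [r for r in users if r[0] == value]
    same_non_users = [r for r in non_users if r[0] == value]
    print(
        f"hidden factor={value}: users {success_rate(same_users):.0%}, "
        f"non-users {success_rate(same_non_users):.0%}"
    )
```

In the overall comparison, Acme users look far more successful than non-users. Once you compare people who share the same hidden factor, the gap largely disappears, because the factor was driving both purchases and success. A survey that never asked about that factor would show only the first comparison.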

What we can say is that X is associated, or correlated, with Y. For example, use of Acme brand products is associated with higher self-reported personal success. And letting the numbers speak for themselves is even better! You can say: “75% of people who have purchased from Acme brand consider themselves very successful. Among those who haven’t purchased from Acme? Only 10% report similar levels of personal success.” Even reporting on correlation alone can be a handy tool.

2. Be transparent about self-report data

Answers to self-report questions are a valuable way to understand how people think about themselves and the world around them, but they shouldn’t be confused with objective facts. To ensure you’re being as clear as possible, you should always identify which questions in your survey were self-report. There are three types of self-report questions to flag:

  • Prediction: questions that ask a respondent to guess about something in the future.
    • Example: Will you still be working at your company two years from now?
      • Good: 75% of people who like their managers say they will be at Acme company two years from now.
      • Bad: 75% of people who like their managers will be at Acme company two years from now.
    • Why you need to clarify: Even if the respondent is being as accurate as they can be, we don’t know for sure if their prediction is correct.
  • Belief about someone else: questions that ask what the respondent believes about someone else.
    • Example: How much do you think your doctor cares about you?
      • Good: At our hospital, 99% of patients say their doctor cares about them.
      • Bad: At our hospital, 99% of doctors care about their patients.
    • Why you need to clarify: It’s impossible for the respondent to be wrong in their perception of the other person (it’s an opinion, after all), but that perception may not match what the other person would say about themselves.
  • Belief about self, measurable in other ways: questions that ask about a respondent’s perception of themselves, but that are subjective and could be confirmed or refuted by concrete measurements.
    • Example: How successful are you at life?
      • Good: People who have purchased Acme products are more successful than non-purchasers in a number of ways. They have higher incomes, more friends, and greater life satisfaction.
      • Bad: Among people who have bought Acme products, 100% say they are very successful, but only 50% of people who have not purchased from Acme believe they are successful.
    • Why you need to clarify: When answering this question, a respondent can’t be wrong in their perception of themselves, but again, that perception may not be congruent with other ways of measuring the item. In this example, it’s more compelling to paint a picture of a successful person using concrete metrics: we could choose to measure personal success by number of close friends, income, and life satisfaction.

On the other hand, there are questions for which we can expect someone to know a reasonably accurate answer, and that answer can’t be disproven by other metrics. These questions fall into categories like personal identity (are you Hispanic?), emotion (how happy are you with your purchase?), preferences (do you like mangoes?), or past, current, or habitual behavior (have you ever purchased from Acme brand? Do you ever wear dresses?). In these cases, it’s not so important to stress that the question was self-report.

Bottom line: you can’t entirely control the conclusions that people who read or hear about your data jump to. The best thing you can do is report the data as fairly and accurately as possible. That way, if people come to erroneous conclusions, they get there themselves, and not because of you.