# How to analyze survey data

You’ve collected your survey results and have a survey data analysis plan in place. Now it’s time to dig in, start sorting, and analyze the data.

## Survey data analysis made easy

The results are back from your online surveys. Here’s how our Survey Research Scientists make sense of quantitative data (as opposed to qualitative data): they start by looking at the answers and focusing on their top research questions and survey goals, then crunch the numbers and draw conclusions.

### Here are four steps aimed at showing you how to analyze data more effectively:

1. Take a look at your top research questions
2. Cross-tabulate and filter your results
3. Crunch the numbers
4. Draw conclusions

### Take a look at your top research questions

First, let’s talk about how you analyze the results for your top research questions. Did you feature empirical research questions? Did you consider probability sampling? Remember that you should have outlined your top research questions when you set a goal for your survey.

For example, if you held an education conference and gave attendees a post-event feedback survey, one of your top research questions may look like this: How did the attendees rate the conference overall? Now take a look at the answers you collected for a specific survey question that speaks to that top research question:

Do you plan to attend this conference next year?

| Answer | Percent | Responses |
| --- | --- | --- |
| Yes | 71% | 852 |
| No | 18% | 216 |
| Not sure | 11% | 132 |
| Total | | 1,200 |

Notice that in the responses, you’ve got some percentages (71%, 18%) and some raw numbers (852, 216).

The percentages are just that–the percent of people who gave a particular answer. Put another way, the percentages represent the number of people who gave each answer as a proportion of the number of people who answered the question. So, 71% of your survey respondents (852 of the 1,200 surveyed) plan on coming back next year.

This table also shows you that 18% say they are not planning to return and 11% say they are not sure.
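The arithmetic behind those percentages can be sketched in a few lines of Python. The counts come from the table above; the variable names are only illustrative.

```python
# Raw counts from the "Do you plan to attend this conference next year?" question.
counts = {"Yes": 852, "No": 216, "Not sure": 132}

# The denominator is the number of people who answered the question.
total = sum(counts.values())

# Each percentage is that answer's count as a share of everyone who answered.
percentages = {answer: round(100 * n / total) for answer, n in counts.items()}

for answer, pct in percentages.items():
    print(f"{answer}: {pct}% ({counts[answer]} of {total:,})")
```

Note that the denominator is the number of people who answered this question, which may be smaller than the number who started the survey.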

### Cross-tabulating and filtering results

Recall that when you set a goal for your survey and developed your analysis plan, you thought about what subgroups you were going to analyze and compare. Now is when that planning pays off. For example, say you wanted to see how teachers, students, and administrators compared to one another in answering the question about next year’s conference. To figure this out, you break down the responses by means of cross-tabulation, where you show the results of the conference question by subgroup:

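| Attendee type | Yes | No | Not sure | Total |
| --- | --- | --- | --- | --- |
| Teacher | 80% (320) | 7% (28) | 13% (52) | 400 |
| Administrator | 46% (184) | 40% (160) | 14% (56) | 400 |
| Student | 86% (344) | 8% (32) | 6% (24) | 400 |
| Total respondents | 852 | 216 | 132 | 1,200 |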

From this table you see that a large majority of the students (86%) and teachers (80%) plan to come back next year. However, the administrators who attended your conference look different, with under half (46%) of them intending to come back! Hopefully, some of your other questions will help you figure out why this is the case and what you can do to improve the conference for administrators so more of them will return year after year.
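As a rough sketch, the row percentages in a crosstab like this one can be computed from the raw counts for each subgroup. The counts below are the per-subgroup numbers from the table; the dictionary layout is just one convenient way to hold them.

```python
# Answer order used for every row of the crosstab.
answers = ["Yes", "No", "Not sure"]

# Raw counts per attendee type, in the same order as `answers`.
crosstab = {
    "Teacher": [320, 28, 52],
    "Administrator": [184, 160, 56],
    "Student": [344, 32, 24],
}

# Row percentages: each count divided by the size of its own subgroup.
row_percentages = {}
for group, counts in crosstab.items():
    group_total = sum(counts)
    row_percentages[group] = [round(100 * n / group_total) for n in counts]

for group, pcts in row_percentages.items():
    summary = ", ".join(f"{a}: {p}%" for a, p in zip(answers, pcts))
    print(f"{group} -> {summary}")
```

Dividing by the subgroup total rather than the overall total is what makes the three groups comparable to one another.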

Using a filter is another useful tool for analyzing data. Filtering means narrowing your focus to one particular subgroup and filtering out the others. So instead of comparing subgroups to one another, here we’re just looking at how one subgroup answered the question. For instance, you could limit your focus to just women, or just men, then re-run the crosstab by type of attendee to compare female administrators, female teachers, and female students. One thing to be wary of as you slice and dice your results: every time you apply a filter or crosstab, the sample size behind each number decreases. To make sure your results are still statistically significant, it may be helpful to use a sample size calculator.

### Benchmarking, trending, and comparative data

Let’s say on your conference feedback survey, one key question is, “Overall how satisfied were you with the conference?” Your results show that 75% of the attendees were satisfied with the conference. That sounds pretty good. But wouldn’t you like to have some context? Something to compare it against? Is that better or worse than last year? How does it compare to other conferences?

Well, say you did ask this question in your conference feedback survey after last year’s conference. You’d be able to make a trend comparison. Professional pollsters make poor comedians, but one favorite line is “trend is your friend.”

If last year’s satisfaction rate was 60%, you increased satisfaction by 15 percentage points!  What caused this increase in satisfaction? Hopefully the responses to other questions in your survey will provide some answers.

If you don’t have data from prior years’ conferences, make this the year you start collecting feedback after every conference. This is called benchmarking. You establish a benchmark or baseline number and, moving forward, you can see whether and how it has changed. You can benchmark not just attendees’ satisfaction, but other questions as well. You’ll be able to track, year after year, what attendees think of the conference. This is called longitudinal data analysis.

You can even track data for different subgroups. Say for example that satisfaction rates are increasing year over year for students and teachers, but not for administrators. You might want to look at administrators’ responses to various questions to see if you can gain insight into why they are less satisfied than other attendees.

### Crunching the numbers

You know how many people said they were coming back, but how do you know if your survey has yielded answers that you can trust and answers that you can use with confidence to inform future decisions? It’s important to pay attention to the quality of your data and to understand the components of statistical significance.

In everyday conversation, the word “significant” means important or meaningful. In survey analysis and statistics, significance is an assessment of accuracy. This is where the inevitable “plus or minus” comes into survey work. In particular, it means that survey results are accurate within a certain confidence level and not due to random chance. Drawing an inference based on results that are inaccurate (i.e., not statistically significant) is risky.

The first factor to consider in any assessment of statistical significance is the representativeness of your sample: that is, to what extent the group of people who were included in your survey “look like” the total population of people about whom you want to draw conclusions.

You have a problem if 90% of conference attendees who completed the survey were men, but only 15% of all your conference attendees were male. The more you know about the population you are interested in studying, the more confident you can be when your survey lines up with those numbers. At least when it comes to gender, you’re feeling pretty good if men make up 15% of survey respondents in this example.

If your survey sample is a random selection from a known population, statistical significance can be calculated in a straightforward manner. A primary factor here is sample size. Suppose 50 of the 1,000 people who attended your conference replied to the survey.  Fifty (50) is a small sample size and results in a broad margin of error.  In short, your results won’t carry much weight.
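To put a number on “broad margin of error,” here is a minimal sketch using the standard formula for a proportion with a finite population correction. It rests on assumptions the example above does not state: a simple random sample, a 95% confidence level (z = 1.96), and the worst-case proportion of 50%. The function name `margin_of_error` is only illustrative.

```python
import math

def margin_of_error(sample_size, population_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Assumes a 95% confidence level by default (z = 1.96) and applies the
    finite population correction, since the sample is drawn from a small,
    known population.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * standard_error * correction

# 50 respondents out of 1,000 attendees, as in the example above.
small_sample = margin_of_error(50, 1000)

# For comparison: a hypothetical 500 respondents from the same 1,000 attendees.
larger_sample = margin_of_error(500, 1000)

print(f"n=50:  plus or minus {small_sample:.1%}")
print(f"n=500: plus or minus {larger_sample:.1%}")
```

Under these assumptions, 50 responses give a margin of error of roughly 13 to 14 percentage points, so a reported 71% could plausibly sit anywhere from the high 50s to the mid 80s.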

Say you asked your survey respondents how many of the 10 available sessions they attended over the course of the conference, and your results look like this:

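| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| # sessions attended | 10% (100) | 0% (0) | 0% (0) | 5% (50) | 10% (100) | 26% (260) | 24% (240) | 19% (190) | 5% (50) | 1% (10) | 1,000 | 6.1 |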

You might want to analyze the average. As you may recall, there are three different kinds of averages: mean, median and mode.

In the table above, the average number of sessions attended is 6.1. The average reported here is the mean, the kind of average that’s probably most familiar to you. To determine the mean you add up the data and divide that by the number of figures you added. In this example, you have 100 people saying they attended one session, 50 people for four sessions, 100 people for five sessions, etc. So you multiply each number of sessions by the number of people who attended that many, add up the products, and divide by the total number of people (6,110 ÷ 1,000 ≈ 6.1).

The median is another kind of average. The median is the middle value, the 50% mark. In the table above, we would locate the number of sessions where 500 people were to the left of the number and 500 to the right. The median is, in this case, six sessions. Because the median depends only on the middle of the distribution, it can help you reduce the influence of outliers, which may skew the mean.

The last kind of average is mode. The mode is the most frequent response. In this case the answer is six. 260 survey participants attended six sessions, more than attended any other number of sessions.
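The three averages can be checked with a short Python sketch. The counts are taken from the sessions table above; expanding the frequency table into one entry per respondent is just a convenient way to reuse the standard library’s `statistics` module.

```python
from statistics import median, mode

# Number of sessions attended -> number of respondents, from the table above.
sessions_attended = {1: 100, 2: 0, 3: 0, 4: 50, 5: 100,
                     6: 260, 7: 240, 8: 190, 9: 50, 10: 10}

total_people = sum(sessions_attended.values())

# Mean: multiply each number of sessions by its count, sum, divide by respondents.
mean_sessions = sum(s * n for s, n in sessions_attended.items()) / total_people

# Expand the frequency table into one value per respondent for median and mode.
responses = [s for s, n in sessions_attended.items() for _ in range(n)]

median_sessions = median(responses)  # middle value of the sorted responses
mode_sessions = mode(responses)      # most frequent response

print(f"mean={mean_sessions:.1f}, median={median_sessions}, mode={mode_sessions}")
```

Here all three land on or near six, which suggests the distribution is not being pulled far in either direction by extreme values.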

Means–and other types of averages–can also be used if your results were based on Likert scales.

### Drawing conclusions

When it comes to reporting on survey results, think about the story the data tells.

Say your conference overall got mediocre ratings. You dig deeper to find out what’s going on. The data show that attendees gave very high ratings to almost all the aspects of your conference (the sessions and classes, the social events, and the hotel), but they really disliked the city chosen for the conference. (Maybe the conference was held in Chicago in January and it was too cold for anyone to go outside!) That is part of the story right there: great conference overall, lousy choice of location. Miami or San Diego might be a better choice for a winter conference.

One aspect of data analysis and reporting you have to consider is causation vs. correlation.

## Appendix

### What is survey data collection?

Survey data collection uses surveys to gather information from specific respondents. Survey data collection can replace or supplement other data collection types, including interviews, focus groups, and more. The data collected from surveys can be used to boost employee engagement, understand buyer behavior, and improve customer experiences.

### What is longitudinal analysis?

Longitudinal data analysis (often called “trend analysis”) is basically tracking how findings for specific questions change over time. Once a benchmark is established, you can determine whether and how numbers shift. Suppose the satisfaction rate for your conference was 50% three years ago, 55% two years ago, 65% last year, and 75% this year. Congratulations are in order! Your longitudinal data analysis shows a solid, upward trend in satisfaction.
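A small sketch of that trend calculation, using the satisfaction rates from the example above. The changes are expressed in percentage points; the labels and variable names are illustrative.

```python
# Satisfaction rate (%) for each conference, oldest first, from the example above.
satisfaction_by_year = [
    ("three years ago", 50),
    ("two years ago", 55),
    ("last year", 65),
    ("this year", 75),
]

# Year-over-year change in percentage points, pairing each year with the one before.
changes = []
for (_, previous), (label, current) in zip(satisfaction_by_year, satisfaction_by_year[1:]):
    changes.append((label, current - previous))

# Change from the original benchmark to the most recent result.
total_change = satisfaction_by_year[-1][1] - satisfaction_by_year[0][1]

for label, delta in changes:
    print(f"{label}: {delta:+d} points")
print(f"since the benchmark: {total_change:+d} points")
```

Reporting the change in percentage points, rather than as a percent increase, keeps the comparison with the benchmark unambiguous.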

### What is the difference between correlation and causation?

Causation is when one factor causes another, while correlation is when two variables move together without one necessarily influencing or causing the other. For example, drinking hot chocolate and wearing mittens are two variables that are correlated: they tend to go up and down together. However, one does not cause the other. In fact, they are both caused by a third factor, cold weather. Cold weather influences both hot chocolate consumption and the likelihood of wearing mittens. Cold weather is the independent variable and hot chocolate consumption and the likelihood of wearing mittens are the dependent variables. In the case of our conference feedback survey, cold weather likely influenced attendees’ dissatisfaction with the conference city and the conference overall. Finally, to further examine the relationship between variables in your survey you might need to perform a regression analysis.

### What is regression analysis?

Regression analysis is an advanced method of data analysis that allows you to look at the relationship between two or more variables. There are many types of regression analysis, and the one(s) a survey scientist chooses will depend on the variables they are examining. What all types of regression analysis have in common is that they look at the influence of one or more independent variables on a dependent variable. In analyzing our survey data we might be interested in knowing what factors most impact attendees’ satisfaction with the conference. Is it a matter of the number of sessions? The keynote speaker? The social events? The site? Using regression analysis, a survey scientist can determine whether and to what extent satisfaction with these different attributes of the conference contributes to overall satisfaction.

This, in turn, provides insight into what aspects of the conference you might want to alter next time around. Say, for example, you paid a high honorarium to get a top-flight keynote speaker for your opening session. Participants gave this speaker and the conference overall high marks. Based on these two facts you might think that having a fabulous (and expensive) keynote speaker is the key to conference success. Regression analysis can help you determine if this is indeed the case. You might find that the popularity of the keynote speaker was a major driver of satisfaction with the conference. If so, next year you’ll want to get a great keynote speaker again. But say the regression shows that, while everyone liked the speaker, this did not contribute much to attendees’ satisfaction with the conference. If that is the case, the big bucks spent on the speaker might be better spent elsewhere. If you take the time to carefully analyze the soundness of your survey data, you’ll be on your way to using the answers to help you make informed decisions.
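To make the keynote scenario concrete, here is a deliberately simplified sketch. It fits a separate one-variable least-squares line for each attribute, which is simpler than the multiple regression a survey scientist would likely use, and the ratings are made-up values for eight hypothetical respondents, not real survey data. The helper `fit_line` is written for this illustration only.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared) for a single predictor.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Made-up 1-5 ratings from eight hypothetical respondents (illustration only).
overall = [2, 3, 3, 4, 4, 5, 5, 2]    # overall satisfaction with the conference
keynote = [5, 4, 5, 5, 4, 5, 4, 4]    # everyone liked the keynote speaker
sessions = [1, 2, 3, 3, 4, 5, 4, 2]   # ratings of the sessions vary widely

keynote_fit = fit_line(keynote, overall)
sessions_fit = fit_line(sessions, overall)

for name, (slope, intercept, r2) in [("keynote", keynote_fit), ("sessions", sessions_fit)]:
    print(f"{name}: slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}")
```

In this invented data, keynote ratings are uniformly high but unrelated to overall satisfaction, while session ratings track it closely. That is the pattern described above, where a well-liked speaker is not actually driving satisfaction.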