The procedure to use is, of course, a correlational analysis, but which type should you use? In this guide, we’ll walk you through the two main methods you could use for correlation. These methods are called the Pearson correlation and the Spearman correlation. We’ll take a look at what each technique involves, when each should be used, and the types of research questions that could be addressed. Also, if you are conducting usage and attitudes (U&A) research or concept testing, we can perform the analysis for you.
Before going into detail about the statistical techniques used to perform a correlational analysis, let’s quickly define what we mean by correlation. Correlational analysis is a bivariate (two-variable) statistical procedure that measures the strength and direction of the association between two matched variables. The purpose of this type of analysis is to find out whether changes in one variable coincide with changes in another. For example, does customer satisfaction increase with the size of discount offered at a grocery store, or does employee engagement rise with salary increases? Note that correlation is used to infer whether there is a relationship between the two variables, not whether changes in one variable cause changes in another. In other words, correlation says nothing about causality.
In our example above, for instance, employees might be more engaged because they're rewarded with higher salaries. Alternatively, higher levels of engagement might drive managers to increase their wages. Correlation says nothing about which variable impacts the other, but rather tells us whether there is a simple relationship between the variables, the direction of the relationship (positive or negative), and its strength.
Of the two techniques used to perform correlation analysis, the Pearson correlation method is probably the more recognized and widely used in market and business research. Let’s take a look at what the Pearson correlation method is, and how you can use it.
The Pearson product moment correlation coefficient is a measure of the strength of the linear relationship between two variables—in other words, how strongly one variable is associated with another.
Imagine you have two variables—such as employee engagement and employee salaries—plotted on a simple scatter plot. The Pearson correlation essentially draws a line of best fit through the data to determine whether the two variables covary with one another, in which direction, and to what extent.
Insight into this relationship is a first step in understanding how variables of interest might relate to one another, and could also prompt further causal investigation.
The Pearson correlation coefficient test compares the mean value of the product of the standard scores of matched pairs of observations. Once performed, it yields a number (r) that can range from -1 to +1. Positive values indicate a positive correlation between the two variables, while negative values indicate a negative relationship. Furthermore, the value of r represents the strength of the relationship: the closer r is to -1 or +1, the stronger the relationship between the two variables. As a rule of thumb, the following values can be used to determine the strength of the relationship:
For example, imagine that you’ve developed some marketing concepts that you’ve begun testing with some potential customers. For each concept, you’re interested in learning whether evaluations of the appeal of the concept are associated with stronger intent to purchase.
Comparisons of Concepts A, B and C yield Pearson correlation coefficients of .3, .6 and .9 respectively. Based on these three figures, you can infer the following:
Using these inferences, you might decide that Concept C is the most appropriate concept to employ in your next marketing campaign. However, first, you’ll need to determine whether the correlation you’ve observed is statistically significant. Let’s look at the formula used to determine Pearson’s r in more detail, and how you can combine this formula with a t test to determine significance.
The Pearson correlation coefficient (r) is calculated using the following expression:

r = Σ(xi − x̄)(yi − ȳ) / √(Σ(xi − x̄)² × Σ(yi − ȳ)²)
Where xi represents the values of the x variable in a sample, x-bar indicates the mean of the values of the x variable, yi indicates the values of the y variable, and y-bar indicates the mean of the values of the y-variable. S indicates the sum of squares of the x and y variables respectively, and n is the number of observations of x and y variables.
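Those terms map directly onto code. Here is a minimal Python sketch of the calculation, using hypothetical concept-appeal and purchase-intent scores (the data and names are made up for illustration):

```python
import math

def pearson_r(x, y):
    """Pearson's r: sum of the products of deviations from the mean,
    divided by the square root of the product of the sums of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) *
                            sum((yi - y_bar) ** 2 for yi in y))
    return numerator / denominator

# Hypothetical concept-testing data: appeal ratings vs. purchase intent
appeal = [3, 5, 6, 8, 9]
intent = [2, 4, 5, 7, 9]
print(round(pearson_r(appeal, intent), 3))  # a strong positive correlation
```

In practice you would hand this off to a statistics package, but writing it out makes clear that r is built entirely from deviations around each variable’s mean.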
After an r value is produced, the next step is to determine whether the value is of statistical significance. The importance of this step cannot be overstated. It is possible to observe two variables that seem to be related to one another, but the relationship is in fact meaningless.
For example, you might observe a relationship between concept appeal and intended purchase frequency, leading you to believe that the concept that has the greatest appeal will lead people to spend more. However, if this relationship occurred merely through chance, your marketing campaign might turn out to be an expensive waste of cash.
Statistical significance indicates that we are confident of a relationship between the two variables; in other words, that the result did not occur by chance.
A t test is used to establish whether the Pearson’s r statistic differs significantly from zero. Statistical significance (indicated by the probability, or p, value) indicates whether the observer can be confident of a relationship between the two variables at different levels. For instance, a p value of .05 indicates that there is only a 5% chance that the relationship occurred by chance, while a p value of .10 indicates that there is a 10% chance that the observed correlation is a chance event.
The t statistic always has the same sign (+ or -) as the r value and is calculated as follows:
t = r √((n − 2) / (1 − r²))
Once the t value is calculated, it can be compared with the critical value from a standard t table at the appropriate degrees of freedom (n − 2) and the significance level (p value) you wish to maintain, in order to determine whether the correlation you have observed is statistically meaningful.
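As a concrete illustration, here is a minimal Python sketch of this formula; the sample figures (r = .6 across 20 matched pairs) are hypothetical:

```python
import math

def t_statistic(r, n):
    """t value for testing whether Pearson's r differs from zero,
    evaluated against n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical example: r = .6 observed across 20 matched pairs
t = t_statistic(0.6, 20)
print(round(t, 3))  # compare against the critical value at df = 18
```

With 18 degrees of freedom, the resulting t of about 3.18 exceeds the two-tailed critical value of roughly 2.10 at p = .05, so a correlation of .6 across 20 pairs would count as statistically significant under this rule.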
Practical applications of the Pearson correlation coefficient
The Pearson correlation is a relatively simple equation, but its uses are myriad. You can apply this technique to answer research questions such as:
The Spearman’s test is a non-parametric version of the parametric Pearson bivariate correlation coefficient. What does this mean? Well, parametric and non-parametric tests are distinguished by the assumptions they make about the nature of the data to be analyzed. A parametric statistical test makes clear assumptions about the defining properties, or parameters, of the dataset. For a dataset to be appropriate for the parametric version of correlational analysis (i.e. the Pearson correlation), the following assumptions must be met: both variables are measured at a continuous (interval or ratio) level, and the relationship between them is linear.
If either of these assumptions is violated, you should use the non-parametric version of the correlation technique, known as Spearman’s correlation, Spearman’s rank-order test, or Spearman’s rho.
What is the Spearman correlation coefficient?
Like the Pearson test, the Spearman correlation test examines whether two variables are correlated with one another. Because it uses ranks instead of assumptions of normality, the Spearman’s test can be used to analyze ordinal-level as well as continuous-level data. This makes the Spearman correlation well suited to 3-, 5-, and 7-point Likert scale questions and other ordinal survey questions.
The Spearman’s test is therefore useful where the basic assumptions of linearity and continuous variables necessary to perform a Pearson's bivariate correlation analysis have not been met.
The Spearman’s rank-order test determines the strength and direction of the monotonic relationship between two variables measured at the ordinal, interval or ratio level. As with the Pearson equivalent, the test will yield a figure between -1 and +1, and the closer the figure is to -1 or +1, the stronger the monotonic relationship. As a rule of thumb, you can use the following figures to determine the effect size:
In addition, probability values should be used to determine statistical significance.
An example is the best way to understand how to calculate a Spearman’s correlation. Imagine you’ve gathered some data on evaluations of a market concept, and the average price a consumer is willing to pay. Take the following steps:
1. Rank the concept evaluation scores (e.g. from highest to lowest).
2. Rank the prices consumers are willing to pay in the same way.
3. Calculate the difference between the two rankings (d) for each concept.
4. Finally, square the differences (d²) and then sum them.
| Concept | Concept evaluation (0-10) | Concept evaluation ranking | Price willing to pay | Price willing to pay ranking | Difference between rankings (d) | d² |
| --- | --- | --- | --- | --- | --- | --- |
Now you have all the data you need to calculate Spearman’s rank correlation, using the following formula:

ρ = 1 − (6 Σd²) / (n³ − n)

where d is the difference between the two rankings for each observation and n is the number of observations.
In our example, we would first multiply the sum of the d² values (6) by 6 (i.e. 36). To address the denominator, we would raise the number of observations (concepts) to the power of 3 and then subtract the number of observations (i.e. 4³ − 4 = 60). We can then calculate Spearman’s rho as 1 − 36/60 = .4. This indicates a moderate, positive monotonic correlation between concept evaluation and the price that consumers are willing to pay.
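The whole procedure can be sketched in a few lines of Python. The four concepts’ scores below are hypothetical, chosen so that the squared rank differences sum to 6 as in the example above:

```python
def spearman_rho(x, y):
    """Spearman's rho from the rank-difference formula:
    rho = 1 - 6 * sum(d^2) / (n^3 - n), assuming no tied ranks."""
    def ranks(values):
        # Rank 1 = highest value, matching the worked example
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    n = len(x)
    d_sq = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_sq / (n ** 3 - n)

# Hypothetical data for four concepts: evaluation scores and prices
evaluation = [9, 7, 5, 3]
price = [18, 15, 22, 10]
print(spearman_rho(evaluation, price))  # rho = 1 - 36/60 = .4
```

Note that this simple formula only holds when there are no tied ranks; statistics packages handle ties by averaging the tied ranks and falling back on a Pearson-style calculation over the ranks.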
Practical applications of the Spearman correlation coefficient
The types of research questions that can be addressed through the Spearman correlation method are similar to those addressable through a Pearson analysis. Remember, however, that the main difference is that data can be ordinal in nature, and the relationship should be monotonic. For example, you could use the Spearman correlation coefficient to answer questions like:
Confused about when to use the Pearson correlation and when to use the Spearman’s correlation coefficient? Remember that Spearman’s correlation determines the strength and direction of the monotonic relationship between your two variables, whereas Pearson’s correlation determines the strength and direction of the linear relationship. Linear relationships are straight-line relationships. Monotonic relationships differ from linear relationships in that the two variables tend to move together, but not necessarily at a constant rate. There are three possibilities to consider:
Monotonically increasing: as the x variable increases, the y variable never decreases.
Monotonically decreasing: as the x variable increases, the y variable never increases.
Not monotonic: as the x variable increases, the y variable sometimes decreases and sometimes increases.
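To see the distinction concretely, here is a small Python sketch (with made-up data) comparing the two coefficients on y = x³, a relationship that is perfectly monotonic but not linear:

```python
import math

def pearson(x, y):
    # Linear association: products of deviations over the root of the sums of squares
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def spearman(x, y):
    # Monotonic association: the rank-difference formula (assumes no ties)
    rank = lambda v: [sorted(v).index(e) + 1 for e in v]
    n = len(x)
    d_sq = sum((a - b) ** 2 for a, b in zip(rank(x), rank(y)))
    return 1 - 6 * d_sq / (n ** 3 - n)

x = [1, 2, 3, 4, 5]
y = [xi ** 3 for xi in x]       # monotonically increasing, but curved
print(round(pearson(x, y), 3))  # below 1, because the relationship is not linear
print(spearman(x, y))           # exactly 1.0, because it is perfectly monotonic
```

Pearson’s r comes out below 1 because the curve bends away from any straight line, while Spearman’s rho is exactly 1 because the ranks of x and y match perfectly.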