Are the values in one set of data related to those in another set? Do the values of one set move up or down as those of the other set rise? To answer these questions we estimate the correlation between the two data sets. The first step is to plot the data on a scatter chart, where each point represents one pair of values (X and Y). Data that are not associated tend to form a random pattern, whereas data that are related show a discernible pattern of association. The scatter plot below shows a completely random distribution of the points. We can reasonably state that the two data sets are not related.

The next scatter plot shows that as the values of the data set on the X-axis rise, so do most of those on the Y-axis. Most of the points lie quite close to an imaginary line drawn through them, although a few deviate on either side. This chart indicates a close, positive correlation between the two data sets: the data move in the same direction. A negative correlation would be indicated if the values on the Y-axis decreased as those on the X-axis increased.

Statistics provides us with a means of measuring how close the correlation between the data sets is: the correlation coefficient, usually represented by the symbol r or R. The possible values of r range from -1 to 1; values nearer to zero indicate weaker correlation, and values nearer to ±1 indicate stronger positive or negative correlation. The value of r squared (R²) is known as the coefficient of determination; again, the closer the value is to 1, the stronger the association. The R² value for the data in the first chart is low at 0.0018, whereas that for the data in the second chart is 0.87.
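As a rough illustration of how r and R² can be calculated, here is a minimal Python sketch of the standard Pearson formula (the sum of cross-products of the deviations from the means, divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the data values are made up for this example; they are not the data behind the charts above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired samples xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one sample is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_up = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]    # rises with x
y_down = [16.1, 13.8, 12.2, 10.1, 7.8, 6.2, 3.9, 2.1]  # falls as x rises

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(f"rising data:  r = {r_up:.3f}, R^2 = {r_up ** 2:.3f}")
print(f"falling data: r = {r_down:.3f}, R^2 = {r_down ** 2:.3f}")
```

With these invented values, the rising series should give r close to +1 and the falling series r close to -1, while R² is the same for both because squaring removes the sign.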
We cannot assume that these correlations reflect a cause-and-effect relationship. However, in those cases where we know that X actually causes the response Y (for example, if Y were the response to the stimulus X), we can further state that R² represents the proportion of the variation in Y that is due to X. For the second chart, 87% of the variation in Y is due to X; this also means that 13% of the variation in Y is due to causes other than X, e.g., a random effect or the effect of a variable that was not measured.
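One way to see why R² can be read as a proportion is to fit a straight line to the data by least squares and compare the scatter left around the line with the total scatter in Y. The sketch below assumes a simple linear fit; the function name `explained_fraction` and the data are hypothetical and chosen only for illustration.

```python
def explained_fraction(xs, ys):
    """Fit y = intercept + slope * x by least squares and return
    (slope, intercept, fraction of the variation in y accounted for by the line)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Total variation of y around its mean
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # Variation left over around the fitted line (residuals)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_resid / ss_total

# Hypothetical stimulus (x) and response (y) values, invented for illustration
x = [1, 2, 3, 4, 5, 6]
y = [1.5, 3.8, 4.1, 7.9, 8.2, 11.5]

slope, intercept, explained = explained_fraction(x, y)
print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x")
print(f"accounted for by x: {explained:.1%}, left over: {1 - explained:.1%}")
```

For a straight-line fit like this one, the fraction returned equals the square of the correlation coefficient r, so the two printed percentages correspond to R² and 1 - R².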