Correlation

Correlation measures the strength and direction of the association between two quantitative variables. The Pearson correlation coefficient (r) measures linear association and ranges from -1 to +1; the Spearman rank correlation (rho) has the same range but captures monotonic relationships, whether linear or not. Understanding the distinction between correlation and causation is one of the most important lessons in statistics.
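As a rough illustration, Pearson's r can be computed directly from its definition: the sum of the products of the deviations from each mean, divided by the square root of the product of the two sums of squared deviations. The sketch below uses plain Python; the function name `pearson_r` and the sample data are invented for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied vs. quiz score
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)
print(round(r, 3))  # 0.775, a fairly strong positive linear association
```

The sign of r gives the direction of the relationship and its magnitude gives the strength. The function assumes neither variable is constant, since a zero sum of squared deviations would cause a division by zero.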

Key Concepts

  1. Pearson correlation coefficient (r)
  2. Spearman rank correlation (rho)
  3. Interpreting strength and direction of r
  4. Correlation vs. causation
  5. Coefficient of determination (r-squared)
  6. Outlier effects on correlation
  7. Lurking and confounding variables
  8. Scatterplots and visual assessment

Study Tips

  • Always make a scatterplot before computing a correlation coefficient. Patterns like nonlinear relationships, clusters, or influential outliers can drastically affect r without being obvious from the number alone.
  • Remember that r only measures linear association. A dataset with a strong curved relationship can have r near 0. If the scatterplot is curved, consider a transformation or a different model.
  • Practice articulating why correlation does not imply causation. Be ready to suggest lurking variables or reverse causation for any given example.
  • Understand that squaring the correlation gives r-squared, the coefficient of determination: if r = 0.70, then r-squared = 0.49, meaning about 49% of the variability in one variable is explained by its linear relationship with the other.
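Two of the tips above can be checked numerically. The sketch below is illustrative only: the helper `pearson_r` and the data are made up. It shows a perfectly curved relationship (y = x squared on a symmetric range) whose r is essentially 0, and then the r-squared arithmetic for r = 0.70.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# A strong but purely curved relationship: y is fully determined by x
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_curved = pearson_r(x, y)
print(abs(r_curved) < 1e-12)  # True: no linear association despite perfect dependence

# Squaring r gives the share of variability explained by the linear fit
r = 0.70
r_squared = r ** 2
print(round(r_squared, 2))  # 0.49
```

The first result is why a scatterplot comes first: the number alone would suggest no relationship at all.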

Common Mistakes to Avoid

The most consequential mistake is assuming that a strong correlation implies a causal relationship. The association may instead reflect confounding variables, reverse causation, or coincidence. Students also sometimes apply Pearson's r to ordinal data or to monotonic but nonlinear relationships, where Spearman's rho would be more appropriate. Another error is overlooking that a single outlier can dramatically inflate or deflate the correlation coefficient, so that removing it can change the conclusion entirely.
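The outlier effect is easy to see with a small experiment. In the sketch below (hypothetical data and a hypothetical helper `pearson_r`), six points show only a weak positive association, and adding one extreme point pushes r close to 1.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Six made-up points with a weak positive association
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 4, 2, 5, 3]
r_without = pearson_r(x, y)

# The same data plus one extreme point far from the rest
r_with = pearson_r(x + [20], y + [20])

print(round(r_without, 2))  # 0.38
print(round(r_with, 2))     # 0.97
```

An outlier placed off the main trend, for example a point at (20, 0), would pull r in the opposite direction, so the effect depends on where the point falls.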

Correlation FAQs

Common questions about correlation

Why doesn't correlation imply causation?

Correlation only tells you that two variables move together; it cannot establish that one causes the other. There are several reasons: a third, confounding variable might drive both (e.g., ice cream sales and drownings both increase in summer because of heat), the direction of causality might be reversed, or the association might be purely coincidental. Establishing causation requires controlled experiments or causal inference methods that go beyond simple correlation.

When should I use Spearman's rank correlation instead of Pearson's r?

Use Spearman's rank correlation when your data are ordinal (e.g., survey rankings), when the relationship is monotonic but not necessarily linear, or when your data contain outliers that would distort Pearson's r. Spearman's rho is based on the ranks of the data rather than the raw values, making it more robust to skewness and outliers. If your data are roughly bivariate normal and the relationship is linear, Pearson's r is preferred because it is more statistically powerful.
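Because Spearman's rho works on ranks, one way to sketch it is to rank each variable (giving tied values their average rank) and then apply the Pearson formula to the ranks. The function names and data below are assumptions made for this example, not a reference implementation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Monotonic but nonlinear made-up data: y grows as the cube of x
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho)          # 1.0: the ordering of y matches the ordering of x exactly
print(round(r, 3))  # 0.943: the curve keeps Pearson's r below 1
```

Here rho reaches 1 because y always increases when x does, while r stays below 1 because the points do not fall on a straight line.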
