Pearson Correlation Coefficient
r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √[Σ(xᵢ - x̄)² Σ(yᵢ - ȳ)²]
The Pearson correlation coefficient measures the strength and direction of the linear relationship between two quantitative variables. It ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).
Variables
r: The strength and direction of the linear association, between -1 and 1
xᵢ, yᵢ: The individual data pairs for the two variables
x̄, ȳ: The means of the x and y variables, respectively
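The formula and its symbols can be sketched directly in code. The following is a minimal illustration, not a reference implementation; the function name `pearson_r` and its error handling are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data pairs")

    # Means of each variable (x-bar and y-bar in the formula).
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Deviations of each observation from its mean.
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]

    # Numerator: sum of cross products. Denominator: square root of the
    # product of the two sums of squared deviations.
    s_xy = sum(a * b for a, b in zip(dx, dy))
    s_xx = sum(a * a for a in dx)
    s_yy = sum(b * b for b in dy)

    if s_xx == 0 or s_yy == 0:
        # Assumption for this sketch: treat a constant variable as an error,
        # since the formula would divide by zero.
        raise ValueError("r is undefined when either variable is constant")

    return s_xy / math.sqrt(s_xx * s_yy)
```

Perfectly linear inputs such as `pearson_r([1, 2, 3], [2, 4, 6])` should return a value equal to 1 up to floating-point rounding.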
Example Calculation
Scenario
Five students' study hours (x) and exam scores (y) are: (2,65), (4,78), (5,82), (6,90), (8,95). Calculate the Pearson correlation.
Given Data
n = 5; x̄ = (2+4+5+6+8)/5 = 5; ȳ = (65+78+82+90+95)/5 = 82
Calculation
Σ(xᵢ - x̄)(yᵢ - ȳ) = 51+4+0+8+39 = 102
Σ(xᵢ - x̄)² = 9+1+0+1+9 = 20
Σ(yᵢ - ȳ)² = 289+16+0+64+169 = 538
r = 102 / √(20 × 538) = 102 / √10760 ≈ 102 / 103.73
Result
r = 0.983
Interpretation
The correlation of 0.983 indicates a very strong positive linear relationship between study hours and exam scores. In this sample, students who studied longer tended to score higher, and the points lie close to a straight line.
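The worked example can be checked numerically. This short sketch recomputes the three sums and r for the five data pairs; the variable names are chosen for this illustration only.

```python
import math

# Data pairs from the scenario: study hours (x) and exam scores (y).
hours = [2, 4, 5, 6, 8]
scores = [65, 78, 82, 90, 95]

n = len(hours)
x_bar = sum(hours) / n   # 5.0
y_bar = sum(scores) / n  # 82.0

# The three sums that appear in the formula.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in scores)

r = s_xy / math.sqrt(s_xx * s_yy)

print(s_xy, s_xx, s_yy)  # 102.0 20.0 538.0
print(round(r, 3))       # 0.983
```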
When to Use This Formula
- ✓ Assessing the strength and direction of a linear relationship between two quantitative variables
- ✓ Checking, alongside a scatterplot, whether a linear regression model is plausible before fitting it
- ✓ Comparing the strength of association across different pairs of variables
Common Mistakes
- ✗ Interpreting correlation as causation without additional evidence
- ✗ Using Pearson correlation for nonlinear relationships, where it can understate the true association
- ✗ Ignoring the effect of outliers, which can dramatically inflate or deflate r
- ✗ Applying Pearson correlation to ordinal data, where Spearman rank correlation would be more appropriate
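Two of the mistakes above are easy to reproduce with small, made-up datasets. The sketch below is only an illustration: the data are invented for this example, and `pearson_r` is a compact helper written here rather than a library function.

```python
import math


def pearson_r(xs, ys):
    """Compact Pearson r; assumes equal lengths and non-constant inputs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Nonlinear relationship: y is fully determined by x (y = x^2), yet the
# cross products cancel over this symmetric range of x, so r is zero.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
r_curve = pearson_r(xs, ys)

# Outlier: five invented points with essentially no linear trend...
base_x = [1, 2, 3, 4, 5]
base_y = [2, 1, 3, 1, 2]
r_base = pearson_r(base_x, base_y)

# ...and the same points plus a single extreme observation at (20, 20).
r_outlier = pearson_r(base_x + [20], base_y + [20])

print(r_curve)    # zero, despite the exact dependence of y on x
print(r_base)     # approximately zero
print(r_outlier)  # above 0.9, driven almost entirely by one point
```

Plotting the data before computing r is the usual way to catch both situations.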
FAQs
Common questions about this formula
Does a strong correlation mean that one variable causes the other?
No. Correlation measures association, not causation. Two variables can be strongly correlated because of a third, confounding variable or by coincidence. Establishing causation requires controlled experiments or rigorous causal inference methods.
What counts as a strong correlation?
As a general guideline, |r| > 0.7 is often considered a strong correlation, 0.3 < |r| < 0.7 is moderate, and |r| < 0.3 is weak. However, context matters greatly. In some fields, r = 0.3 may be practically significant, while in others r = 0.9 may be expected.
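The rule of thumb above can be written as a small helper. This is a sketch of the guideline only, not a standard; the function name `describe_strength` is hypothetical, and placing the boundary values 0.3 and 0.7 in the "moderate" band is an assumption, since the guideline leaves them unassigned.

```python
def describe_strength(r):
    """Label |r| using the rough guideline: weak, moderate, or strong."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    magnitude = abs(r)  # the sign gives direction, not strength
    if magnitude > 0.7:
        return "strong"
    if magnitude >= 0.3:
        # Assumption: exactly 0.3 and exactly 0.7 are treated as moderate.
        return "moderate"
    return "weak"


print(describe_strength(0.983))  # strong
print(describe_strength(-0.5))   # moderate
print(describe_strength(0.1))    # weak
```

Because the cutoffs are conventions that vary by field, a label from a helper like this is best read as a starting point for interpretation.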