Pearson Correlation Coefficient

r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √[Σ(xᵢ - x̄)² Σ(yᵢ - ȳ)²]

The Pearson correlation coefficient measures the strength and direction of the linear relationship between two quantitative variables. It ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).
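The formula above translates almost term for term into code. The following is a minimal sketch, not a reference implementation; the function name `pearson_r` and its error handling are illustrative choices that do not come from the original text.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of paired observations, computed from the definition."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of observations")
    if len(xs) < 2:
        raise ValueError("at least two paired observations are required")

    n = len(xs)
    x_bar = sum(xs) / n  # sample mean of x
    y_bar = sum(ys) / n  # sample mean of y

    # Numerator: sum of products of paired deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        # r is undefined when either variable has no variation.
        raise ValueError("r is undefined when a variable is constant")

    return sxy / math.sqrt(sxx * syy)
```

Perfectly linear data such as `pearson_r([1, 2, 3], [2, 4, 6])` should give a value at or very near +1, and reversing the direction of y should give a value at or very near -1.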

Variables

r=Correlation Coefficient

The strength and direction of the linear association, between -1 and 1

xᵢ, yᵢ=Paired Observations

The individual data pairs for the two variables

x̄, ȳ=Sample Means

The means of the x and y variables, respectively

Example Calculation

Scenario

Five students' study hours (x) and exam scores (y) are: (2,65), (4,78), (5,82), (6,90), (8,95). Calculate the Pearson correlation.

Given Data

x̄:(2+4+5+6+8)/5 = 5.0
ȳ:(65+78+82+90+95)/5 = 82.0
Σ(xᵢ - x̄)(yᵢ - ȳ):(-3)(-17)+(-1)(-4)+(0)(0)+(1)(8)+(3)(13) = 51+4+0+8+39 = 102

Calculation

Σ(xᵢ - x̄)² = 9+1+0+1+9 = 20; Σ(yᵢ - ȳ)² = 289+16+0+64+169 = 538; r = 102 / √(20 × 538) = 102 / √10760 = 102 / 103.73

Result

r = 0.983
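The arithmetic in this example can be re-checked with a few lines of code. This is a sketch using only the five data pairs from the scenario; the variable names are illustrative.

```python
import math

# Study hours (x) and exam scores (y) from the scenario.
hours = [2, 4, 5, 6, 8]
scores = [65, 78, 82, 90, 95]

n = len(hours)
x_bar = sum(hours) / n   # 5.0
y_bar = sum(scores) / n  # 82.0

# Sum of cross-products and sums of squared deviations.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))  # 102.0
sxx = sum((x - x_bar) ** 2 for x in hours)                           # 20.0
syy = sum((y - y_bar) ** 2 for y in scores)                          # 538.0

r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))  # 0.983
```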

Interpretation

The correlation of 0.983 indicates a very strong positive linear relationship between study hours and exam scores. As study hours increase, exam scores tend to increase along a nearly straight line. With only five observations, this estimate should be read with caution.

When to Use This Formula

  • Assessing the strength and direction of a linear relationship between two quantitative variables
  • Determining whether a linear regression model is appropriate before fitting it
  • Comparing the strength of association across different pairs of variables

Common Mistakes

  • Interpreting correlation as causation without additional evidence
  • Using Pearson correlation for nonlinear relationships, where it can understate or miss the true association entirely
  • Ignoring the effect of outliers, which can dramatically inflate or deflate r
  • Applying Pearson correlation to ordinal data where Spearman rank correlation would be more appropriate
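Two of these mistakes, nonlinear relationships and outliers, are easy to demonstrate numerically. The sketch below uses small data sets invented purely for illustration (they do not come from the original text), and the helper name `pearson_r` is likewise an illustrative choice.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed directly from the definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Nonlinear case: y is completely determined by x (y = x squared),
# yet the symmetric curve produces no linear trend, so r is zero.
xs_curve = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = pearson_r(xs_curve, ys_curve)

# Outlier case: five points with almost no linear trend (invented data)...
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 4, 2, 3]
r_without = pearson_r(xs, ys)

# ...and the same points plus one extreme pair far from the rest.
r_with = pearson_r(xs + [20], ys + [20])

print(f"curve: {r_curve:.3f}")
print(f"without outlier: {r_without:.3f}")
print(f"with outlier: {r_with:.3f}")
```

In this invented data, a single extreme point moves r from a weak value to a value above 0.9, which is why a scatterplot is worth inspecting before trusting the coefficient.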

FAQs

Common questions about this formula

Correlation does not imply causation. It measures association only: two variables can be strongly correlated because of a third, confounding variable or by coincidence. Establishing causation requires controlled experiments or rigorous causal inference methods.

As a general guideline, |r| > 0.7 is often considered a strong correlation, 0.3 < |r| < 0.7 is moderate, and |r| < 0.3 is weak. However, context matters greatly. In some fields, r = 0.3 may be practically significant, while in others r = 0.9 may be expected.
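The rule of thumb above can be written as a small lookup. This is only a sketch of that guideline: the function name `describe_strength` is hypothetical, and because the guideline does not say how to label values of exactly 0.3 or 0.7, the code assumes they count as moderate.

```python
def describe_strength(r):
    """Label |r| using the rough 0.3 / 0.7 guideline (field-dependent)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    magnitude = abs(r)  # the sign gives direction, not strength
    if magnitude > 0.7:
        return "strong"
    if magnitude >= 0.3:
        # Assumption: the boundary values 0.3 and 0.7 are treated as moderate.
        return "moderate"
    return "weak"


print(describe_strength(0.983))  # strong
print(describe_strength(-0.5))   # moderate
print(describe_strength(0.1))    # weak
```

Since these cutoffs vary by field, the labels are best treated as a starting point for interpretation.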
