Chi-Square Statistic
χ² = Σ [(O - E)² / E]
The chi-square statistic measures the discrepancy between observed and expected frequencies. It is used in goodness-of-fit tests (does data follow a hypothesized distribution?) and tests of independence (are two categorical variables related?). Larger values of χ² indicate greater deviation from what was expected.
Variables
χ²: The test statistic measuring overall discrepancy between observed and expected counts
O: The actual count in each category from the data
E: The count expected in each category under the null hypothesis
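As a minimal sketch of the formula, the statistic can be computed directly from two lists of counts. The function name `chi_square_statistic` and the 60/40 counts are illustrative assumptions, not part of the original page.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 100 observations in two categories,
# with a 50/50 split expected under the null hypothesis.
stat = chi_square_statistic([60, 40], [50, 50])
print(stat)  # 4.0
```

Both arguments are counts, not proportions, which matches the way O and E are defined above.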
Example Calculation
Scenario
A die is rolled 60 times. The observed frequencies for faces 1 through 6 are: 8, 12, 10, 14, 7, 9. Test if the die is fair.
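Given Data
Observed counts (O): 8, 12, 10, 14, 7, 9. Total rolls n = 60 across k = 6 faces. Under a fair die, the expected count for each face is E = 60 / 6 = 10.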
Calculation
χ² = (8-10)²/10 + (12-10)²/10 + (10-10)²/10 + (14-10)²/10 + (7-10)²/10 + (9-10)²/10 = 0.4 + 0.4 + 0 + 1.6 + 0.9 + 0.1 = 3.4
Result
χ² = 3.4 with df = k - 1 = 5
Interpretation
With χ² = 3.4 and 5 degrees of freedom, the p-value is approximately 0.64. Since p > 0.05, we fail to reject the null hypothesis. There is no significant evidence that the die is unfair.
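The die example can be reproduced with the Python standard library alone. This is a sketch, not a replacement for a statistics package: the helper `chi2_sf` is a hypothetical name, and it uses closed-form expressions for the chi-square upper tail that hold only for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!
        return math.exp(-half) * sum(
            half ** j / math.factorial(j) for j in range(df // 2)
        )
    # Odd df: start from the df = 1 tail, then add one term per extra 2 df
    tail = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return tail


observed = [8, 12, 10, 14, 7, 9]                # counts for faces 1 through 6
n, k = sum(observed), len(observed)
expected = [n / k] * k                          # fair die: 60 / 6 = 10 per face
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1                                      # goodness-of-fit: k - 1
p_value = chi2_sf(stat, df)
print(round(stat, 2), df, round(p_value, 2))    # 3.4 5 0.64
```

If SciPy is available, `scipy.stats.chisquare(observed)` should return the same statistic and p-value, since it assumes equal expected counts by default.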
When to Use This Formula
- Testing whether observed categorical data fits an expected distribution (goodness-of-fit)
- Testing whether two categorical variables are independent (test of independence)
- Analyzing survey responses across categories
- Comparing proportions across multiple groups (test of homogeneity)
Common Mistakes
- Using raw proportions or percentages instead of counts in the formula
- Applying the chi-square test when expected frequencies are too small (generally E < 5)
- Confusing degrees of freedom for goodness-of-fit (k - 1) versus independence ((r-1)(c-1))
- Reading a large chi-square as showing the direction of an association; the statistic only measures the size of the discrepancy, so examine the residuals to see which categories drive it
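Two of these checks are easy to sketch in code, using the die data from the example. The Pearson residual (O - E) / √E is a standard definition that this page does not itself define, and the threshold of 5 is the rule of thumb from the list above.

```python
import math

observed = [8, 12, 10, 14, 7, 9]   # die example counts
expected = [10.0] * 6              # fair-die expectation

# Flag categories whose expected count falls below the usual threshold of 5
small_cells = [i for i, e in enumerate(expected) if e < 5]

# Pearson residuals: signed, scaled gaps between observed and expected counts
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# The squared residuals add back up to the chi-square statistic
stat = sum(r * r for r in residuals)

for face, r in enumerate(residuals, start=1):
    print(f"face {face}: residual {r:+.2f}")
```

The sign of each residual shows whether a category was over- or under-represented, which the overall statistic cannot show.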
FAQs
Common questions about this formula
For each cell in a contingency table, the expected frequency is E = (row total × column total) / grand total. This formula assumes the two variables are independent, which is the null hypothesis being tested.
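The expected-frequency rule can be sketched for a table stored as a list of rows. The function name `expected_counts` and the 2 × 2 counts below are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of counts (rows: groups, columns: outcomes)
table = [[30, 10],
         [20, 40]]
expected = expected_counts(table)   # [[20.0, 20.0], [30.0, 30.0]]

# Chi-square statistic summed over every cell of the table
stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
print(expected, round(stat, 3), df)
```

Here the row totals are 40 and 60, the column totals are 50 and 50, and the grand total is 100, so the first cell's expected count is 40 × 50 / 100 = 20.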
The data must consist of independent observations, every observation must fall into exactly one category, and the expected frequency in each cell should be at least 5. When expected frequencies are too small, consider combining categories or using Fisher's exact test.