Chi-Square Distribution
The chi-square (χ²) distribution is a continuous probability distribution that arises as the sum of squares of independent standard normal random variables. It is right-skewed and takes only non-negative values. The chi-square distribution is fundamental to statistical inference, appearing in goodness-of-fit tests, tests of independence in contingency tables, and confidence intervals for population variances. The shape is determined entirely by its degrees of freedom parameter.
Formula
f(x) = [x^(k/2 - 1) · e^(-x/2)] / [2^(k/2) · Γ(k/2)], for x ≥ 0, where k is the degrees of freedom and Γ is the gamma function
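As a quick numeric check, the density can be implemented directly from the formula. This is a minimal sketch using Python's standard math module; the function name chi2_pdf is just illustrative.

```python
import math

def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom, per the formula above."""
    if x < 0:
        return 0.0  # the distribution has no mass below zero
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

# Density of chi-square(4) at x = 2: simplifies to e^(-1)/2
print(round(chi2_pdf(2.0, 4), 4))  # 0.1839
```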
Mean (Expected Value)
k
Variance
2k
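These moments can be sanity-checked by integrating the density numerically. This is a rough sketch: the Riemann-sum approximation, the cutoff at x = 100, and the step count are arbitrary choices, not part of the distribution's definition.

```python
import math

def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def moment(k, power, upper=100.0, steps=100_000):
    """Approximate the integral of x^power * f(x) over [0, upper]."""
    h = upper / steps
    return sum((i * h) ** power * chi2_pdf(i * h, k) for i in range(1, steps)) * h

k = 5
mean = moment(k, 1)             # should be close to k = 5
var = moment(k, 2) - mean ** 2  # E[X^2] - E[X]^2, should be close to 2k = 10
print(round(mean, 2), round(var, 2))
```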
Parameters
Degrees of freedom (k): the number of independent standard normal random variables being squared and summed. Must be a positive integer (k ≥ 1). Determines the shape, mean, and variance of the distribution.
Key Properties
- If Z₁, Z₂, ..., Z_k are independent standard normal variables, then Z₁² + Z₂² + ... + Z_k² ~ χ²(k)
- Always non-negative (x ≥ 0) and right-skewed, but becomes more symmetric as k increases
- The sum of independent chi-square variables is also chi-square: if X ~ χ²(k₁) and Y ~ χ²(k₂), then X + Y ~ χ²(k₁ + k₂)
- For large k, the chi-square distribution is approximately normal with mean k and variance 2k
- The sample variance S² from a normal population satisfies (n - 1)S²/σ² ~ χ²(n - 1)
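The defining property and the additivity property can be illustrated with a small simulation: drawing chi-square variates as sums of squared standard normals, then checking that the sum of a χ²(3) and a χ²(4) variate has the mean and variance of a χ²(7). This is a sketch; the sample size and seed are arbitrary choices.

```python
import random

random.seed(0)

def chi2_sample(k):
    """One chi-square(k) draw: the sum of k squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))

# If X ~ chi2(3) and Y ~ chi2(4) independently, then X + Y ~ chi2(7),
# so the simulated mean and variance should approach 7 and 2*7 = 14.
n = 100_000
draws = [chi2_sample(3) + chi2_sample(4) for _ in range(n)]
mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / n
print(round(mean, 2), round(var, 2))
```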
Example
A die is rolled 60 times with results: 1 appeared 8 times, 2 appeared 12 times, 3 appeared 7 times, 4 appeared 15 times, 5 appeared 9 times, 6 appeared 9 times. Test at the 5% significance level whether the die is fair.
Expected frequency for each face = 60/6 = 10. The chi-square statistic: χ² = ∑(O - E)²/E = (8-10)²/10 + (12-10)²/10 + (7-10)²/10 + (15-10)²/10 + (9-10)²/10 + (9-10)²/10 = 0.4 + 0.4 + 0.9 + 2.5 + 0.1 + 0.1 = 4.4. Degrees of freedom = 6 - 1 = 5. Critical value χ²(0.05, 5) = 11.07.
Result: χ² = 4.4, which is less than the critical value of 11.07, so we fail to reject H₀
At the 5% significance level, there is insufficient evidence to conclude the die is unfair. The observed frequencies do not differ significantly from what we would expect from a fair die. The p-value is approximately 0.49, well above 0.05.
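The computation above is straightforward to reproduce in code. This is a sketch in plain Python; the critical value 11.07 is taken from the worked example rather than computed.

```python
observed = [8, 12, 7, 15, 9, 9]           # counts for faces 1..6 over 60 rolls
expected = sum(observed) / len(observed)  # fair die: 60/6 = 10 per face

chi2_stat = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1
critical = 11.07  # chi-square critical value for alpha = 0.05, df = 5

print(round(chi2_stat, 2), df)  # 4.4 5
print("reject H0" if chi2_stat > critical else "fail to reject H0")  # fail to reject H0
```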
When to Use
- ✓ When performing a goodness-of-fit test to determine if observed frequencies match expected frequencies
- ✓ When testing for independence between two categorical variables using a contingency table
- ✓ When constructing confidence intervals for a population variance from a normally distributed population
- ✓ When testing homogeneity of proportions across multiple populations
Common Mistakes
- ✗ Using the chi-square test when expected cell frequencies are too small. The general rule is that all expected frequencies should be at least 5 for the approximation to be valid.
- ✗ Forgetting that the chi-square test is always a right-tailed test for goodness-of-fit and independence. Larger χ² values indicate more deviation from the null hypothesis.
- ✗ Using raw data instead of frequencies. The chi-square goodness-of-fit test requires counts (observed and expected frequencies), not individual data values.
- ✗ Getting degrees of freedom wrong: for goodness-of-fit it is (categories - 1); for independence it is (rows - 1)(columns - 1).
FAQs
Common questions about Chi-Square Distribution
What is the difference between a goodness-of-fit test and a test of independence?
The goodness-of-fit test examines whether a single categorical variable follows a specified distribution (e.g., are die outcomes uniform?). It uses one-way frequency tables with df = categories - 1. The test of independence examines whether two categorical variables are related (e.g., is political party associated with voting preference?). It uses two-way contingency tables with df = (rows - 1)(columns - 1). Both use the same test statistic χ² = ∑(O - E)²/E, but the expected frequencies are calculated differently.
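To illustrate how expected frequencies are computed in a test of independence, here is a sketch using a made-up 2×3 contingency table (the counts are hypothetical): each expected cell is row total × column total / grand total.

```python
# Hypothetical 2x3 contingency table of observed counts
table = [[20, 30, 50],
         [30, 20, 50]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

# Expected frequency for each cell: E = row_total * col_total / grand_total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(len(table)) for j in range(len(table[0])))
df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1)(columns - 1)
print(round(chi2, 2), df)  # 4.0 2
```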
Why can't a chi-square random variable be negative?
The chi-square distribution is defined as the sum of squared standard normal random variables. Since squaring any real number produces a non-negative result, and the sum of non-negative numbers is non-negative, the chi-square random variable can never be negative. This aligns with its applications: the chi-square test statistic ∑(O - E)²/E involves squared differences, so it is always ≥ 0. A value of 0 would mean perfect agreement between observed and expected frequencies.
How does the shape of the chi-square distribution change with degrees of freedom?
With few degrees of freedom (k = 1 or 2), the chi-square distribution is heavily right-skewed with most probability near zero. As k increases, the distribution shifts rightward, the peak moves away from zero, and it becomes more symmetric and bell-shaped. The mean shifts to k and the spread increases (variance = 2k). For k > 30, the chi-square distribution is well-approximated by a normal distribution with mean k and standard deviation √(2k).
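The normal approximation for large k can be checked by simulation: estimate P(X ≤ k + √(2k)) for a chi-square variable and compare it to the normal probability Φ(1) ≈ 0.8413. This is a sketch; k = 50, the sample size, and the seed are arbitrary choices.

```python
import math
import random

random.seed(1)

k = 50
n = 20_000

def chi2_sample(k):
    """One chi-square(k) draw: the sum of k squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))

x = k + math.sqrt(2 * k)  # one standard deviation above the mean
p_sim = sum(chi2_sample(k) <= x for _ in range(n)) / n

# Normal approximation: mean k, standard deviation sqrt(2k), so z = 1
p_normal = 0.5 * (1 + math.erf((x - k) / (math.sqrt(2 * k) * math.sqrt(2))))

print(round(p_sim, 3), round(p_normal, 3))
```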