advanced | intermediate | 25 min

Non-Parametric Tests: When to Use Mann-Whitney, Wilcoxon, and Kruskal-Wallis

A practical guide to the three most common non-parametric tests. It covers when parametric assumptions fail, how rank-based tests work, and step-by-step procedures for Mann-Whitney U (two independent groups), Wilcoxon signed-rank (paired data), and Kruskal-Wallis (three or more groups).

What You'll Learn

  • Identify when non-parametric tests are appropriate instead of parametric alternatives
  • Perform and interpret a Mann-Whitney U test for comparing two independent groups
  • Perform and interpret a Wilcoxon signed-rank test for paired or repeated-measures data
  • Perform and interpret a Kruskal-Wallis test for comparing three or more independent groups

1. When to Go Non-Parametric

Parametric tests (t-tests, ANOVA) assume that your data come from normally distributed populations with equal variances. When those assumptions hold, parametric tests are more powerful: they are better at detecting real effects. When the assumptions are violated, parametric results can be misleading or outright wrong.

Use non-parametric tests when:

  • your data is clearly non-normal (heavily skewed, bimodal, or with extreme outliers) and the sample is too small for the Central Limit Theorem to rescue you (roughly n < 30 per group)
  • your data is ordinal (ranked data such as survey responses on a 1-5 scale: you can say 4 is higher than 3, but not that the gap between 3 and 4 equals the gap between 4 and 5)
  • your sample size is very small (under 15 per group)
  • your data has outliers that dramatically influence the mean but not the median

The trade-off is straightforward. Non-parametric tests make fewer assumptions, which makes them more broadly applicable, but they have less statistical power when the data really is normal: the rank-based tests in this guide are roughly 95% as efficient as their parametric equivalents (an asymptotic relative efficiency of 3/π, about 0.955). In practice, you need a slightly larger sample to detect the same effect. For most applications the difference is negligible. The bigger risk is using a parametric test when its assumptions are grossly violated and getting a misleading result.
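To make the point about outliers concrete, here is a minimal sketch using made-up reaction times (the numbers are hypothetical, not from any study). It shows that one extreme value shifts the mean a long way while the median and the ranks stay put, which is why rank-based tests are robust to outliers.

```python
from statistics import mean, median

# Hypothetical reaction times in ms; the second list swaps the largest
# value for an extreme outlier.
clean = [210, 225, 230, 240, 255]
with_outlier = [210, 225, 230, 240, 1900]


def simple_ranks(values):
    """Rank from smallest (1) to largest; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


print("mean:  ", mean(clean), "->", mean(with_outlier))      # mean moves a lot
print("median:", median(clean), "->", median(with_outlier))  # median unchanged
print("ranks unchanged:", simple_ranks(clean) == simple_ranks(with_outlier))
```

A rank-based test sees the same input for both lists, whereas a t-test would be driven largely by the single outlier.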

Key Points

  • Non-parametric tests do not assume normality. Use them when data is skewed, ordinal, or has extreme outliers
  • They have ~95% of the power of parametric tests under normal conditions, so the power loss is usually negligible
  • Small samples (n < 15-30) cannot rely on the CLT to normalize sampling distributions, so non-parametric is safer
  • The mapping: t-test -> Mann-Whitney (independent) or Wilcoxon (paired). ANOVA -> Kruskal-Wallis.
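The mapping in the last key point can be written down as a small lookup. This is only an illustrative sketch: the function name `choose_test` and its parameters are invented here, and deciding whether the assumptions are met still requires judgment about your data.

```python
def choose_test(n_groups, paired=False, assumptions_met=False):
    """Return the test suggested by this guide's mapping.

    n_groups: number of groups or conditions being compared.
    paired: True for before/after or matched-pair designs.
    assumptions_met: True if normality and equal variances are plausible.
    """
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        if paired:
            return "paired t-test" if assumptions_met else "Wilcoxon signed-rank"
        return "independent t-test" if assumptions_met else "Mann-Whitney U"
    if paired:
        # Repeated measures with 3+ conditions are not covered in this guide.
        raise ValueError("design not covered by this guide")
    return "one-way ANOVA" if assumptions_met else "Kruskal-Wallis"


print(choose_test(2))               # two independent groups, assumptions violated
print(choose_test(2, paired=True))  # paired data, assumptions violated
print(choose_test(3))               # three independent groups, assumptions violated
```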

2. Mann-Whitney U Test: Two Independent Groups

The Mann-Whitney U test (also called the Wilcoxon rank-sum test, a name confusingly similar to the paired Wilcoxon signed-rank test) compares two independent groups when you cannot assume normality. It is the non-parametric equivalent of the independent samples t-test. The test works by ranking all observations from both groups together, then checking whether the ranks are distributed evenly between the groups. If one group has systematically higher ranks, the groups differ.

Procedure:

  (1) Combine all observations from both groups and rank them from smallest to largest. Assign average ranks for ties.
  (2) Sum the ranks for each group separately (R1 and R2).
  (3) Calculate U1 = n1*n2 + n1(n1+1)/2 - R1 and U2 = n1*n2 + n2(n2+1)/2 - R2. As a check, U1 + U2 = n1*n2.
  (4) The test statistic is the smaller of U1 and U2. Some software reports the larger one instead, so check your software documentation.
  (5) Compare U to the critical value or use the p-value from software.

Worked example: Group A test scores: 15, 22, 18, 25. Group B: 30, 28, 35, 32. Combined and ranked: 15(1), 18(2), 22(3), 25(4), 28(5), 30(6), 32(7), 35(8). R_A = 1+2+3+4 = 10. R_B = 5+6+7+8 = 26. U_A = 4*4 + 4*5/2 - 10 = 16 + 10 - 10 = 16. U_B = 4*4 + 4*5/2 - 26 = 16 + 10 - 26 = 0. U = min(16, 0) = 0. With n1 = n2 = 4, U = 0 has an exact two-tailed p of 2/70, about 0.029, so p < 0.05 (you can confirm this in a Mann-Whitney table). The groups differ significantly: Group B scored higher.

Interpretation: Mann-Whitney tests whether the distributions of the two groups differ, not specifically whether the means differ. When the two distributions have similar shapes, it effectively tests whether one group tends to have higher values than the other.

StatsIQ generates Mann-Whitney practice problems with automatic ranking, U calculation, and interpretation guidance.
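The worked example above can be reproduced with a short, dependency-free sketch. The helper names (`average_ranks`, `mann_whitney_u`, `exact_two_sided_p`) are made up for illustration. The exact p-value is obtained by brute-force enumeration of every way to split the pooled data into two groups, which is only practical for small samples.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks in input order; tied values share their average rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Sorted positions i..j-1 hold ranks i+1..j; their average is (i+1+j)/2.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


def mann_whitney_u(a, b):
    """Return (U1, U2) using the rank-sum formulas from the procedure."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


def exact_two_sided_p(a, b):
    """Share of all possible group splits whose min(U1, U2) is <= the observed one."""
    pooled = list(a) + list(b)
    observed = min(mann_whitney_u(a, b))
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if min(mann_whitney_u(g1, g2)) <= observed:
            hits += 1
    return hits / total


group_a = [15, 22, 18, 25]
group_b = [30, 28, 35, 32]
u_a, u_b = mann_whitney_u(group_a, group_b)
p_exact = exact_two_sided_p(group_a, group_b)
print("U_A =", u_a, "U_B =", u_b, "U =", min(u_a, u_b))
print("exact two-sided p =", round(p_exact, 4))
```

For real analyses a library routine such as `scipy.stats.mannwhitneyu` is the usual choice. Note that libraries may report a U defined for one specific sample instead of the smaller of the two, so compare conventions before matching numbers.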

Key Points

  • Mann-Whitney = non-parametric equivalent of independent t-test. Compares two unrelated groups.
  • The test ranks all observations together, then checks whether ranks are evenly distributed between groups
  • Null hypothesis: the two groups come from the same distribution (no difference)
  • Works with ordinal data, skewed data, and small samples where t-test assumptions fail

3. Wilcoxon Signed-Rank Test: Paired or Repeated Data

The Wilcoxon signed-rank test is the non-parametric equivalent of the paired t-test. It compares two related measurements: before/after scores, matched pairs, or repeated measures on the same subjects. The test works by calculating the difference for each pair, ranking the absolute differences, and then comparing the sum of positive ranks to the sum of negative ranks. If the treatment has no effect, the positive and negative rank sums should be roughly equal.

Procedure:

  (1) Calculate the difference (d = after - before) for each pair.
  (2) Discard any pairs where d = 0.
  (3) Rank the absolute values of the remaining differences, smallest to largest.
  (4) Assign each rank the sign of its difference (positive or negative).
  (5) Sum the positive ranks (W+) and the negative ranks (W-).
  (6) The test statistic W is the smaller of W+ and W-.
  (7) Compare W to the critical value table or use software.

Example: Five patients measured before and after treatment. Differences: +5, -2, +8, +3, +6. Absolute values ranked: 2(1), 3(2), 5(3), 6(4), 8(5). Signed ranks: -1, +2, +3, +4, +5. W+ = 14, W- = 1. W = min(14, 1) = 1. For n = 5, the exact one-tailed p for W = 1 is 2/32 = 0.0625, which just misses the 0.05 threshold. With only five pairs, only W = 0 (p = 1/32, about 0.031) is significant one-tailed. The differences point toward improvement, but this sample is too small to call the effect significant.

When to use Wilcoxon over the paired t-test: when the differences are clearly non-normal (skewed, with outliers), when n is small (under 20 pairs), or when the data is ordinal (patient-rated improvement on a 1-10 scale). For large samples with roughly normal differences, the paired t-test is slightly more powerful and gives equivalent results.
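The signed-rank procedure and the five-patient example can be checked with a small dependency-free sketch. The function names are invented for illustration. The exact one-tailed p-value comes from enumerating all 2^n ways of assigning signs to the ranks, which is feasible only for small n.

```python
from itertools import product


def average_ranks(values):
    """Return 1-based ranks in input order; tied values share their average rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1..j
        i = j
    return [rank_of[v] for v in values]


def signed_rank_sums(differences):
    """Return (W+, W-, ranks) after discarding zero differences."""
    nonzero = [d for d in differences if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return w_plus, w_minus, ranks


def exact_one_tailed_p(differences):
    """Share of all sign assignments whose negative-rank sum is <= observed W."""
    w_plus, w_minus, ranks = signed_rank_sums(differences)
    w = min(w_plus, w_minus)
    hits = total = 0
    for negative in product((False, True), repeat=len(ranks)):
        total += 1
        if sum(r for r, neg in zip(ranks, negative) if neg) <= w:
            hits += 1
    return hits / total


diffs = [5, -2, 8, 3, 6]  # after - before, from the example
w_plus, w_minus, _ = signed_rank_sums(diffs)
p_one_tailed = exact_one_tailed_p(diffs)
print("W+ =", w_plus, "W- =", w_minus, "W =", min(w_plus, w_minus))
print("exact one-tailed p =", p_one_tailed)
```

Only two of the 32 equally likely sign patterns give a negative-rank sum of 1 or less (no negative ranks, or rank 1 alone), so the one-tailed p is 2/32 = 0.0625.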

Key Points

  • Wilcoxon signed-rank = non-parametric equivalent of paired t-test. Compares two related measurements.
  • Ranks the absolute differences, then compares the sum of positive ranks vs negative ranks
  • Use when paired differences are non-normal, ordinal, or sample is very small (< 20 pairs)
  • Discards zero differences: only non-zero differences contribute to the test statistic

4. Kruskal-Wallis Test: Three or More Groups

The Kruskal-Wallis test is the non-parametric equivalent of one-way ANOVA. It compares three or more independent groups when the normality assumption is violated. The logic extends the Mann-Whitney approach: rank all observations across all groups together, then check whether the rank distributions differ between groups. If one or more groups have systematically higher or lower ranks, the test detects the difference.

The test statistic is H = [12 / (N(N+1))] * sum(R_i^2 / n_i) - 3(N+1), where N is the total sample size, R_i is the sum of ranks in group i, and n_i is the size of group i. H approximately follows a chi-square distribution with k-1 degrees of freedom (where k is the number of groups); the approximation is usually considered adequate when each group has about five or more observations. If H exceeds the chi-square critical value at your chosen alpha, at least one group differs from the others.

Like ANOVA, a significant Kruskal-Wallis result does not tell you which groups differ. You need post-hoc pairwise comparisons (typically Dunn's test with a Bonferroni or Holm correction) to identify the specific group differences.

When to use Kruskal-Wallis instead of ANOVA: when data is ordinal (e.g., pain ratings across three treatment groups), when distributions are clearly non-normal across groups, or when sample sizes are small and unequal. For large samples with approximately normal distributions, one-way ANOVA is more powerful and gives equivalent conclusions.

StatsIQ generates Kruskal-Wallis practice problems including ranking across groups, H calculation, and Dunn's post-hoc comparison interpretation.
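As a sketch of the H calculation, the snippet below applies the formula to three made-up groups of five observations (the data and function names are hypothetical). It omits the tie correction and the Dunn's post-hoc step. For three groups, the chi-square distribution has 2 degrees of freedom, where the upper-tail probability is exactly exp(-H/2) and the 0.05 critical value is 5.991.

```python
from math import exp


def average_ranks(values):
    """Return 1-based ranks in input order; tied values share their average rank."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1..j
        i = j
    return [rank_of[v] for v in values]


def kruskal_wallis_h(groups):
    """H = [12 / (N(N+1))] * sum(R_i^2 / n_i) - 3(N+1), without tie correction."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        r_i = sum(ranks[start:start + len(group)])  # rank sum for this group
        weighted += r_i ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical scores for three independent groups (k = 3, so df = 2).
groups = [
    [12, 15, 11, 14, 13],
    [18, 20, 17, 19, 16],
    [25, 22, 24, 21, 23],
]
h = kruskal_wallis_h(groups)
p_value = exp(-h / 2)  # chi-square upper tail; valid only for df = 2
print("H =", round(h, 3))
print("approximate p =", round(p_value, 4))
print("exceeds 5.991 critical value:", h > 5.991)
```

Here the rank sums are 15, 40, and 65, giving H = 12.5. A significant result like this one would still need pairwise follow-up to say which groups differ.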

Key Points

  • Kruskal-Wallis = non-parametric equivalent of one-way ANOVA. Compares 3+ independent groups.
  • H statistic approximately follows a chi-square distribution with k-1 degrees of freedom
  • Significant H means at least one group differs. Use Dunn's test for pairwise post-hoc comparisons
  • Best for ordinal data, non-normal distributions, or small unequal group sizes

Key Takeaways

  • Non-parametric tests rank data instead of using raw values, which makes them robust to outliers and non-normality
  • t-test equivalent: Mann-Whitney (independent) or Wilcoxon signed-rank (paired). ANOVA equivalent: Kruskal-Wallis.
  • Non-parametric tests have ~95% of the power of parametric equivalents under normal conditions, a minimal practical loss
  • Kruskal-Wallis requires post-hoc tests (Dunn's) to identify which specific groups differ, the same logic as ANOVA
  • For ordinal data (survey scales, ratings), non-parametric tests are generally more appropriate than parametric ones

Practice Questions

1. You have pain ratings (1-10 scale) from two treatment groups, each with 12 patients. The data is ordinal and right-skewed. Which test should you use?
Mann-Whitney U test. Two independent groups, ordinal data (pain ratings), and non-normal distribution (right-skewed). The Mann-Whitney is appropriate because: (1) the data is ordinal (a 1-10 scale is not truly continuous), (2) the distribution is non-normal, and (3) the parametric alternative (independent t-test) assumes interval data and normality, which are both violated.
2. Three different fertilizers are tested on 8 plants each. Growth measurements are normally distributed. Should you use Kruskal-Wallis or one-way ANOVA?
One-way ANOVA. Since the data is normally distributed (assumption is met) and the outcome is continuous (growth measurements), the parametric test (ANOVA) is more powerful and appropriate. Kruskal-Wallis would also give valid results but with slightly less statistical power. Use non-parametric only when parametric assumptions are violated.

FAQs

Common questions about this topic

You can, and the results will be valid. But you sacrifice about 5% statistical power compared to parametric tests when the parametric assumptions actually hold. For small samples where detecting a real effect is already hard, that 5% power loss could mean the difference between significant and non-significant. When assumptions are met, use parametric. When they are violated, use non-parametric. When you are unsure, run both and compare: if they agree, the conclusion is robust.

Yes. StatsIQ generates non-parametric test problems including Mann-Whitney U, Wilcoxon signed-rank, and Kruskal-Wallis with step-by-step ranking procedures, test statistic calculation, and result interpretation.
