Introduction to Logistic Regression: When and Why Linear Regression Fails for Binary Outcomes
A clear introduction to logistic regression for students who understand linear regression and need to extend to binary outcomes, covering why linear regression breaks down for yes/no predictions, how the logit transformation works, and how to interpret odds ratios.
What You'll Learn
- Explain why linear regression is inappropriate for binary (0/1) outcome variables
- Describe the logit transformation and how it maps probabilities to a linear scale
- Interpret logistic regression coefficients as log-odds and convert to odds ratios
- Read and evaluate basic logistic regression output from statistical software
1. The Problem: Why Linear Regression Breaks Down
Linear regression predicts a continuous numeric outcome: test score, blood pressure, income. But many real-world questions have binary outcomes: did the patient survive (yes/no), did the customer buy (yes/no), did the student pass (yes/no). If you try to use linear regression on a binary outcome coded as 0 and 1, three things go wrong.

First, linear regression produces predictions outside the 0-1 range. A predicted value of 1.3 or -0.2 is meaningless as a probability. You cannot have a 130% chance of passing or a -20% chance of dying. The regression line extends infinitely in both directions, but probabilities are bounded between 0 and 1.

Second, the relationship between predictors and a binary outcome is not linear; it is S-shaped (sigmoidal). Think about the relationship between study hours and passing an exam. Zero hours: almost guaranteed to fail. Five hours: better chance. Ten hours: good chance. Twenty hours: almost guaranteed to pass. The probability does not increase at a constant rate; it increases slowly at the extremes and quickly in the middle. A straight line cannot capture this shape.

Third, the residuals (errors) from a linear regression on binary data violate the normality and constant variance assumptions. The errors can only take two possible values for any given prediction (the residual when Y=1 and the residual when Y=0), creating a distinctly non-normal distribution. The inference machinery (standard errors, confidence intervals, p-values) becomes unreliable.
Key Points
- Linear regression on binary outcomes produces predictions outside the valid 0-1 probability range
- The true relationship between predictors and binary outcomes is S-shaped (sigmoidal), not linear
- Residuals from binary outcome regression violate normality and constant variance assumptions
- These three problems make linear regression unreliable for binary outcomes; logistic regression fixes all three
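To see the first failure concretely, here is a minimal sketch in pure Python (the study-hours data are hypothetical, invented for illustration) that fits ordinary least squares to a 0/1 pass/fail outcome and shows the fitted line producing "probabilities" greater than 1 at large predictor values.

```python
# Hypothetical data: study hours (x) and pass/fail (y)
x = [0, 1, 2, 3, 5, 7, 9, 12, 15, 20]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares slope and intercept for a single predictor
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

# The line keeps climbing past 1.0 as hours increase
for hours in (5, 20, 30):
    pred = intercept + slope * hours
    print(f"{hours:2d} hours -> predicted 'probability' {pred:.3f}")
```

With this data the prediction at 20 hours already exceeds 1, and nothing in the model stops it: the line is unbounded while probabilities are not.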
2. The Solution: The Logit Transformation
Logistic regression fixes the problem by transforming the outcome. Instead of modeling the probability directly (which is bounded 0-1), it models the log-odds (which is unbounded, ranging from negative infinity to positive infinity). This transformation is called the logit.

Here is how it works mathematically. Odds = p / (1-p), where p is the probability of the outcome occurring. If p = 0.75, the odds are 0.75/0.25 = 3 (three times more likely to happen than not). Log-odds (logit) = ln(odds) = ln(p / (1-p)). The logistic regression equation models this: logit(p) = b0 + b1*x1 + b2*x2 + ...

The magic: log-odds is a linear function of the predictors (which lets us use regression math), but when you convert back to probability using the inverse logit function, p = 1 / (1 + e^(-logit)), the result is always between 0 and 1, and the relationship is S-shaped. The transformation simultaneously solves all three problems from linear regression.

Think of it like this: logistic regression fits a straight line to the log-odds, but when you translate that straight line back to probabilities, it becomes the S-shaped curve (sigmoid) that we need. The coefficients describe changes in log-odds per unit increase in x. This sounds abstract, but it becomes concrete when you convert to odds ratios.
Key Points
- Logit(p) = ln(p/(1-p)) transforms bounded probabilities to unbounded log-odds suitable for regression
- Logistic regression models: logit(p) = b0 + b1*x1 + b2*x2 + ... (linear in log-odds)
- The inverse logit converts back to probability: p = 1/(1 + e^(-logit)), always between 0 and 1
- The straight line in log-odds space becomes an S-curve in probability space, exactly what we need
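The logit and its inverse are easy to code directly. A short sketch using only Python's math module, showing that the transformation is a lossless round trip between probabilities and log-odds:

```python
import math

def logit(p):
    """Map a probability in (0, 1) to unbounded log-odds."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Map any real number back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Every probability maps to a log-odds value and back without loss
for p in (0.10, 0.50, 0.75, 0.90):
    z = logit(p)
    print(f"p = {p:.2f} -> log-odds = {z:+.3f} -> back to p = {inv_logit(z):.2f}")
```

Note that logit(0.75) = ln(3) ≈ 1.099, matching the odds-of-3 example in the text, and logit(0.5) = 0: even odds sit exactly at the middle of the log-odds scale.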
3. Interpreting Coefficients: Log-Odds and Odds Ratios
Logistic regression coefficients are in log-odds units, which are not intuitive. A coefficient of 0.47 means that a one-unit increase in x increases the log-odds of the outcome by 0.47. Nobody thinks in log-odds. The solution: exponentiate the coefficient to get the odds ratio. e^0.47 = 1.60, which means a one-unit increase in x multiplies the odds by 1.60 (a 60% increase in odds).

Worked example: A logistic regression predicts exam passing (1=pass, 0=fail) from study hours. The output shows: coefficient for study hours = 0.35, intercept = -2.50. The odds ratio for study hours is e^0.35 = 1.42. Interpretation: each additional hour of studying multiplies the odds of passing by 1.42, a 42% increase in odds per hour. To predict the probability for a specific student who studied 10 hours: logit(p) = -2.50 + 0.35(10) = -2.50 + 3.50 = 1.00. Probability = 1/(1 + e^(-1.00)) = 1/1.368 = 0.731. A student who studies 10 hours has about a 73% predicted probability of passing.

Critical distinction: an odds ratio of 1.42 does NOT mean a 42% increase in probability. Odds and probability are different things. If the baseline probability is 50% (odds = 1), a 42% increase in odds gives new odds of 1.42, which corresponds to a probability of 1.42/2.42 = 58.7%, an 8.7 percentage point increase, not a 42 point increase. This conflation of odds ratios with probability changes is one of the most common misinterpretations in applied statistics.

StatsIQ includes practice problems that take you from raw coefficients to odds ratios to predicted probabilities, building the translation skill that makes logistic regression output interpretable.
Key Points
- Exponentiate the coefficient to get the odds ratio: OR = e^b. An OR of 1.5 means 50% higher odds per unit increase.
- Odds ratio is NOT the same as probability change; this is the most common misinterpretation
- To predict probability: calculate logit, then apply inverse logit: p = 1/(1 + e^(-logit))
- An odds ratio of 1.0 means no effect; greater than 1.0 means higher odds; less than 1.0 means lower odds
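The worked example above can be reproduced in a few lines. A sketch using the coefficients from the text (intercept -2.50, study-hours coefficient 0.35), including the odds-ratio-versus-probability distinction:

```python
import math

# Coefficients from the worked example in the text
intercept = -2.50
b_hours = 0.35

# Odds ratio: each extra study hour multiplies the odds of passing by e^b
odds_ratio = math.exp(b_hours)

# Predicted probability for a student who studied 10 hours
z = intercept + b_hours * 10          # logit = -2.50 + 3.50 = 1.00
p = 1 / (1 + math.exp(-z))            # inverse logit

# OR is not a probability change: apply OR 1.42 to a 50% baseline
baseline_odds = 0.5 / (1 - 0.5)       # odds of 1
new_odds = baseline_odds * odds_ratio
new_p = new_odds / (1 + new_odds)     # only ~8.7 points above 50%, not 42

print(f"OR = {odds_ratio:.2f}, p(10 hours) = {p:.3f}, "
      f"p after OR bump from 50% = {new_p:.3f}")
```

Running this reproduces the numbers in the text: OR ≈ 1.42, a 73.1% predicted probability at 10 hours, and a post-bump probability of about 58.7%.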
4. Model Evaluation: How to Know If Your Logistic Regression Is Good
Unlike linear regression, where R-squared tells you how well the model fits, logistic regression uses different metrics because the outcome is binary.

The most intuitive metric is classification accuracy: what percentage of observations does the model correctly classify? You set a threshold (usually 0.50) and predict yes if the predicted probability exceeds it, no otherwise. Then compare predictions to actual outcomes. An accuracy of 85% sounds good, but it can be misleading. If 90% of the sample is in one category, predicting that category every time gives 90% accuracy with zero useful information.

The confusion matrix breaks accuracy into its components: true positives (correctly predicted yes), true negatives (correctly predicted no), false positives (predicted yes, actually no), and false negatives (predicted no, actually yes). Sensitivity (true positive rate) and specificity (true negative rate) are more informative than overall accuracy because they tell you how well the model performs for each category separately.

The AUC (Area Under the ROC Curve) is the single best summary metric for logistic regression. It measures the model's ability to distinguish between the two categories across all possible thresholds. AUC = 0.5 means the model is no better than flipping a coin. AUC = 1.0 means perfect discrimination. In practice, AUC > 0.7 is acceptable, > 0.8 is good, and > 0.9 is excellent.

For model fit, logistic regression uses deviance (analogous to the residual sum of squares in linear regression) and pseudo R-squared values (McFadden, Nagelkerke) that approximate the concept of explained variation. None of these pseudo R-squared values are as interpretable as linear regression R-squared, so do not try to interpret them the same way.
Key Points
- Classification accuracy can be misleading with imbalanced classes; always check the confusion matrix
- Sensitivity (true positive rate) and specificity (true negative rate) are more informative than overall accuracy
- AUC is the best single summary: 0.5 = random guessing, 0.7+ = acceptable, 0.8+ = good, 0.9+ = excellent
- Pseudo R-squared values exist but are not directly comparable to linear regression R-squared; interpret cautiously
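A sketch of these metrics in pure Python, using hypothetical predicted probabilities and true labels. The AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), which equals the area under the ROC curve:

```python
# Hypothetical model output: predicted probabilities and true 0/1 labels
probs  = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
actual = [1,   1,   1,   0,   1,    0,   0,   1,   0,   0]

# Classify at the usual 0.50 threshold
pred = [1 if p > 0.50 else 0 for p in probs]

# Confusion matrix cells
tp = sum(1 for a, yhat in zip(actual, pred) if a == 1 and yhat == 1)
tn = sum(1 for a, yhat in zip(actual, pred) if a == 0 and yhat == 0)
fp = sum(1 for a, yhat in zip(actual, pred) if a == 0 and yhat == 1)
fn = sum(1 for a, yhat in zip(actual, pred) if a == 1 and yhat == 0)

accuracy    = (tp + tn) / len(actual)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate

# AUC: fraction of positive/negative pairs ranked correctly (ties count 0.5)
pos = [p for p, a in zip(probs, actual) if a == 1]
neg = [p for p, a in zip(probs, actual) if a == 0]
auc = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
          for pp in pos for pn in neg) / (len(pos) * len(neg))

print(f"accuracy={accuracy:.2f}  sensitivity={sensitivity:.2f}  "
      f"specificity={specificity:.2f}  AUC={auc:.2f}")
```

Unlike accuracy, the AUC here never looks at the 0.50 threshold: it evaluates the ranking of every positive-negative pair, which is why it summarizes performance across all possible thresholds at once.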
Key Takeaways
- Linear regression on binary outcomes produces impossible predictions (outside 0-1) and violates regression assumptions
- Logistic regression models log-odds as a linear function and converts back using the sigmoid: p = 1/(1 + e^(-z))
- Exponentiate coefficients to get odds ratios: OR = e^b. An OR of 2.0 means doubled odds per unit increase.
- Odds ratios are NOT probability changes; this is the most common interpretation error
- AUC (Area Under the ROC Curve) is the preferred single metric for logistic regression model quality
Practice Questions
1. A logistic regression has intercept = -3.0 and coefficient for x = 0.5. What is the predicted probability when x = 8?
2. A coefficient is 0.693. What is the odds ratio, and what does it mean?
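Once you have worked both questions by hand, a few lines of Python will check your arithmetic (the printed output contains the answers):

```python
import math

# Q1: intercept = -3.0, coefficient = 0.5, x = 8
z = -3.0 + 0.5 * 8            # the logit
p = 1 / (1 + math.exp(-z))    # inverse logit gives the probability

# Q2: coefficient = 0.693
odds_ratio = math.exp(0.693)  # exponentiate to get the odds ratio

print(f"Q1: predicted probability = {p:.3f}")
print(f"Q2: odds ratio = {odds_ratio:.2f} (odds multiply by this per unit of x)")
```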
FAQs
Common questions about this topic
When should I use logistic regression instead of linear regression?
Use linear regression when the outcome is continuous (test score, income, temperature). Use logistic regression when the outcome is binary or categorical (pass/fail, buy/not buy, survive/die). If your outcome variable has only two possible values, logistic regression is almost always the correct choice.
Does StatsIQ include logistic regression practice problems?
Yes. StatsIQ includes logistic regression problems covering model interpretation, odds ratio calculation, probability prediction, and model evaluation using confusion matrices and AUC. Practice builds the interpretation fluency needed for exams and real-world data analysis.