Correlation, CAPM, Fama-French, variable selection
Correlation, CAPM, Fama-French factors, train/test, variable selection.
Prof. Xuhu Wan
ISOM, HKUST Business School · Wan Academy · 2026 Edition
r ≈ 0.74 between NVDA and SPY daily returns.
Do not say “74 % of NVDA’s movement is explained by SPY.” That’s wrong.
Do say “r² = 0.55, so 55 % of NVDA’s variance is linearly explained by SPY.” The remaining 45 % is idiosyncratic.
The relationship between r and r² is the most-confused fact in introductory regression.
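The distinction can be checked numerically. The sketch below uses synthetic returns (an assumption, not real NVDA or SPY data; the names `spy` and `nvda` are only labels) and shows that squaring the correlation gives the same number as the R² of a one-variable least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily returns (assumption: illustrative only, NOT real market data).
n = 1000
spy = rng.normal(0.0005, 0.01, size=n)
nvda = 2.0 * spy + rng.normal(0.0, 0.02, size=n)

# Pearson correlation and its square.
r = np.corrcoef(nvda, spy)[0, 1]
r_squared = r ** 2

# R^2 from a simple regression of nvda on spy: 1 - SSE / SST.
slope, intercept = np.polyfit(spy, nvda, deg=1)
resid = nvda - (intercept + slope * spy)
r2_from_fit = 1.0 - resid.var() / nvda.var()

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, R^2 from fit = {r2_from_fit:.3f}")
```

Because 0 < |r| < 1 here, r² is always smaller than |r|, which is why quoting r as a "percentage explained" overstates the fit.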
The Capital Asset Pricing Model:
\[r_{\text{stock}} - r_f = \alpha + \beta\,(r_m - r_f) + \varepsilon\]
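We subtract the risk-free rate r_f from both sides because CAPM models excess returns: β measures the stock's sensitivity to the market's excess return, and α is the average excess return the market does not account for.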
Important
sm.add_constant(X) is required: without it, sm.OLS fits a model with no intercept, which forces α = 0 and can distort the estimate of β. This is the single most common bug for analysts moving from R or Stata, where the intercept is included by default.
The model.summary() table:
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const (α) | 0.0043 | 0.001 | 4.32 | 0.000 | 0.0024 | 0.0063 |
| Mkt_excess (β) | 2.221 | 0.103 | 21.65 | 0.000 | 2.020 | 2.422 |

R² = 0.542
\[\text{AIC} = -2\ln L + 2k \qquad \text{BIC} = -2\ln L + k\ln n\]
Note
AIC penalty +2k is small → keeps more variables, optimised for forecasting.
BIC penalty +k ln n grows with sample size → keeps fewer variables, optimised for identifying the true model.
No criterion is simultaneously efficient and consistent — a fundamental statistical impossibility. Use AIC if you care about prediction; BIC if you care about which factors are real.
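To make the two penalties concrete, here is a sketch that computes AIC and BIC for a Gaussian linear model fitted by least squares, then adds an irrelevant regressor. The data are synthetic, `gaussian_aic_bic` is a hypothetical helper, and counting k as the number of columns in the design matrix is an assumed convention.

```python
import numpy as np

def gaussian_aic_bic(y, X):
    """AIC and BIC for an OLS fit; k = number of columns of X (assumed convention)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)
    # Maximised Gaussian log-likelihood with sigma^2 estimated as SSE / n.
    log_lik = -0.5 * n * (np.log(2 * np.pi) + np.log(sse / n) + 1)
    aic = -2 * log_lik + 2 * k
    bic = -2 * log_lik + k * np.log(n)
    return aic, bic

rng = np.random.default_rng(3)
n = 400
mkt = rng.normal(size=n)
noise_factor = rng.normal(size=n)          # unrelated to y by construction
y = 1.0 + 2.0 * mkt + rng.normal(size=n)

ones = np.ones(n)
X_small = np.column_stack([ones, mkt])
X_big = np.column_stack([ones, mkt, noise_factor])

aic_s, bic_s = gaussian_aic_bic(y, X_small)
aic_b, bic_b = gaussian_aic_bic(y, X_big)

# Lower is better; a positive change means the extra variable is rejected.
print("AIC change from adding noise_factor:", aic_b - aic_s)
print("BIC change from adding noise_factor:", bic_b - bic_s)
```

The BIC change always exceeds the AIC change by ln n − 2 per added parameter, so BIC is the stricter criterion whenever n is at least 8.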
| Concept | Tool |
|---|---|
| Correlation | df.corr() |
| Regression | sm.OLS(y, sm.add_constant(X)).fit() |
| Reading output | .summary() |
| CI for β | .conf_int() |
| Prediction | .predict() / .get_prediction() |
| Variable selection | AIC / BIC / Adj R² / Mallows's Cp |
Chapter 3 of the book gives the full treatment of CAPM, the Fama-French 5-factor model, residual diagnostics, and the pharmacy multiple-regression case.
Next: Chapter 4 — Clustering.
Prof. Xuhu Wan · HKUST ISOM · Introduction to Business Analytics