K-Means, hierarchical, dendrograms
Finding hidden groups in unlabelled data.
Prof. Xuhu Wan
ISOM, HKUST Business School · Wan Academy · 2026 Edition
Collect → Standardise → Choose K → Run → Interpret
Clustering is unsupervised learning: no target column, no “right answer”. The algorithm discovers structure in the features alone. Business uses include customer segmentation, anomaly detection, document grouping, and exploratory data analysis.
Important
The single biggest mistake is skipping standardisation. If you cluster Age (20–70) and Income ($15K–$140K) without standardising, income differences are measured in tens of thousands while age differences are measured in tens, so the income axis dominates the Euclidean distance by a factor of roughly 1,000 and age becomes effectively invisible.
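A minimal sketch of the fix, using scikit-learn's StandardScaler on a small made-up table (the Age and Income values below are illustrative, not from the case studies). Each column is rescaled to mean 0 and standard deviation 1, so both features contribute comparably to the distance:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative data: ages in tens, incomes in tens of thousands.
df = pd.DataFrame({
    "Age":    [23, 35, 48, 62, 29, 55],
    "Income": [18_000, 42_000, 95_000, 130_000, 27_000, 110_000],
})

scaler = StandardScaler()      # z-score each column: (x - mean) / std
X = scaler.fit_transform(df)   # both features now have mean 0, std 1
print(X.round(2))
```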
Start from K initial centroids (e.g. K randomly chosen points). Then repeat: (1) assign each point to its nearest centroid; (2) move each centroid to the mean of the points assigned to it. Stop when the centroids stop moving.
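To make the two steps concrete, here is a bare-bones NumPy sketch of the loop. It is for illustration only, not scikit-learn's implementation (which adds k-means++ seeding and multiple restarts):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialise: k randomly chosen data points as centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # (1) Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # (2) Update step: move each centroid to its cluster's mean
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:                   # guard against empty clusters
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):  # converged: centroids stopped moving
            return labels, centroids
        centroids = new_centroids
    return labels, centroids
```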
Plot the within-cluster sum of squared distances (inertia) against K and look for the bend. The elbow is a heuristic, not a proof: it marks where adding clusters stops giving meaningful improvement.
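A sketch of the elbow plot with scikit-learn, using made-up blob data as a stand-in for a real standardised feature matrix:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # toy data

ks = range(1, 11)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]   # inertia: within-cluster sum of squared distances

plt.plot(ks, inertias, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("Inertia")
plt.title("Elbow plot: look for the bend")
plt.show()
```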
The full tree (dendrogram) lets you read off any K after the fact by cutting the tree horizontally at a chosen height; see the sketch after the note below.
Note
Ward linkage picks the merge that produces the smallest increase in within-cluster variance — the most common choice in practice. It tends to produce compact, balanced clusters.
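A minimal SciPy sketch, again on made-up blob data: `linkage(..., method="ward")` builds the full merge tree, `dendrogram` draws it, and `fcluster` cuts it horizontally to recover flat labels for any chosen K after the fact:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=0)   # toy data

# Each merge chosen to minimise the growth in within-cluster variance
Z = linkage(X, method="ward")

dendrogram(Z)                    # the full tree: cut horizontally to read off any K
plt.ylabel("Merge distance")
plt.show()

labels = fcluster(Z, t=3, criterion="maxclust")   # cut into K = 3 flat clusters
```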
| Decision | Rule of thumb |
|---|---|
| K-Means | You know K · fast on large N · roughly spherical clusters |
| Hierarchical (Ward) | Small-to-mid N · don’t know K up front · want a tree |
| Standardisation | Always, before any distance is computed |
| Choosing K | Elbow plot + business knowledge |
Full Forbes financial and customer-segmentation case studies appear in Chapter 4 of the book.
This concludes the course. Capstone projects use everything from Chapters 1–4 together.
Prof. Xuhu Wan · HKUST ISOM · Introduction to Business Analytics