Lists, Series, NumPy, SciPy, Matplotlib
Chapter 1 · Introduction to Business Analytics
Lists · pandas Series · NumPy · SciPy · Matplotlib
Prof. Xuhu Wan
ISOM, HKUST Business School · Wan Academy · 2026 Edition
The Python data-science stack evolved in layers, each solving a problem the previous one could not.
By the end of this chapter, you will be able to build a working portfolio analyser from scratch using all four libraries: pandas, NumPy, SciPy, and Matplotlib.
Tip
Why this matters. Goldman Sachs’s risk system, BlackRock’s Aladdin, every quant hedge fund’s research pipeline — all run on these same four libraries.
| Section | Concept | Tool |
|---|---|---|
| 1 | Lists & indexing | Python built-in |
| 2 | List comprehensions | Python built-in |
| 3 | pandas Series | pandas |
| 4 | Descriptive statistics | pandas |
| 5 | NumPy arrays | numpy |
| 6 | Random simulation | numpy.random |
| 7 | Plotting | matplotlib |
| 8 | Probability & VaR | scipy.stats |
Your first data container.
Note
Positive indices count from the start (0, 1, 2…). Negative indices count from the end (-1, -2…). Slicing is half-open: [1:3] returns positions 1 and 2.
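The indexing rules above can be sketched as follows. The ticker list is a hypothetical example, not data from the course.

```python
# Hypothetical ticker list, used only for illustration.
tickers = ["AAPL", "MSFT", "GOOG", "AMZN", "TSLA"]

first = tickers[0]      # positive index: counts from the start
last = tickers[-1]      # negative index: counts from the end
middle = tickers[1:3]   # half-open slice: positions 1 and 2, but not 3

print(first, last, middle)
```

Because slices are half-open, `tickers[:2]` and `tickers[2:]` split the list with no overlap and no gap.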
Doing more with one line.
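A minimal sketch of three common comprehension patterns: transform, label, and filter. The return values and the "Buy if positive" rule are assumptions made up for illustration.

```python
# Hypothetical daily returns (as decimals), for illustration only.
returns = [0.012, -0.004, 0.007, -0.015, 0.003]

# Transform: convert each return to a percentage.
returns_pct = [r * 100 for r in returns]

# Label: a toy rule that tags positive days "Buy" and the rest "Sell".
signals = ["Buy" if r > 0 else "Sell" for r in returns]

# Filter: keep only the losing days.
losses = [r for r in returns if r < 0]

print(signals)
print(losses)
```

Each comprehension replaces a loop that creates an empty list and appends to it, which is why the pattern fits on one line.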
Labelled 1D arrays with built-in statistics.
Note
A pandas Series = a NumPy array + an index (the labels) + a name (the column header). Each row keeps its label through every operation.
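The three parts of a Series (values, index, name) can be seen in a small sketch. The tickers and prices are hypothetical.

```python
import pandas as pd

# Hypothetical closing prices, for illustration only.
prices = pd.Series(
    [189.5, 415.2, 171.8],
    index=["AAPL", "MSFT", "GOOG"],  # the labels
    name="close",                    # the column header
)

msft = prices["MSFT"]              # look up a value by label
doubled = prices * 2               # arithmetic keeps the labels and the name
above_180 = prices[prices > 180]   # boolean filter keeps the matching labels
values = prices.to_numpy()         # the underlying NumPy array

print(doubled)
print(above_180)
```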
Important
.describe() gives the 8-number summary in one line. It is the single most useful first-look method for any quantitative column.
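As a quick illustration, calling `.describe()` on a small Series returns the eight statistics as another labelled Series. The return values here are made up.

```python
import pandas as pd

# Hypothetical daily returns, for illustration only.
returns = pd.Series([0.012, -0.004, 0.007, -0.015, 0.003], name="daily_return")

summary = returns.describe()   # count, mean, std, min, 25%, 50%, 75%, max
print(summary)

# Individual statistics can be read back by label.
median = summary["50%"]
```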
ddof=1 vs ddof=0 Trap
pandas defaults to sample standard deviation (divide by n−1); NumPy defaults to population standard deviation (divide by n).
Warning
Rule of thumb. Leave pandas’ .std() at its default — sample std (ddof=1) is what every statistics textbook and Excel’s STDEV.S use.
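The difference between the two defaults is easy to check on a small sample. The eight values below are an illustrative data set chosen so that the population standard deviation is exactly 2.

```python
import numpy as np
import pandas as pd

# Illustrative sample: mean 5, sum of squared deviations 32.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pd_std = pd.Series(values).std()         # pandas default ddof=1: sqrt(32 / 7)
np_std = np.std(values)                  # NumPy default ddof=0: sqrt(32 / 8) = 2.0
np_std_sample = np.std(values, ddof=1)   # ask NumPy for the sample version

print(pd_std, np_std, np_std_sample)
```

Passing `ddof=1` to `np.std` makes the two libraries agree; the gap between the two conventions shrinks as n grows.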
Vectorised math at C speed.
Important
On a Python list, * means repetition ([1, 2] * 2 gives [1, 2, 1, 2]) and + means concatenation. On a NumPy array, * and + mean element-wise arithmetic. Confusing the two is one of the most common bugs when moving from lists to NumPy.
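A minimal sketch of the difference, using hypothetical prices: on a list, `*` with an integer repeats the contents and `+` joins two lists, whereas on a NumPy array both operators act element by element.

```python
import numpy as np

# Hypothetical prices, for illustration only.
prices_list = [100.0, 200.0, 300.0]
prices_arr = np.array(prices_list)

repeated = prices_list * 2        # list: contents repeated, length 6
joined = prices_list + [400.0]    # list: concatenation, length 4

scaled = prices_arr * 2           # array: every element doubled, length 3
shifted = prices_arr + 400.0      # array: 400 added to every element

print(repeated)
print(scaled)
```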
Note
NumPy runs element-wise math in its own compiled C loops and delegates linear algebra to BLAS/LAPACK, the same compiled C/Fortran libraries that run MATLAB and Bloomberg risk engines. Either way, the Python interpreter never touches the inner loop.
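To make "the interpreter never touches the inner loop" concrete, here is a sketch that computes the same simple returns twice, once with a Python loop and once as a single array expression. All prices, returns, and weights are hypothetical.

```python
import numpy as np

# Hypothetical price history, for illustration only.
prices = np.array([100.0, 102.0, 101.0, 105.0])

# Pure-Python loop: the interpreter executes every iteration.
loop_returns = []
for i in range(1, len(prices)):
    loop_returns.append(prices[i] / prices[i - 1] - 1)

# Vectorised: one expression; the loop runs inside NumPy's compiled code.
vec_returns = prices[1:] / prices[:-1] - 1

# A dot product is the kind of operation NumPy typically hands to BLAS.
asset_returns = np.array([0.01, 0.02, -0.005])   # hypothetical one-day returns
weights = np.array([0.5, 0.3, 0.2])              # hypothetical portfolio weights
portfolio_return = asset_returns @ weights

print(vec_returns, portfolio_return)
```

The two return calculations give the same numbers; the vectorised form is shorter and, on large arrays, usually much faster.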
A one-year simulation in three lines: draw 252 daily returns from a normal distribution with mean 0.08% and standard deviation 1.5%, compound them, and plot the path.
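One way this simulation might look, as a sketch rather than the course's exact code. The starting price of 100 and the axis labels are assumptions; the three core lines are the seed, the draw, and the compounding.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for scripts; omit in a notebook
import matplotlib.pyplot as plt

np.random.seed(42)                                      # fix the seed for reproducibility
daily_returns = np.random.normal(0.0008, 0.015, 252)    # mean 0.08%, std 1.5% per day
prices = 100 * np.cumprod(1 + daily_returns)            # compound from an assumed start of 100

# Plot the simulated path.
fig, ax = plt.subplots()
ax.plot(prices)
ax.set_xlabel("Trading day")
ax.set_ylabel("Price")
ax.set_title("One simulated price path")
```

`np.cumprod(1 + daily_returns)` multiplies the growth factors cumulatively, so `prices[k]` is the start price compounded over the first `k + 1` days.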
Tip
Change the value in the seed(42) line and re-run to see a different price path. This is a single Monte Carlo trial; a risk team would run on the order of 10,000 of them.
Distributions, CDFs, and Value at Risk.
CDF: value → probability
\[P(X < x) = \text{cdf}(x)\]
84.1 % of outcomes from N(8, 2²) fall below 10.
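The 84.1 % figure can be reproduced with `scipy.stats.norm.cdf`, where `loc` is the mean and `scale` is the standard deviation (not the variance).

```python
from scipy.stats import norm

# P(X < 10) for X ~ N(8, 2^2): 10 is one standard deviation above the mean.
p_below_10 = norm.cdf(10, loc=8, scale=2)

print(round(p_below_10, 4))   # 0.8413
```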
The PPF is the mathematical inverse of the CDF. Together they let you answer every probability question about a normal random variable.
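A short sketch of the round trip between the two functions, plus the two other standard question types (an interval and an upper tail), all for the same N(8, 2²) variable. The specific cut-offs 6, 10, and 12 are chosen only for illustration.

```python
from scipy.stats import norm

mu, sigma = 8, 2

# Round trip: cdf maps a value to a probability, ppf maps it back.
p = norm.cdf(10, loc=mu, scale=sigma)
x_back = norm.ppf(p, loc=mu, scale=sigma)     # recovers 10

# Quantile: the value below which 5% of outcomes fall.
q05 = norm.ppf(0.05, loc=mu, scale=sigma)     # about 4.71

# Interval: P(6 < X < 10) is a difference of two CDF values.
p_between = norm.cdf(10, loc=mu, scale=sigma) - norm.cdf(6, loc=mu, scale=sigma)

# Upper tail: P(X > 12) is one minus the CDF.
p_above_12 = 1 - norm.cdf(12, loc=mu, scale=sigma)

print(x_back, q05, p_between, p_above_12)
```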
Important
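95 % VaR = “with 95 % confidence, the portfolio will not lose more than this amount over the horizon” (equivalently, its value stays at or above the 5th percentile of outcomes). It is a single number that banks compute daily for market-risk reporting under Basel II/III.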
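A minimal parametric (normal) VaR sketch. The portfolio value of 1,000,000 and the daily return distribution N(0.08%, 1.5%) are assumptions for illustration; the only SciPy call is `norm.ppf`.

```python
from scipy.stats import norm

# Assumptions for illustration: portfolio value and daily return distribution.
portfolio_value = 1_000_000
mu, sigma = 0.0008, 0.015

# 5th percentile of the one-day return: only 5% of days are worse than this.
q05_return = norm.ppf(0.05, loc=mu, scale=sigma)

# Expressed as a value floor: with 95% confidence the portfolio is worth at least this.
value_floor = portfolio_value * (1 + q05_return)

# Expressed as a loss: the 95% one-day VaR.
var_95 = -q05_return * portfolio_value

print(round(value_floor, 2), round(var_95, 2))
```

The floor and the loss are two views of the same quantile: `value_floor + var_95` equals the starting portfolio value.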
Portfolio analyser in 20 lines.
Tip
Twenty lines combine everything you’ve seen: pandas Series of weights, NumPy random sampling, matrix multiplication, statistical aggregation, and a SciPy VaR computation. The whole pipeline is the foundation of every quant fund.
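A sketch of how such a pipeline might be assembled, not the course's exact code. The tickers, weights, and return parameters are hypothetical, and the assets are simulated as independent, which real assets are not.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

# Hypothetical portfolio weights (must sum to 1).
weights = pd.Series([0.5, 0.3, 0.2], index=["AAPL", "MSFT", "GOOG"], name="weight")

# Simulate 252 days x 3 assets of daily returns (assumed independent and normal).
np.random.seed(42)
returns = np.random.normal(0.0008, 0.015, size=(252, len(weights)))

# Matrix multiplication: (252, 3) @ (3,) gives one portfolio return per day.
port = pd.Series(returns @ weights.to_numpy(), name="portfolio")

# Statistical aggregation, annualised with 252 trading days.
mean_daily = port.mean()
vol_daily = port.std()                   # sample std, ddof=1
annual_return = mean_daily * 252
annual_vol = vol_daily * np.sqrt(252)

# Parametric 95% one-day VaR, as a fraction of portfolio value.
var_95 = -norm.ppf(0.05, loc=mean_daily, scale=vol_daily)

print(f"Annual return {annual_return:.2%}, annual vol {annual_vol:.2%}, 95% VaR {var_95:.2%}")
```

`weights.to_numpy()` is used so that `@` operates on two plain NumPy arrays; the result has shape `(252,)`.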
| Concept | Tool | Use |
|---|---|---|
| Ordered collection | list | Tickers, transactions |
| Concise transform | comprehension | Buy/Sell labels |
| Labelled 1D | pd.Series | Time series |
| Fast vector math | np.array | Monte Carlo, returns |
| Probability | scipy.stats.norm | VaR, hypothesis tests |
| Charts | matplotlib | Histograms, lines |
Next: Chapter 2 — DataFrames (Series side-by-side; the actual workhorse of every analytical pipeline).
Why does pandas default std() to n−1 while NumPy defaults to n? What would happen if a bank computed VaR using n instead of n−1 on a small sample?
Consider port = returns @ weights. What is the shape of each operand? What does the @ operator produce here?