Chapter 1 — Python Essentials

Lists, Series, NumPy, SciPy, Matplotlib

Prof. Xuhu Wan

Chapter 1 · Introduction to Business Analytics

ISOM, HKUST Business School · Wan Academy · 2026 Edition

What This Chapter Builds

The Python data-science stack is built in layers, each solving a different problem:

  • NumPy (2005) — fast contiguous arrays
  • pandas (2008) — labelled tabular data
  • SciPy (2001) — statistics, optimisation, integration
  • Matplotlib (2003) — plotting

By the end of this chapter, you will be able to build a working portfolio analyser from scratch using all four.

Tip

Why this matters. These four libraries are standard tools in quantitative finance, from bank risk teams to hedge-fund research pipelines.

Roadmap

Section  Concept                  Tool
1        Lists & indexing         Python built-in
2        List comprehensions      Python built-in
3        pandas Series            pandas
4        Descriptive statistics   pandas
5        NumPy arrays             numpy
6        Random simulation        numpy.random
7        Plotting                 matplotlib
8        Probability & VaR        scipy.stats

§1 · Lists

Your first data container.

Lists: Indexing

Note

Positive indices count from the start (0, 1, 2…). Negative indices count from the end (-1, -2…). Slicing is half-open: [1:3] returns positions 1 and 2.
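The three rules in the note can be checked directly. This is a minimal sketch; the ticker list is a hypothetical example, not data from the course.

```python
# Hypothetical ticker list, used only to illustrate indexing and slicing.
tickers = ["AAPL", "MSFT", "GOOG", "AMZN", "TSLA"]

first = tickers[0]         # positive index, counts from the start: 'AAPL'
last = tickers[-1]         # negative index, counts from the end: 'TSLA'
second_last = tickers[-2]  # 'AMZN'
middle = tickers[1:3]      # half-open slice, positions 1 and 2: ['MSFT', 'GOOG']

print(first, last, second_last, middle)
```

Note that the slice [1:3] has length 3 - 1 = 2; the stop position is never included.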

Indexing Visualised

§2 · List Comprehensions

Doing more with one line.

Three Patterns

Transform every item

prices = [88, 215, 134, 296, 161]
adj = [round(p * 1.10, 2) for p in prices]  # round() removes floating-point noise
# [96.8, 236.5, 147.4, 325.6, 177.1]

Filter — keep some items

premium = [p for p in prices if p > 150]
# [215, 296, 161]

Conditional value — every item gets a label

prices = [45, 120, 30, 88, 200]
signals = ["Buy" if p > 50 else "Sell"
           for p in prices]
# ['Sell', 'Buy', 'Sell', 'Buy', 'Buy']

Tip

The conditional value (X if cond else Y) comes before the for clause. The filter (if cond) comes after it. Don’t confuse the two positions.
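Both positions can appear in the same comprehension. The sketch below is an illustration only; the thresholds 40 and 100 and the labels "High" and "Mid" are arbitrary assumptions.

```python
prices = [45, 120, 30, 88, 200]

# Filter (after `for`): drop prices of 40 or less.
# Conditional value (before `for`): label each price that survives the filter.
labels = ["High" if p > 100 else "Mid" for p in prices if p > 40]
print(labels)  # ['Mid', 'High', 'Mid', 'High']

# The same logic written as an explicit loop, for comparison.
labels_loop = []
for p in prices:
    if p > 40:
        labels_loop.append("High" if p > 100 else "Mid")
```

The filter decides how many items come out; the conditional value decides what each item looks like.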

§3 · pandas Series

Labelled 1D arrays with built-in statistics.

A Series is a Labelled List

Note

A pandas Series = a NumPy array + an index (the labels) + a name (the column header). Each row keeps its label through every operation.
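A minimal sketch of the three parts of a Series. The tickers "AAA", "BBB", "CCC" and the prices are hypothetical values chosen for illustration.

```python
import pandas as pd

prices = pd.Series(
    [88.0, 215.0, 134.0],
    index=["AAA", "BBB", "CCC"],  # the labels (hypothetical tickers)
    name="price",                 # the column header
)

print(prices["BBB"])    # lookup by label: 215.0
print(prices.iloc[0])   # lookup by position: 88.0

doubled = prices * 2    # arithmetic keeps the labels attached
print(doubled.index.tolist())  # ['AAA', 'BBB', 'CCC']

values = prices.to_numpy()     # the underlying NumPy array, without labels
print(type(values).__name__)   # ndarray
```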

Descriptive Statistics in One Call

count   5.0
mean    187.2
std      97.0
min      88.0
25%     105.0
50%     178.0
75%     245.0
max     320.0

Important

.describe() gives the 8-number summary in one line. It is the single most useful first-look method for any quantitative column.
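A sketch of the call itself. The five input values are an assumption: they are chosen so that the mean and quartiles agree with the summary shown above, since the original data are not listed here.

```python
import pandas as pd

# Assumed data: five values consistent with the min, quartiles, max and mean above.
prices = pd.Series([88, 105, 178, 245, 320], name="price")

summary = prices.describe()   # returns a Series with 8 entries
print(summary)

# Individual statistics can be pulled out by label.
print(summary["mean"])  # 187.2
print(summary["50%"])   # 178.0 (the median)
```

Because .describe() returns a Series, everything from §3 applies to the result: label lookup, arithmetic, and so on.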

The ddof=1 vs ddof=0 Trap

pandas defaults to sample standard deviation (divide by n−1);
NumPy defaults to population standard deviation (divide by n).

Warning

Rule of thumb. Leave pandas’ .std() at its default: sample std (ddof=1) is the standard textbook estimator and matches Excel’s STDEV.S. When you call np.std() on sample data, pass ddof=1 explicitly.
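The difference is easy to see on a small sample. A minimal sketch, using an illustrative data set of eight numbers (not course data):

```python
import numpy as np
import pandas as pd

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values

pandas_std = pd.Series(data).std()      # ddof=1 by default: divide by n - 1
numpy_std = np.std(data)                # ddof=0 by default: divide by n, gives 2.0
numpy_sample = np.std(data, ddof=1)     # NumPy made to agree with pandas

print(pandas_std, numpy_std, numpy_sample)
```

With n = 8 the two defaults differ by about 7 %; the gap shrinks as n grows, which is why the trap matters most on small samples.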

§4 · NumPy Arrays

Vectorised math at C speed.

Lists vs NumPy: The Single Bug Everyone Hits

[100, 102, 98, 105, 101, 100, 102, 98, 105, 101]
[200 204 196 210 202]

Important

On a Python list, the * operator repeats the list (it concatenates copies). On a NumPy array, the * operator multiplies element-wise. Mistaking one for the other is the most common bug when moving from lists to NumPy.
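The two outputs above can be reproduced with the following sketch; the variable names are chosen here for illustration.

```python
import numpy as np

prices = [100, 102, 98, 105, 101]

repeated = prices * 2            # list: the five items appear twice
doubled = np.array(prices) * 2   # array: each item is multiplied by 2

print(repeated)  # [100, 102, 98, 105, 101, 100, 102, 98, 105, 101]
print(doubled)   # [200 204 196 210 202]

# Staying with plain lists, the element-wise version needs a comprehension.
doubled_list = [p * 2 for p in prices]
```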

Why NumPy Is 10–100× Faster

Note

NumPy runs element-wise operations in its own compiled C loops and hands linear algebra (matrix products, decompositions) to BLAS/LAPACK, the compiled C/Fortran libraries that also underpin MATLAB. The Python interpreter never touches the inner loop.
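A rough way to see the gap on your own machine is to time the same calculation both ways. This is a sketch, not a benchmark: the array size and repeat count are arbitrary, and the measured ratio depends on hardware and library versions.

```python
import timeit
import numpy as np

n = 200_000
values = list(range(n))               # plain Python list
arr = np.arange(n, dtype=np.float64)  # NumPy array with the same numbers

def loop_sum_sq():
    """Sum of squares with an interpreted Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def numpy_sum_sq():
    """Sum of squares with vectorised NumPy operations."""
    return float((arr * arr).sum())

t_loop = timeit.timeit(loop_sum_sq, number=3)
t_numpy = timeit.timeit(numpy_sum_sq, number=3)
print(f"loop: {t_loop:.4f}s  numpy: {t_numpy:.4f}s  ratio: {t_loop / t_numpy:.0f}x")
```

Both functions return the same number; only the place where the loop runs (interpreter versus compiled code) differs.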

Simulating Stock Returns

A one-year simulation in three lines: draw 252 daily returns from a normal distribution with mean 0.08 % and standard deviation 1.5 %, compound them, and plot the resulting price path.
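One possible version of those three lines is sketched below. The starting price of 100 and the variable names are assumptions made for this illustration; the plotting step is left out so the block runs without a display.

```python
import numpy as np

np.random.seed(42)                                   # fixed seed: the same path on every run
daily = np.random.normal(0.0008, 0.015, size=252)    # mean 0.08 %, std 1.5 % per day
path = 100 * np.cumprod(1 + daily)                   # compound from an assumed start of 100

print(path[-1])  # simulated price after 252 trading days
```

To see the path, pass it to Matplotlib, for example plt.plot(path) after import matplotlib.pyplot as plt.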

Tip

Change the value passed to seed(42) and re-run to see a different price path. Each run is a single Monte Carlo trial; risk teams typically run thousands of trials to estimate the full distribution of outcomes.

§5 · SciPy Stats

Distributions, CDFs, and Value at Risk.

CDF and PPF — Two Halves of One Idea

CDF: value → probability

\[P(X < x) = \text{cdf}(x)\]

from scipy import stats

stats.norm.cdf(10, loc=8, scale=2)
# 0.841

84.1 % of outcomes from N(8, 2²) fall below 10.

PPF: probability → value

\[x \text{ such that } P(X < x) = p\]

stats.norm.ppf(0.05, loc=8, scale=2)
# 4.71

5 % of outcomes fall below 4.71.

The PPF is the mathematical inverse of the CDF. Together they let you answer every probability question about a normal random variable.
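The inverse relationship can be verified with a round trip. A minimal sketch using the same N(8, 2²) distribution; the "between 6 and 10" question is an extra illustration, not from the slides.

```python
from scipy import stats

dist = stats.norm(loc=8, scale=2)   # a "frozen" normal distribution N(8, 2^2)

p = dist.cdf(10)    # value -> probability, about 0.8413
x = dist.ppf(p)     # probability -> value, recovers 10 (up to rounding)

# Interval probabilities are differences of two CDF values.
between = dist.cdf(10) - dist.cdf(6)   # P(6 < X < 10), about 0.6827

print(p, x, between)
```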

CDF and PPF Visualised

Value at Risk = PPF of the Left Tail

Important

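95 % VaR is the loss the portfolio exceeds on only 5 % of days: “with 95 % confidence, the one-day loss is no worse than this amount.” Equivalently, the 5 % quantile (the PPF at 0.05) is the level the portfolio stays above with 95 % confidence. VaR is a standard market-risk figure in bank reporting under the Basel framework.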
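A minimal parametric sketch of that calculation. The portfolio value of 1,000,000 is hypothetical, and the daily mean and standard deviation are borrowed from the simulation in §4; normality of returns is an assumption, not a fact about real markets.

```python
from scipy import stats

value = 1_000_000            # hypothetical portfolio value
mu, sigma = 0.0008, 0.015    # assumed daily mean and std of returns

# 5th-percentile daily return: the left-tail cut-off, about -2.39 %.
r_5pct = stats.norm.ppf(0.05, loc=mu, scale=sigma)

var_95 = -r_5pct * value           # VaR quoted as a positive loss, about 23,900
floor_value = value * (1 + r_5pct) # portfolio value at that cut-off

print(f"5% quantile return: {r_5pct:.4%}")
print(f"1-day 95% VaR:      {var_95:,.0f}")
print(f"Value floor:        {floor_value:,.0f}")
```

The loss view (var_95) and the value view (floor_value) are the same quantile expressed two ways: they always add up to the starting value.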

§6 · Putting It Together: Mini Project

Portfolio analyser in 20 lines.

A Working Portfolio Analyser

Tip

Twenty lines combine everything you’ve seen: pandas Series of weights, NumPy random sampling, matrix multiplication, statistical aggregation, and a SciPy VaR computation. The same pipeline, at larger scale, underlies much quantitative portfolio research.
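One way such an analyser could look is sketched below. It is an illustration under stated assumptions, not the course's reference solution: the tickers, weights, and return parameters are hypothetical, the assets are simulated as independent, and the weights are converted to a NumPy array before the matrix product.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical tickers, weights and daily return parameters (illustration only).
weights = pd.Series({"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}, name="weight")
mu = np.array([0.0008, 0.0005, 0.0010])   # assumed daily mean return per asset
sigma = np.array([0.015, 0.010, 0.020])   # assumed daily std per asset

# Simulate 252 days x 3 assets; independence between assets is a simplification.
rng = np.random.default_rng(42)
returns = rng.normal(mu, sigma, size=(252, 3))

# Matrix product: (252, 3) @ (3,) -> (252,) daily portfolio returns.
port = returns @ weights.to_numpy()

# Statistical aggregation and a parametric 95 % VaR (normality assumed).
summary = pd.Series(port, name="portfolio").describe()
var_95 = -stats.norm.ppf(0.05, loc=port.mean(), scale=port.std(ddof=1))

print(summary)
print(f"1-day 95% VaR: {var_95:.2%} of portfolio value")
```

Each entry of port is the weighted sum of that day's three asset returns, which is exactly what the @ operator computes row by row.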

Chapter Summary

Concept              Tool               Use
Ordered collection   list               Tickers, transactions
Concise transform    comprehension      Buy/Sell labels
Labelled 1D          pd.Series          Time series
Fast vector math     np.array           Monte Carlo, returns
Probability          scipy.stats.norm   VaR, hypothesis tests
Charts               matplotlib         Histograms, lines

Next: Chapter 2 — DataFrames (Series side-by-side; the actual workhorse of every analytical pipeline).

Discussion Questions

  1. Why does pandas default std() to n−1 while NumPy defaults to n? What would happen if a bank computed VaR using n instead of n−1 on a small sample?
  2. When would you choose a list comprehension over an explicit for-loop? When would the for-loop be more readable?
  3. The portfolio analyser used a normal distribution for daily returns. List two ways real return distributions deviate from the normal assumption, and why that matters for VaR.
  4. Walk through the line port = returns @ weights. What is the shape of each operand? What does the @ operator produce here?