Mathematical Statistics

The Central Limit Theorem & Normal Approximation to the Binomial

Samir Orujov, PhD

ADA University, School of Business

Information Communication Technologies Agency, Statistics Unit

2026-03-07

🎯 Learning Objectives

By the end of this lecture, you will be able to:

  • State the Central Limit Theorem (Theorem 7.4) precisely and explain its conditions: finite mean and variance, large \(n\)

  • Apply the CLT to approximate probabilities for sample means from any population distribution in finance and economics

  • Recognize how the CLT justifies normal-based inference even when the underlying population is non-normal (e.g., stock returns, income data)

  • Derive the normal approximation to the binomial distribution and apply the continuity correction for improved accuracy

  • Evaluate when the normal approximation to the binomial is adequate using the \(n > 9\!\left(\frac{\max(p,q)}{\min(p,q)}\right)\) rule of thumb

📋 Overview

📚 Topics Covered Today

  • Motivation: Why the CLT is called the most important theorem in statistics

  • Theorem 7.4, the CLT: Formal statement, conditions, and asymptotic normality

  • Intuition via Simulation: How the sampling distribution of \(\bar{Y}\) converges to normal

  • CLT in Finance & Economics: Portfolio returns, survey sampling, risk modeling

  • Normal Approximation to the Binomial: Section 7.5 and the continuity correction

  • Case Study: Applying CLT and binomial approximation to Azerbaijani telecom data

🌍 Why the CLT Matters

💡 The Problem We Solve Today

In Lecture 1, we showed \(\bar{Y} \sim N(\mu, \sigma^2/n)\), but only when sampling from a normal population.

Reality check: Most economic and financial data are not normal:

  • Income and wealth are right-skewed (Pareto-like)
  • Asset returns have heavy tails (leptokurtic)
  • Queue times and service durations are exponential or gamma
  • Count data (defaults, transactions) follow Poisson or binomial

🎯 What the CLT Promises

Regardless of the population distribution, as long as \(\mu\) and \(\sigma^2\) are finite, the sampling distribution of \(\bar{Y}\) becomes approximately normal for large \(n\).

This is why normal-based confidence intervals, hypothesis tests, and many econometric estimators work in practice, even when the underlying data are not normal.

📝 Warm-Up: Recall from Chapter 5

We already know two key facts about \(\bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i\) from any population:

\[E(\bar{Y}) = \mu \qquad \text{and} \qquad V(\bar{Y}) = \frac{\sigma^2}{n}\]

These hold for any distribution with finite mean and variance; no normality is required.

But these are just the mean and variance of \(\bar{Y}\). What is its full shape?

The Missing Piece

Without knowing the distribution of the \(Y_i\)’s, we cannot (in general) state the distribution of \(\bar{Y}\).

The CLT completes this picture: for large \(n\), the shape is always approximately normal.

🧮 Theorem 7.4: The Central Limit Theorem

Theorem 7.4: Central Limit Theorem (Wackerly et al., p. 372)

Let \(Y_1, Y_2, \ldots, Y_n\) be i.i.d. with \(E(Y_i) = \mu\) and \(V(Y_i) = \sigma^2 < \infty\). Define:

\[U_n = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}} = \frac{\sum_{i=1}^n Y_i - n\mu}{\sigma\sqrt{n}}\]

Then the distribution function of \(U_n\) converges to the standard normal as \(n \to \infty\):

\[\lim_{n \to \infty} P(U_n \leq u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt \quad \text{for all } u\]

Practical rule of thumb: \(n > 30\) is usually sufficient for the approximation to be valid.

For highly skewed populations (e.g., income, insurance claims), larger \(n\) may be needed.

🔑 What the CLT Tells Us (Asymptotic Normality)

For large \(n\), the CLT allows us to write:

\[\bar{Y} \stackrel{a}{\sim} N\!\left(\mu,\; \frac{\sigma^2}{n}\right)\]

where the \(\stackrel{a}{\sim}\) symbol means asymptotically distributed as.

This means probability statements about \(\bar{Y}\) can be evaluated using the standard normal:

\[P(a \leq \bar{Y} \leq b) \approx P\!\left(\frac{a - \mu}{\sigma/\sqrt{n}} \leq Z \leq \frac{b - \mu}{\sigma/\sqrt{n}}\right)\]

Two Equivalent Formulations

Statement | Useful when…
\(\bar{Y} \stackrel{a}{\sim} N(\mu, \sigma^2/n)\) | Reasoning about the sample mean
\(\sum Y_i \stackrel{a}{\sim} N(n\mu, n\sigma^2)\) | Reasoning about totals (costs, sales)
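
The two formulations can be checked against each other numerically. The following is a minimal sketch, not part of the textbook treatment: the helper names `clt_prob_mean` and `clt_prob_total` are hypothetical, and the inputs (\(\mu = 60\), \(\sigma = 8\), \(n = 100\)) are illustrative values only.

```python
import math
from scipy import stats

def clt_prob_mean(a, b, mu, sigma, n):
    """Approximate P(a <= Ybar <= b) using Ybar ~ N(mu, sigma^2 / n)."""
    se = sigma / math.sqrt(n)
    return stats.norm.cdf((b - mu) / se) - stats.norm.cdf((a - mu) / se)

def clt_prob_total(a, b, mu, sigma, n):
    """Approximate P(a <= sum(Y_i) <= b) using sum ~ N(n*mu, n*sigma^2)."""
    sd_total = sigma * math.sqrt(n)
    return (stats.norm.cdf((b - n * mu) / sd_total)
            - stats.norm.cdf((a - n * mu) / sd_total))

# An event about the mean is the same event about the total (multiply by n),
# so both formulations should return the same probability.
p_mean = clt_prob_mean(58, 62, mu=60, sigma=8, n=100)
p_total = clt_prob_total(5800, 6200, mu=60, sigma=8, n=100)
print(round(p_mean, 4), round(p_total, 4))
```

Both calls standardize to the same \(z\)-scores (\(\pm 2.5\) here), which is why the choice between the two statements is purely a matter of convenience.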

📌 Example 7.8: School Exam Scores

Problem (Wackerly, p. 372): Achievement test scores of all high school seniors have \(\mu = 60\) and \(\sigma^2 = 64\). A random sample of \(n = 100\) students from one school yielded \(\bar{Y} = 58\). Is this evidence that the school is inferior?

Finance Parallel: Think of \(\bar{Y}\) as the average quarterly return of 100 assets in a portfolio. Is an observed mean of 58 unusual if the true mean is 60?

Solution: By CLT, \(\frac{\bar{Y} - 60}{8/\sqrt{100}} \approx N(0,1)\), so:

\[P(\bar{Y} \leq 58) = P\!\left(Z \leq \frac{58 - 60}{0.8}\right) = P(Z \leq -2.5) \approx 0.0062\]

Conclusion

A probability of only \(0.62\%\) is very strong evidence that this school’s average is below the population mean.

This is the logic of hypothesis testing (Chapter 10 preview): small \(p\)-values signal unusual results.

📌 Example: Total Service Time (Queue Theory)

Problem (Wackerly, Example 7.9, p. 373): Service times per customer: \(\mu = 1.5\) min, \(\sigma^2 = 1.0\). Approximate \(P\!\left(\sum_{i=1}^{100} Y_i \leq 120 \text{ min}\right)\).

Finance Context: Think of this as 100 loan applications, each requiring on average 1.5 minutes of review time. What is the chance the entire batch can be processed within a 2-hour window (120 minutes)?

Solution:

\[P\!\left(\sum Y_i \leq 120\right) = P\!\left(\bar{Y} \leq 1.20\right) = P\!\left(Z \leq \frac{1.20 - 1.50}{1/\sqrt{100}}\right) = P(Z \leq -3) \approx 0.0013\]

Interpretation: With only a \(0.13\%\) chance, it is virtually impossible to process 100 clients in 2 hours. The bank must budget for longer.

🏦 CLT in Practice: Portfolio Returns

Finance Application: Averaging Returns Across Assets

Suppose a portfolio of \(n = 50\) stocks. Individual stock daily returns \(Y_i\) have:

\[\mu = 0.05\% \quad \text{and} \quad \sigma = 1.8\%\]

The returns are not normal (heavy tails, skewness from individual stocks). Yet by CLT:

\[\bar{Y} \stackrel{a}{\sim} N\!\left(0.0005,\; \frac{0.018^2}{50}\right) = N\!\left(0.0005,\; 6.48 \times 10^{-6}\right)\]

Question: What is the probability the portfolio mean return exceeds \(0.12\%\) on a given day?

\[P(\bar{Y} > 0.0012) = P\!\left(Z > \frac{0.0012 - 0.0005}{0.018/\sqrt{50}}\right) = P(Z > 0.275) \approx 0.39\]

Why This Matters for Risk Management

Parametric Value-at-Risk (VaR) calculations rely on the approximate normality of portfolio returns, and the CLT is the usual theoretical justification.
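
The arithmetic in this example is easy to get wrong by a factor of ten, so a quick numerical check helps. This is a minimal sketch that simply evaluates the formula above with the slide's inputs; it assumes, as the CLT statement does, that the 50 stock returns are independent, which real portfolios only approximate.

```python
import math
from scipy import stats

mu = 0.0005         # mean daily return per stock (0.05%)
sigma = 0.018       # std dev of daily return per stock (1.8%)
n = 50              # number of stocks averaged
threshold = 0.0012  # 0.12% daily mean return

se = sigma / math.sqrt(n)      # standard error of the mean return
z = (threshold - mu) / se      # standardized threshold
p_exceed = stats.norm.sf(z)    # upper-tail probability P(Z > z)
print(f"SE = {se:.6f}, z = {z:.3f}, P = {p_exceed:.3f}")
```

The standard error is about 0.25%, so a threshold only 0.07 percentage points above the mean sits well inside the bulk of the distribution.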

🎮 Interactive: CLT in Action

Key insight: As sample size \(n\) increases, the sampling distribution of \(\bar{Y}\) converges to a normal curve, regardless of the original population shape.
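
The convergence can also be reproduced offline with a short simulation. The sketch below is an illustration under assumed settings, not the original interactive widget: it uses an Exponential(1) population (strongly right-skewed), 20,000 replications per sample size, and a fixed seed, all chosen here for convenience.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
mu, sigma = 1.0, 1.0             # mean and std dev of an Exponential(1) population
reps = 20_000                    # simulated samples per sample size

skews = {}
for n in (2, 10, 30, 100):
    samples = rng.exponential(scale=1.0, size=(reps, n))
    # Standardized sample means U_n = (Ybar - mu) / (sigma / sqrt(n))
    u = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))
    skews[n] = stats.skew(u)
    # Compare a simulated lower-tail probability with the N(0,1) value
    print(n, round(skews[n], 3),
          round(float(np.mean(u <= -1.0)), 4),
          round(stats.norm.cdf(-1.0), 4))
```

For this population the skewness of \(U_n\) is \(2/\sqrt{n}\) in theory, so the printed skewness should shrink toward zero as \(n\) grows while the simulated tail probability approaches the normal value.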

🔬 Proof Sketch: Why the CLT Works

Wackerly §7.4 gives an MGF-based proof. Here is the essential logic:

Step 1: Standardize each \(Y_i\): let \(Z_i = (Y_i - \mu)/\sigma\), so \(E(Z_i) = 0\), \(V(Z_i) = 1\). Then \(U_n = \frac{1}{\sqrt{n}}\sum Z_i\).

Step 2: MGF factorization. Since the \(Z_i\) are i.i.d.: \[m_{U_n}(t) = \left[m_{Z_1}\!\left(\tfrac{t}{\sqrt{n}}\right)\right]^n\]

Step 3: Taylor expansion. Using \(m_{Z_1}(0) = 1\), \(m'_{Z_1}(0) = E(Z_1) = 0\), and \(m''_{Z_1}(0) = E(Z_1^2) = 1\): \[m_{Z_1}\!\left(\tfrac{t}{\sqrt{n}}\right) \approx 1 + \frac{t^2/2}{n} \implies m_{U_n}(t) \approx \left(1 + \frac{t^2/2}{n}\right)^{\!n} \to e^{t^2/2}\]

Step 4: MGF uniqueness. \(e^{t^2/2}\) is the MGF of \(N(0,1)\). By Theorem 7.5 (convergence of MGFs), the distribution of \(U_n\) converges to standard normal. \(\blacksquare\)

Intuition

Averaging \(n\) i.i.d. shocks, regardless of their shape, progressively cancels out the skewness and heavy tails, leaving only the bell curve.
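
The limit in Step 3 can be checked numerically. The sketch below is an illustration only: it evaluates the leading-order expression \((1 + t^2/(2n))^n\) and, as one concrete case chosen here, the exact MGF of \(U_n\) when each \(Z_i\) is \(\pm 1\) with probability \(1/2\) (so \(m_{Z_1}(s) = \cosh s\)). The function names and the value \(t = 1.5\) are hypothetical choices.

```python
import math

def mgf_leading_order(t, n):
    """(1 + t^2 / (2n))^n, the leading-order approximation from Step 3."""
    return (1.0 + t * t / (2.0 * n)) ** n

def mgf_rademacher(t, n):
    """Exact MGF of U_n when Z_i = +1 or -1 with probability 1/2 each."""
    return math.cosh(t / math.sqrt(n)) ** n

t = 1.5
limit = math.exp(t * t / 2.0)   # MGF of N(0,1) evaluated at t
for n in (1, 10, 100, 10_000):
    print(n, round(mgf_leading_order(t, n), 5),
          round(mgf_rademacher(t, n), 5), round(limit, 5))
```

Both columns approach \(e^{t^2/2}\) as \(n\) grows, which is the convergence that Theorem 7.5 turns into convergence of distributions.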

💬 Think-Pair-Share: CLT Application

🤔 Discussion Problem (4 minutes)

Azerbaijan’s telecommunications agency monitors daily internet outage durations across ISPs. Outage durations have \(\mu = 45\) min, \(\sigma = 30\) min (right-skewed). A sample of \(n = 36\) events is drawn from one ISP.

Questions:

  1. Can we use the CLT? What condition must hold?

  2. Find \(P(\bar{Y} > 55 \text{ min})\), the probability that the mean outage exceeds 55 minutes.

  3. If regulators flag ISPs with \(\bar{Y} > 55\) min as non-compliant, what is the Type I error (false flag rate)?

💬 Think-Pair-Share: Solution

1. CLT conditions: \(n = 36 > 30\) ✓ and \(\sigma^2 = 900 < \infty\) ✓, so the CLT applies despite skewness.

2. Compute \(P(\bar{Y} > 55)\):

\[SE = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{36}} = 5 \text{ min}\]

\[P(\bar{Y} > 55) = P\!\left(Z > \frac{55 - 45}{5}\right) = P(Z > 2.0) = 1 - 0.9772 = \boxed{0.0228}\]

3. Regulatory interpretation: If a compliant ISP (\(\mu = 45\)) is observed, there is only a \(2.28\%\) chance its sample mean exceeds 55 min. This is the significance level \(\alpha\) of a one-sided test, a comfortably low false alarm rate.

Key Takeaway

Even with skewed data, large enough samples give us reliable normal-based probability calculations. This is the entire foundation of regulatory monitoring and economic inference.
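
How good is the normal approximation for a skewed population at \(n = 36\)? The problem does not specify the outage distribution, so the sketch below assumes one purely for illustration: a Gamma population matched to \(\mu = 45\) and \(\sigma = 30\). Under that assumption the mean of \(n\) draws is itself Gamma-distributed, so the exact tail probability is available without simulation.

```python
from scipy import stats

mu, sigma, n = 45.0, 30.0, 36   # values from the discussion problem
threshold = 55.0

# CLT approximation: Ybar ~ N(mu, sigma^2 / n)
se = sigma / n ** 0.5
p_clt = stats.norm.sf((threshold - mu) / se)

# ASSUMPTION (not in the problem): outages follow Gamma(k, theta) with the
# same mean and standard deviation.
shape = (mu / sigma) ** 2       # k = mu^2 / sigma^2 = 2.25
scale = sigma ** 2 / mu         # theta = sigma^2 / mu = 20
# The mean of n i.i.d. Gamma(k, theta) draws is exactly Gamma(n*k, theta/n)
p_gamma = stats.gamma.sf(threshold, a=n * shape, scale=scale / n)

print(f"CLT: {p_clt:.4f}   exact under the Gamma assumption: {p_gamma:.4f}")
```

Under this assumed population the exact upper-tail probability comes out somewhat above the CLT value, because the residual right skew of \(\bar{Y}\) thickens the upper tail. The normal figure is a good first approximation, not an exact false-flag rate.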

📊 Normal Approximation to the Binomial

The Bridge from Discrete to Continuous

A binomial variable \(Y \sim \text{Bin}(n, p)\) is a sum of i.i.d. Bernoulli trials:

\[Y = \sum_{i=1}^{n} X_i, \quad X_i \sim \text{Bernoulli}(p)\]

By the CLT, for large \(n\): \[Y \stackrel{a}{\sim} N\!\left(np,\; np(1-p)\right)\]

Or equivalently, the sample proportion: \[\hat{p} = \frac{Y}{n} \stackrel{a}{\sim} N\!\left(p,\; \frac{p(1-p)}{n}\right)\]

When is this approximation adequate?

\[n > 9\left(\frac{\max(p, q)}{\min(p, q)}\right) \quad \text{where } q = 1 - p\]

Equivalently: \(p \pm 3\sqrt{pq/n}\) must lie in \((0, 1)\).
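
The two adequacy checks can be coded side by side to see that they agree. This is a minimal sketch with hypothetical helper names (`normal_approx_ok`, `three_sigma_inside`); it assumes \(0 < p < 1\), and the \((n, p)\) pairs are illustrative.

```python
import math

def normal_approx_ok(n, p):
    """Rule of thumb: n > 9 * max(p, q) / min(p, q), with q = 1 - p."""
    q = 1.0 - p
    return n > 9.0 * max(p, q) / min(p, q)

def three_sigma_inside(n, p):
    """Equivalent check: p +/- 3*sqrt(p*q/n) lies strictly inside (0, 1)."""
    half_width = 3.0 * math.sqrt(p * (1.0 - p) / n)
    return 0.0 < p - half_width and p + half_width < 1.0

for n, p in [(25, 0.4), (100, 0.5), (20, 0.1), (120, 0.15)]:
    print(n, p, normal_approx_ok(n, p), three_sigma_inside(n, p))
```

The equivalence follows from squaring: \(p - 3\sqrt{pq/n} > 0\) rearranges to \(n > 9q/p\), and \(p + 3\sqrt{pq/n} < 1\) rearranges to \(n > 9p/q\).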

✂️ The Continuity Correction

Key Improvement for Discrete Distributions

When approximating a discrete binomial with a continuous normal, each integer \(y\) corresponds to the interval \([y - 0.5,\; y + 0.5]\).

Binomial expression | Normal approximation (with correction)
\(P(Y \leq b)\) | \(P(W \leq b + 0.5)\)
\(P(Y \geq a)\) | \(P(W \geq a - 0.5)\)
\(P(Y = k)\) | \(P(k - 0.5 \leq W \leq k + 0.5)\)
\(P(a \leq Y \leq b)\) | \(P(a - 0.5 \leq W \leq b + 0.5)\)

where \(W \sim N(np, np(1-p))\).

Mnemonic

“Add 0.5 outward from the region of interest”: expand by half a unit in each direction that includes more of the histogram.
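
All four rows of the table reduce to one rule: shift the lower bound down by 0.5 and the upper bound up by 0.5. The following sketch wraps that rule in a hypothetical helper, `binom_normal_approx`, and compares it with SciPy's exact binomial values; \(n = 25\) and \(p = 0.4\) are illustrative inputs.

```python
import math
from scipy import stats

def binom_normal_approx(n, p, a=None, b=None):
    """Approximate P(a <= Y <= b) for Y ~ Bin(n, p) with continuity correction.

    a=None means no lower bound; b=None means no upper bound.
    """
    mu = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    lower = 0.0 if a is None else stats.norm.cdf((a - 0.5 - mu) / sd)
    upper = 1.0 if b is None else stats.norm.cdf((b + 0.5 - mu) / sd)
    return upper - lower

n, p = 25, 0.4   # illustrative values
approx_le = binom_normal_approx(n, p, b=8)        # P(Y <= 8)
approx_eq = binom_normal_approx(n, p, a=8, b=8)   # P(Y = 8)
exact_le = stats.binom.cdf(8, n, p)
exact_eq = stats.binom.pmf(8, n, p)
print(round(approx_le, 3), round(exact_le, 3))
print(round(approx_eq, 3), round(exact_eq, 3))
```

Note that \(P(Y = k)\) would be zero under a continuous distribution without the correction; the half-unit interval is what makes a point probability approximable at all.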

📌 Example 7.10: Election Polling

Problem (Wackerly, p. 379): Candidate A needs at least 55% of votes in precinct 1 to win. She believes \(p = 0.50\) citywide. If \(n = 100\) voters show up, find \(P(Y/n \geq 0.55)\).

Solution: \(Y \sim \text{Bin}(100, 0.5)\). By CLT:

\[\hat{p} = Y/n \stackrel{a}{\sim} N\!\left(0.5,\; \frac{(0.5)(0.5)}{100}\right) = N(0.5,\; 0.0025)\]

\[P(\hat{p} \geq 0.55) = P\!\left(Z \geq \frac{0.55 - 0.50}{0.05}\right) = P(Z \geq 1.0) = \boxed{0.1587}\]

Economic Interpretation: Even with a fair coin (\(p=0.5\)), a supermajority of 55% still has a \(15.87\%\) chance of occurring in a precinct, much higher than most people intuitively expect. This explains why close elections have wide prediction bands.
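
The solution above works with \(\hat{p}\) directly and applies no continuity correction. As a hedged side check, the sketch below restates the event as \(Y \geq 55\) and compares the uncorrected value, the corrected value, and the exact binomial probability from SciPy.

```python
import math
from scipy import stats

n, p = 100, 0.5
k = 55                                       # at least 55 of 100 votes
mu, sd = n * p, math.sqrt(n * p * (1 - p))   # 50 and 5

p_plain = stats.norm.sf((k - mu) / sd)       # no correction, as in the solution
p_cc = stats.norm.sf((k - 0.5 - mu) / sd)    # with continuity correction
p_exact = stats.binom.sf(k - 1, n, p)        # exact P(Y >= 55) = 1 - P(Y <= 54)
print(f"plain: {p_plain:.4f}  corrected: {p_cc:.4f}  exact: {p_exact:.4f}")
```

The corrected value lands closer to the exact probability than the uncorrected one, so the textbook figure somewhat understates the chance of reaching 55%.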

📌 Example 7.11: Loan Default Rates

Context: A bank’s credit portfolio holds \(n = 25\) loans, each with a 40% probability of default within 1 year (defaults assumed independent).

Find \(P(Y \leq 8)\) and \(P(Y = 8)\) exactly and via normal approximation. Compare.

Population parameters: \(\mu_W = np = 25(0.4) = 10\); \(\sigma^2_W = np(1-p) = 25(0.4)(0.6) = 6\); \(\sigma_W = 2.449\).

Exact binomial: \(P(Y \leq 8) = 0.274\); \(P(Y = 8) = 0.120\).

Normal approximation with continuity correction:

\[P(Y \leq 8) \approx P\!\left(W \leq 8.5\right) = P\!\left(Z \leq \frac{8.5 - 10}{2.449}\right) = P(Z \leq -0.61) = 0.271\]

\[P(Y = 8) \approx P(7.5 \leq W \leq 8.5) = P(-1.02 \leq Z \leq -0.61) = 0.271 - 0.154 = 0.117\]

Probability | Exact Binomial | Normal Approx | Error
\(P(Y \leq 8)\) | 0.274 | 0.271 | 0.003
\(P(Y = 8)\) | 0.120 | 0.117 | 0.003

The approximation is excellent even at \(n = 25\), because \(n > 9(0.6/0.4) = 13.5\) ✓

🎮 Interactive: Normal Approximation to Binomial

See how well the normal curve fits the binomial histogram for different \(n\) and \(p\).

💰 Case Study: Telecom Compliance via CLT

📊 Python Code
import numpy as np
from scipy import stats

# ─── Scenario 1: CLT for Monthly Average Download Speed ───────────────
# ISP download speeds per user (Mbps per session): exponential-like, skewed
# Regulatory benchmark: mean ≥ 50 Mbps
n_users = 50            # sample of users tested per ISP per month
mu_speed = 47.0         # true population mean (Mbps), slightly below target
sigma_speed = 18.0      # population std dev (Mbps)

# By CLT: Ȳ ~ N(47, 18²/50), approximately
se = sigma_speed / np.sqrt(n_users)
# Probability that the sample mean exceeds 50 Mbps (false-pass rate)
z = (50 - mu_speed) / se
p_false_pass = 1 - stats.norm.cdf(z)
print(f"SE = {se:.3f}, z = {z:.3f}, false-pass rate = {p_false_pass:.3f}")

# ─── Scenario 2: Normal Approx to Binomial for Complaint Rates ─────────
# Out of n = 120 subscriber surveys, Y = complaints about slow speed
# H₀: p = 0.15 (industry norm); observed: 24 complaints (20%)
n_surveys, p0 = 120, 0.15
y_observed = 24
mu_bin = n_surveys * p0
sigma_bin = np.sqrt(n_surveys * p0 * (1 - p0))

# With continuity correction: P(Y ≥ 24) ≈ P(W ≥ 23.5)
z_bin = (y_observed - 0.5 - mu_bin) / sigma_bin
p_at_least_24 = 1 - stats.norm.cdf(z_bin)
# Exact binomial tail: P(Y ≥ 24) = 1 - P(Y ≤ 23)
p_exact = 1 - stats.binom.cdf(y_observed - 1, n_surveys, p0)
print(f"normal approx = {p_at_least_24:.4f}, exact = {p_exact:.4f}")

Scenario 1: Download Speed (CLT)

Parameter | Value
True mean | 47 Mbps
SE of mean (n=50) | 2.546 Mbps
\(P(\bar{Y} > 50)\) | \(P(Z > 1.178) = 11.9\%\)
Risk | 11.9% false-pass rate

Scenario 2: Complaint Rate (Binomial)

Quantity | Value
Expected complaints | \(\mu = 18\), \(\sigma = 3.91\)
\(P(Y \geq 24)\), normal approx with continuity correction | \(P(Z > 1.406) \approx 0.080\)
\(P(Y \geq 24)\), exact binomial | \(\approx 0.083\)
Approximation error (normal minus exact) | about \(-0.4\) percentage points

💰 Case Study: Key Findings

📊 Regulatory Implications

CLT for Speed Monitoring:

  • Individual speeds are skewed, but the sample mean of \(n=50\) is approximately normal by the CLT

  • An ISP with true mean 47 Mbps still has an 11.9% chance of passing the test, a substantial false-pass risk

  • Policy: Increase \(n\) to sharpen the test. To cap the false-pass rate at \(\alpha = 0.05\): \(n = \left(\frac{z_\alpha \sigma}{\delta}\right)^2 = \left(\frac{1.645 \times 18}{3}\right)^2 \approx 97.4\), so at least 98 users

Binomial Approximation for Complaint Rates:

  • With \(n = 120\) surveys and \(p_0 = 0.15\), adequacy check:

\[n = 120 > 9 \times \frac{0.85}{0.15} = 51 \checkmark\]

  • Normal approximation is close to the exact binomial (within about 0.4 percentage points)

  • The continuity correction is essential for accuracy when computing tail probabilities on discrete distributions
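
The sample-size policy for speed monitoring can be checked in a few lines. This is a sketch under an assumed target: the false-pass rate is capped at \(\alpha = 0.05\) (a value chosen here to match \(z_\alpha = 1.645\)), with \(\sigma = 18\) Mbps and a gap of \(\delta = 3\) Mbps between the benchmark and the true mean.

```python
import math
from scipy import stats

sigma = 18.0   # population std dev (Mbps)
delta = 3.0    # gap between benchmark (50) and true mean (47)
alpha = 0.05   # target false-pass rate (assumed for this sketch)

z_alpha = stats.norm.ppf(1 - alpha)                      # about 1.645
n_required = math.ceil((z_alpha * sigma / delta) ** 2)   # round up to whole users

# Check: false-pass rate actually achieved with n_required users
se = sigma / math.sqrt(n_required)
false_pass = stats.norm.sf(delta / se)
print(n_required, round(false_pass, 4))
```

Rounding up rather than to the nearest integer matters here: the formula gives a value slightly above 97, and only the next whole number keeps the false-pass rate at or below the target.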

📝 Quiz #1: CLT Identification

Monthly household income in a city has mean $2,400 and standard deviation $800 (right-skewed distribution). For a sample of \(n = 64\) households, what is the approximate distribution of \(\bar{Y}\)?

  • \(N(2400,\; 10000)\), since \(\sigma^2/n = 640000/64 = 10000\)
  • \(N(2400, 640000)\), the same as the population
  • Cannot determine without knowing the population distribution
  • \(N(0, 1)\) after standardization

📝 Quiz #2: Applying the CLT

An economist surveys \(n = 100\) firms. Each firm’s quarterly profit change has \(\mu = 0.8\%\) and \(\sigma = 3.5\%\). What is \(P(\bar{Y} > 1.5\%)\)?

  • \(P(Z > 2.0) \approx 0.0228\)
  • \(P(Z > 0.2) \approx 0.4207\)
  • \(P(Z > 20) \approx 0\)
  • \(P(Z > 1.0) \approx 0.1587\)

📝 Quiz #3: Normal Approximation to Binomial

A telecom operator has a 12% subscriber churn rate. In a sample of \(n = 150\) subscribers, let \(Y\) = number who churned. Using the normal approximation with continuity correction, find \(P(Y \leq 20)\).

  • \(P\!\left(Z \leq \frac{20.5 - 18}{3.98}\right)\): \(\mu=18\), \(\sigma=\sqrt{150(0.12)(0.88)}=3.98\), \(P(Z \leq 0.628)\approx 0.735\)
  • \(P(Z \leq 0.503) \approx 0.692\), computed without continuity correction
  • \(P(Z \leq 1.508) \approx 0.934\), computed with wrong parameters
  • Cannot use normal approximation here

📝 Quiz #4: Adequacy of Normal Approximation

For which combination of \(n\) and \(p\) is the normal approximation to the binomial NOT adequate?

  • \(n = 30, p = 0.03\)
  • \(n = 100, p = 0.5\) (need \(n > 9\))
  • \(n = 50, p = 0.4\) (need \(n > 13.5\))
  • \(n = 200, p = 0.1\) (need \(n > 81\))

📝 Summary

✅ Key Takeaways from Lecture 2

  • Theorem 7.4 (CLT): For i.i.d. \(Y_i\) with finite \(\mu\) and \(\sigma^2\), \(U_n = (\bar{Y} - \mu)/(\sigma/\sqrt{n}) \xrightarrow{d} N(0,1)\). Valid for any population shape when \(n\) is large (typically \(n > 30\)).

  • Asymptotic normality: Write \(\bar{Y} \stackrel{a}{\sim} N(\mu, \sigma^2/n)\) for practical calculations. Even income data, service times, and return distributions obey this for large samples.

  • Total sums: The CLT also applies to totals, \(\sum Y_i \stackrel{a}{\sim} N(n\mu, n\sigma^2)\), which is useful for budgeting, queueing, and portfolio totals.

  • Normal approximation to binomial: \(Y \sim \text{Bin}(n,p)\) \(\Rightarrow\) \(Y \stackrel{a}{\sim} N(np, np(1-p))\) for large \(n\). Adequate when \(n > 9(\max(p,q)/\min(p,q))\).

  • Continuity correction: Add/subtract 0.5 when translating discrete binomial events to continuous normal areas; this substantially improves accuracy.

📚 Practice Problems

📝 Homework Problems (§7.3–7.5)

Problem 1 (CLT): Weekly repair costs per machine: \(\mu = \$20\), \(\sigma^2 = 100\) (exponential-like). For 5 machines over 1 week, approximate \(P\!\left(\sum_{i=1}^5 Y_i > 140\right)\).

Problem 2 (CLT + Sample Size): Survey: economists’ tax-saving estimates have \(\bar{Y} = 26\%\) and \(S = 12\%\) (\(n = 35\)). Find \(P(|\bar{Y} - \mu| \leq 1\%)\) and determine \(n\) needed for this to hold with probability \(0.99\).

Problem 3 (Binomial): Airline: 5% no-show rate; 160 tickets sold, 155 seats. Approximate \(P(\text{all passengers get a seat})\) using the normal approximation with continuity correction.

Problem 4 (Conceptual): Explain why the CLT does not require the population to be symmetric. Describe one real financial dataset where this is critical.

👋 Thank You!

📬 Contact:

Samir Orujov, PhD, Assistant Professor

School of Business, ADA University

📧 sorujov@ada.edu.az  |  🏢 D312  |  ⏰ By appointment

📅 Next Class: Estimation (Point Estimators & Confidence Intervals)

Reading: Wackerly Ch. 8, Sections 8.1–8.6

Preparation: Review CLT (Sec. 7.3) and normal tables

Reminders: ✅ Practice Problems 1–4  |  ✅ Review continuity correction  |  ✅ Work hard!

❓ Questions?

💬 Open Discussion

  • Is the CLT still valid if the \(Y_i\) are not identically distributed but only independent? (Hint: Lindeberg–Feller theorem)

  • For Azerbaijani household income data (strongly right-skewed), how large a sample would you need before the CLT safely applies?

  • Why does the continuity correction improve accuracy? What geometric property of histograms does it exploit?

  • The CLT guarantees asymptotic normality. Does this mean we should always use large samples? What are the tradeoffs (cost, time) in regulatory monitoring?