Mathematical Statistics

Fundamentals of Hypothesis Testing

Samir Orujov, PhD

ADA University, School of Business

Information Communication Technologies Agency, Statistics Unit

2026-03-14

🎯 Learning Objectives

By the end of this lecture, you will be able to:

  • Define null and alternative hypotheses and formulate them for real financial decisions

  • Identify the components of a statistical test: test statistic, rejection region, and significance level

  • Distinguish between Type I and Type II errors and quantify their probabilities \(\alpha\) and \(\beta\)

  • Interpret p-values and use them to reach evidence-based conclusions

  • Explain the relationship between hypothesis tests and confidence intervals

📱 Attendance Check-in

📋 Overview

📚 Topics Covered Today

  • Elements of a Test – H₀, Hₐ, test statistic, rejection region

  • Type I & II Errors – \(\alpha\), \(\beta\), and the trade-off between them

  • Large-Sample Z Tests – means, proportions, differences

  • p-Values – attained significance and reporting results

  • CI Connection – duality between tests and confidence intervals

  • Case Study – Testing whether BIST100 daily returns have zero mean

📖 Why Hypothesis Testing?

🎯 Motivation

Statistical decisions drive billions of dollars in financial markets every day.

Finance Applications:

  • Is a trading strategy's mean return significantly > 0?
  • Has volatility changed after a market shock?
  • Do two asset classes have equal expected returns?
  • Is a fund manager's alpha statistically significant?

Regulatory Applications:

  • Does a telecom operator meet QoS thresholds?
  • Have consumer complaint rates changed after regulation?
  • Is price inflation statistically above the target band?
  • Are default rates equal across credit segments?

Key Question: How do we use sample data to make principled yes/no decisions about population parameters?

📖 The Logic of Hypothesis Testing

Statistical hypothesis testing follows a proof by contradiction logic:

  1. Assume the null hypothesis \(H_0\) is true (status quo / skeptical position)
  2. Collect data and compute a test statistic
  3. Ask: "How likely is this data if \(H_0\) were true?"
  4. If the data is very unlikely under \(H_0\) → Reject \(H_0\) in favour of \(H_a\)
  5. If the data is plausible under \(H_0\) → Fail to reject \(H_0\)

โš ๏ธ Failing to reject โ‰  proving \(H_0\) true!

๐Ÿ“ Definition: Null & Alternative Hypotheses

๐Ÿ“ Definition 10.1 โ€“ Hypotheses

The null hypothesis \(H_0\) is a specific statement about a population parameter that we assume true unless evidence convinces us otherwise.

The alternative (research) hypothesis \(H_a\) is what we seek evidence for.

Three forms of \(H_a\):

Type Form Financial Example
Two-tailed \(H_a: \mu \neq \mu_0\) Has average daily return changed?
Left-tailed \(H_a: \mu < \mu_0\) Has return fallen below benchmark?
Right-tailed \(H_a: \mu > \mu_0\) Is strategy return above zero?

Convention: \(H_0\) always contains the equality (=, ≤, or ≥)

๐Ÿ“ Definition: Test Statistic & Rejection Region

๐Ÿ“ Definition 10.2 โ€“ Test Components

A test statistic is a function of the sample data used to decide between \(H_0\) and \(H_a\).

The rejection region (RR) is the set of values of the test statistic for which \(H_0\) is rejected.

Decision Rule:

\[\text{If test statistic} \in RR \Rightarrow \text{Reject } H_0\]

\[\text{If test statistic} \notin RR \Rightarrow \text{Fail to Reject } H_0\]

Example: Testing \(H_0: \mu = 0\) (zero daily return) vs \(H_a: \mu > 0\)

\[Z = \frac{\bar{Y} - 0}{\sigma/\sqrt{n}}, \quad RR = \{z > z_\alpha\}\]
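As a minimal sketch, the decision rule can be coded directly in R (all numbers below are hypothetical, chosen only to illustrate the mechanics):

```r
# Hypothetical right-tailed test of H0: mu = 0 vs Ha: mu > 0 at alpha = 0.05
n     <- 100        # sample size
ybar  <- 0.0004     # observed mean daily return (made up)
sigma <- 0.002      # population sd, assumed known (made up)
alpha <- 0.05

z      <- (ybar - 0) / (sigma / sqrt(n))  # test statistic
z_crit <- qnorm(1 - alpha)                # z_alpha, approx. 1.645

reject <- z > z_crit   # TRUE: z falls in the rejection region, so reject H0
```

Here z = 2.0 exceeds 1.645, so this (hypothetical) sample would lead us to reject \(H_0\).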

๐Ÿ“ Type I and Type II Errors

โš ๏ธ The Two Ways to Be Wrong

\(H_0\) is True \(H_0\) is False
Reject \(H_0\) Type I Error (false positive) โœ… Correct
Fail to Reject \(H_0\) โœ… Correct Type II Error (false negative)

\[\alpha = P(\text{Type I Error}) = P(\text{Reject } H_0 \mid H_0 \text{ true})\]

\[\beta = P(\text{Type II Error}) = P(\text{Fail to reject } H_0 \mid H_0 \text{ false})\]

Financial interpretation:

  • Type I – Conclude a strategy works when it doesn't → lose money
  • Type II – Miss a profitable strategy that truly works → opportunity cost
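A quick Monte Carlo check (a sketch with hypothetical \(N(0,1)\) data) confirms that the rejection rate under a true \(H_0\) matches the nominal \(\alpha\):

```r
# Estimate the Type I error rate by simulating data under H0: mu = 0
set.seed(1)
alpha  <- 0.05
z_crit <- qnorm(1 - alpha)

reject <- replicate(10000, {
  y <- rnorm(50)                  # 50 observations generated under H0
  z <- mean(y) / (1 / sqrt(50))   # known sigma = 1
  z > z_crit                      # did this sample falsely reject H0?
})
mean(reject)   # close to 0.05, as designed
```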

📌 Example 1: Voting Probability Test

Problem: A candidate claims she will receive more than 50% of votes (\(p > 0.5\)). We survey \(n = 15\) voters (the polling example of Wackerly §10.2).

\(H_0: p = 0.5\) vs \(H_a: p < 0.5\). Use \(Y\) = number of supporters. \(RR = \{y \leq 2\}\).

Computing \(\alpha\):

\[\alpha = P(Y \leq 2 \mid p = 0.5) = \sum_{y=0}^{2} \binom{15}{y}(0.5)^{15}\]

\[= \binom{15}{0}(0.5)^{15} + \binom{15}{1}(0.5)^{15} + \binom{15}{2}(0.5)^{15} \approx 0.0000 + 0.0005 + 0.0032 \approx 0.004\]

📌 Example 1: Result & Interpretation

\(\alpha \approx 0.004\) → Only a 0.4% chance of falsely rejecting a candidate with true \(p = 0.5\)

Finance analogy: Testing if a fund's win-rate exceeds 50%. Low \(\alpha\) = conservative threshold for claiming skill.

📌 Example 2: The \(\alpha\)-\(\beta\) Trade-off

Enlarging the rejection region from \(RR = \{y \leq 2\}\) to \(RR^* = \{y \leq 5\}\):

Metric \(RR = \{y \leq 2\}\) \(RR^* = \{y \leq 5\}\)
\(\alpha\) 0.004 0.151
\(\beta\) (at \(p = 0.3\)) 0.873 0.278

Key insight (Wackerly §10.2):

\[\text{Enlarging RR} \Rightarrow \alpha \uparrow, \quad \beta \downarrow\]

\[\text{Shrinking RR} \Rightarrow \alpha \downarrow, \quad \beta \uparrow\]

There is no free lunch! To reduce both, you must increase sample size \(n\).
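These error probabilities can be computed exactly with R's binomial CDF; a sketch for the \(n = 15\) polling setup, evaluating \(\beta\) at the alternative \(p = 0.3\):

```r
# Exact alpha and beta for RR = {y <= c} in the binomial polling test
# alpha = P(Y <= c | p = 0.5),  beta = P(Y > c | p = 0.3)
n      <- 15
cutoff <- c(2, 5)   # RR = {y <= 2} and the enlarged RR* = {y <= 5}

alpha <- pbinom(cutoff, n, 0.5)      # P(reject | H0 true)
beta  <- 1 - pbinom(cutoff, n, 0.3)  # P(fail to reject | true p = 0.3)

round(data.frame(cutoff, alpha, beta), 3)
# Enlarging the rejection region raises alpha and lowers beta
```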

🧮 Large-Sample Z Test for \(\mu\)

Theorem 10.1 – One-Sample Z Test

For large \(n\), testing \(H_0: \mu = \mu_0\):

\[Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} \approx N(0,1) \text{ under } H_0\]

Rejection regions at significance level \(\alpha\):

\(H_a\) Rejection Region
\(\mu > \mu_0\) \(z > z_\alpha\)
\(\mu < \mu_0\) \(z < -z_\alpha\)
\(\mu \neq \mu_0\) \(|z| > z_{\alpha/2}\)

Intuition: \(Z\) measures how many standard errors \(\bar{Y}\) is from \(\mu_0\). Extreme values are evidence against \(H_0\).

🧮 Large-Sample Z Test for Proportion

Test for \(p\) (proportion)

For large \(n\), testing \(H_0: p = p_0\):

\[Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \approx N(0,1) \text{ under } H_0\]

Two-Sample Test for \(\mu_1 - \mu_2\):

\[Z = \frac{(\bar{Y}_1 - \bar{Y}_2) - D_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\]

where \(D_0\) is the hypothesized difference (often 0).

Financial use: Are mean returns for large-cap and small-cap stocks equal?

\[H_0: \mu_{\text{large}} - \mu_{\text{small}} = 0\]

📖 p-Values: Attained Significance

📝 Definition 10.3 – p-Value

The p-value (attained significance level) is the smallest \(\alpha\) at which \(H_0\) would be rejected given the observed data.

\[p\text{-value} = P(\text{observed test statistic or more extreme} \mid H_0 \text{ true})\]

Computing p-values:

\(H_a\) p-value
\(\mu > \mu_0\) \(P(Z > z_{\text{obs}})\)
\(\mu < \mu_0\) \(P(Z < z_{\text{obs}})\)
\(\mu \neq \mu_0\) \(2 \cdot P(Z > |z_{\text{obs}}|)\)

Decision: Reject \(H_0\) if \(p\text{-value} < \alpha\) (e.g., 0.05 or 0.01)
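The three formulas can be wrapped in a small helper (a sketch; the function name is illustrative):

```r
# p-value for an observed z under each form of the alternative hypothesis
p_value <- function(z_obs, alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         greater   = 1 - pnorm(z_obs),          # Ha: mu > mu0
         less      = pnorm(z_obs),               # Ha: mu < mu0
         two.sided = 2 * (1 - pnorm(abs(z_obs))))# Ha: mu != mu0
}

p_value(1.89, "greater")    # about 0.029: reject at alpha = 0.05, one-tailed
p_value(1.89, "two.sided")  # about 0.059: fail to reject, two-tailed
```

Note how the same \(z\) can be significant one-tailed but not two-tailed, which is why \(H_a\) must be fixed before seeing the data.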

📖 Interpreting p-Values

⚠️ What p-values are NOT

  • The probability that \(H_0\) is true
  • The probability of making an error
  • A measure of practical significance

Common thresholds:

p-value Evidence against \(H_0\)
\(< 0.001\) Very strong
\(0.001\)–\(0.01\) Strong
\(0.01\)–\(0.05\) Moderate
\(0.05\)–\(0.10\) Weak / suggestive
\(> 0.10\) Insufficient

🔗 Tests ↔ Confidence Intervals

Theorem 10.2 – Duality (Wackerly §10.5)

A two-tailed test at level \(\alpha\) rejects \(H_0: \mu = \mu_0\) if and only if \(\mu_0\) falls outside the \((1-\alpha)\) confidence interval for \(\mu\).

Example: 95% CI for daily return: \((0.003, \; 0.021)\)

  • Test \(H_0: \mu = 0\) at \(\alpha = 0.05\): 0 is outside CI → Reject \(H_0\) ✅
  • Test \(H_0: \mu = 0.01\) at \(\alpha = 0.05\): 0.01 is inside CI → Fail to Reject ✅

Advantage of CIs over tests: CIs communicate magnitude of effect, not just reject/not reject.
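The duality can be checked mechanically; a sketch using the interval from the example above:

```r
# A two-tailed level-alpha test rejects H0: mu = mu0 exactly when
# mu0 lies outside the (1 - alpha) confidence interval
ci <- c(0.003, 0.021)   # 95% CI for the mean daily return (from the example)

rejects <- function(mu0, ci) mu0 < ci[1] | mu0 > ci[2]

rejects(0,    ci)  # TRUE:  0 is outside the CI, reject H0: mu = 0
rejects(0.01, ci)  # FALSE: 0.01 is inside the CI, fail to reject
```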

🎮 Interactive: Visualising Type I & Type II Errors

Adjust \(\mu_a\) (true mean) and significance \(\alpha\) to see the error probabilities change.

Blue shaded = α (Type I). Red shaded = β (Type II). Dashed line = critical value.

๐Ÿค Think-Pair-Share

๐Ÿ’ฌ Activity (4 minutes)

Scenario: An investment bank claims their algorithmic trading strategy achieves a Sharpe ratio of at least 1.0. A risk auditor tests this claim using 252 trading days of data and obtains:

\[\bar{Y} = 0.87, \quad s = 0.42, \quad n = 252\]

Questions:

  1. Formulate the appropriate \(H_0\) and \(H_a\)

  2. Compute the test statistic \(Z\)

  3. Find the p-value (one-tailed test)

  4. What is your conclusion at \(\alpha = 0.05\)?

  5. What Type II error could occur here, and what are its consequences?

✅ Think-Pair-Share: Solution

1. Hypotheses:

The auditor is testing whether the Sharpe ratio is below the claimed 1.0:

\[H_0: \mu_S \geq 1.0 \qquad H_a: \mu_S < 1.0\]

2. Test Statistic:

\[Z = \frac{\bar{Y} - \mu_0}{s/\sqrt{n}} = \frac{0.87 - 1.0}{0.42/\sqrt{252}} = \frac{-0.13}{0.02646} \approx -4.91\]

✅ Think-Pair-Share: Solution (Continued)

3. p-Value & Decision:

\[p\text{-value} = P(Z < -4.91) \approx 4.6 \times 10^{-7}\]

Since \(p \ll \alpha = 0.05\), we reject \(H_0\). Strong evidence the true Sharpe ratio is below 1.0.

4. Financial Interpretation:

The bank's claimed Sharpe ratio of 1.0 is statistically refuted by the 252-day audit data. The strategy is significantly underperforming its advertised risk-adjusted return, a potential misrepresentation to investors.

💡 Key Insight: A very large \(|Z|\) here is driven by the large \(n = 252\). Even a modest gap (0.13) becomes highly significant with a full year of daily data.
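The audit numbers can be verified in a few lines of R:

```r
# Left-tailed large-sample Z test of H0: mu_S >= 1.0 vs Ha: mu_S < 1.0
ybar <- 0.87; s <- 0.42; n <- 252; mu0 <- 1.0

se <- s / sqrt(n)          # about 0.0265
z  <- (ybar - mu0) / se    # about -4.91
p  <- pnorm(z)             # left-tail p-value, far below alpha = 0.05
```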

💰 Case Study: Testing Zero Mean Return

Code
library(tidyverse)
library(tidyquant)
library(knitr)

# Case study target: BIST100 daily returns; SPY is used here
# as a demonstration series for the same workflow
spy <- tq_get("SPY", from = "2022-01-01", to = "2023-12-31")

returns <- spy %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = "daily",
               col_rename = "return")

n    <- nrow(returns)
ybar <- mean(returns$return)
s    <- sd(returns$return)
se   <- s / sqrt(n)
z    <- ybar / se
pval <- 2 * (1 - pnorm(abs(z)))

results <- data.frame(
  Statistic = c("n", "Mean Return", "Std Dev", "Std Error", "Z statistic", "p-value"),
  Value     = round(c(n, ybar, s, se, z, pval), 5)
)
kable(results, caption = "Two-Tailed Test: H₀: μ = 0")
Two-Tailed Test: H₀: μ = 0
Statistic Value
n 501.00000
Mean Return 0.00013
Std Dev 0.01229
Std Error 0.00055
Z statistic 0.23239
p-value 0.81623
Code
# Visualise the test
x_seq <- seq(-4, 4, length.out = 400)
df_norm <- data.frame(z = x_seq, density = dnorm(x_seq))
z_crit  <- qnorm(0.975)

ggplot(df_norm, aes(x = z, y = density)) +
  geom_line(color = "steelblue", linewidth = 1) +
  geom_area(data = filter(df_norm, z >= z_crit),
            aes(x = z, y = density), fill = "red", alpha = 0.4) +
  geom_area(data = filter(df_norm, z <= -z_crit),
            aes(x = z, y = density), fill = "red", alpha = 0.4) +
  geom_vline(xintercept = z, color = "darkgreen",
             linewidth = 1.2, linetype = "dashed") +
  annotate("text", x = z + 0.2, y = 0.35,
           label = paste0("Z = ", round(z, 3)), hjust = 0, size = 4) +
  labs(title = "Z Test for Mean Daily Return (SPY 2022-2023)",
       subtitle = paste0("p-value = ", round(pval, 4),
                          " | α = 0.05 | Red zones = rejection regions"),
       x = "Z statistic", y = "Density") +
  theme_minimal(base_size = 12)

💰 Case Study: Key Findings

📊 Analysis Results

Test Setup:

  • \(H_0: \mu_{\text{daily}} = 0\)

  • \(H_a: \mu_{\text{daily}} \neq 0\)

  • \(n = 501\) trading days

  • \(\alpha = 0.05\)

  • Two-tailed Z test

Computed Values:

  • \(\bar{Y} \approx 0.00013\)

  • \(s \approx 0.012\)

  • \(Z \approx 0.23\)

  • p-value \(\approx 0.82\)

  • Fail to Reject \(H_0\)

Implications:

  1. Market Efficiency: Daily returns are not distinguishable from zero mean

  2. Practical vs Statistical: Small \(Z\) suggests insufficient signal

  3. Sample Matters: Longer history or different period may yield different result

๐Ÿ“ Quiz #1: Error Types

A central bank audit finds that an investment fundโ€™s reported average quarterly return is 2.5%. The regulator tests \(H_0: \mu = 2.5\%\) vs \(H_a: \mu < 2.5\%\) at \(\alpha = 0.05\). The test rejects \(H_0\). If the fund was actually performing at exactly 2.5%, what error was made?

  • Type I Error – falsely rejecting a true null hypothesis
  • Type II Error – failing to detect a true difference
  • No error – the test is always correct at α = 0.05
  • Power error – the sample was too small

๐Ÿ“ Quiz #2: p-Value Interpretation

A quantitative analyst tests whether a new factorโ€™s mean return is positive. She computes \(Z = 1.89\), giving \(p\text{-value} = 0.029\). Which conclusion is correct at \(\alpha = 0.05\)?

  • Reject \(H_0\); there is statistically significant evidence of a positive mean return
  • Fail to reject \(H_0\); no evidence of positive returns
  • The probability that the strategy is profitable is 97.1%
  • The probability that \(H_0\) is true is 2.9%

๐Ÿ“ Quiz #3: Test Statistic Setup

An analyst tests whether a mutual fundโ€™s average monthly excess return \(\mu\) equals zero. She has \(n = 60\) months, \(\bar{Y} = 0.008\), \(s = 0.030\). Which is the correct \(Z\) statistic?

  • \(Z = \dfrac{0.008 - 0}{0.030/\sqrt{60}} \approx 2.07\)
  • \(Z = 0.008 / 0.030 = 0.267\)
  • \(Z = (0.008 \times 60) / 0.030 = 16\)
  • \(Z = 0.030 / \sqrt{60} = 0.00387\)

๐Ÿ“ Quiz #4: CIโ€“Test Duality

For the test \(H_0: \mu = 0\) at \(\alpha = 0.05\) (two-tailed), the 95% confidence interval is computed as \((-0.002, \; 0.018)\). What is the correct decision?

  • Fail to reject \(H_0\); since 0 is inside the 95% CI, there is insufficient evidence to conclude \(\mu \neq 0\)
  • Reject \(H_0\); the interval does not contain the hypothesized value
  • Cannot determine without the test statistic
  • Reject \(H_0\); the interval is very narrow

๐Ÿ“ Summary

โœ… Key Takeaways

  • Hypotheses: \(H_0\) (null/status quo) vs \(H_a\) (research/alternative); always formulate before seeing data

  • Error Trade-off: \(\alpha\) (Type I, false positive) and \(\beta\) (Type II, false negative) are inversely related; reducing both requires larger \(n\)

  • Z Tests: For large samples, \(Z = (\bar{Y} - \mu_0)/(\sigma/\sqrt{n})\) follows \(N(0,1)\) under \(H_0\)

  • p-Values: Measure evidence against \(H_0\); reject \(H_0\) when \(p < \alpha\); do NOT interpret as probability \(H_0\) is true

  • CI Duality: A two-tailed test rejects \(H_0: \mu = \mu_0\) iff \(\mu_0\) lies outside the \((1-\alpha)\) confidence interval

📚 Practice Problems

📝 Homework Problems – Chapter 10

Problem 1 (Formulation): A telecom regulator claims that average download speed is at least 25 Mbps. You sample \(n = 100\) customers and find \(\bar{Y} = 23.4\) Mbps, \(s = 8.2\) Mbps. Formulate hypotheses, compute \(Z\), and find the p-value.

Problem 2 (Error Types): A credit risk manager sets \(\alpha = 0.01\) when testing whether default rates have increased. Explain the practical consequences of a Type I and Type II error in this context.

Problem 3 (p-Value): Compute and interpret p-values for \(H_a: \mu > 0\) when \(Z = 1.28\), \(Z = 1.96\), and \(Z = 2.58\).

Problem 4 (Financial Application): A portfolio manager tests whether her fundโ€™s Sharpe ratio exceeds 0.5. With 36 months of data, \(\bar{S} = 0.63\) and \(s_S = 0.28\). Test at \(\alpha = 0.05\) and construct a 90% one-sided confidence bound. Compare conclusions.

👋 Thank You!

📬 Contact Information:

Samir Orujov, PhD

Assistant Professor

School of Business, ADA University

📧 sorujov@ada.edu.az

🏢 Office: D312

⏰ Office Hours: By appointment

📅 Next Class:

Topic: Tests for Means and Proportions

Reading: Chapter 10, Sections 10.3, 10.4, 10.8

Preparation: Review the Standard Normal table; practice computing Z statistics

โฐ Reminders:

โœ… Complete Practice Problems 1โ€“4

โœ… Review Type I / Type II error concepts

โœ… Think about real-world decision contexts where each error is more costly

โœ… Work hard!

โ“ Questions?

๐Ÿ’ฌ Open Discussion

Key Topics for Discussion:

  • Why do courts use "beyond reasonable doubt" – which error type is being controlled?

  • In algorithmic trading, which error (Type I or II) is more costly? Does it depend on strategy?

  • How does increasing sample size affect the trade-off between \(\alpha\) and \(\beta\)?

  • Why might a very small p-value be statistically significant but practically unimportant?