Mathematical Statistics

Linear Models & Least Squares

Samir Orujov, PhD

ADA University, School of Business

Information Communication Technologies Agency, Statistics Unit

2026-03-14

🎯 Learning Objectives

By the end of this lecture, you will be able to:

  • Formulate the simple linear regression model and state its assumptions

  • Derive the least-squares estimators \(\hat\beta_0\) and \(\hat\beta_1\) using the \(S_{xx}\), \(S_{xy}\) shorthand

  • Apply the unbiasedness results and compute \(S^2 = \text{SSE}/(n-2)\) as the estimator of \(\sigma^2\)

  • Execute the \(t\)-test for \(H_0: \beta_1 = 0\) and construct a confidence interval for \(\beta_1\)

  • Interpret the coefficient of determination \(r^2\) and Pearson correlation \(r\) in financial contexts

📱 Attendance Check-in

📋 Overview

📚 Topics Covered Today

  • The Linear Model — definition, assumptions, financial motivation (§11.1–11.2)

  • Least Squares — minimising SSE, deriving \(\hat\beta_1\) and \(\hat\beta_0\) (§11.3)

  • Estimator Properties — unbiasedness, variances, \(S^2 = \text{SSE}/(n-2)\) (§11.4)

  • Inference on \(\beta_i\) — \(t\)-test for slope, CI for \(\beta_1\) (§11.5)

  • Goodness of Fit — \(r^2\) and Pearson \(r\) (§11.6–11.7)

  • Case Study — Estimating market beta (CAPM) with real stock data

📖 Motivation: Regression in Finance

🎯 Linear Models are Everywhere in Finance

Asset Pricing:

  • CAPM: \(R_i - R_f = \alpha + \beta(R_m - R_f) + \varepsilon\)
  • Market beta \(\beta\) measures systematic risk
  • Is \(\alpha \neq 0\)? → abnormal return (Jensen’s alpha)
  • Factor models: Fama-French 3 & 5-factor

Regulation & Forecasting:

  • Broadband penetration as a function of price
  • Revenue elasticity to GDP growth
  • QoS speed vs infrastructure investment
  • Loan default probability vs credit score

Key idea: We observe \((x_i, Y_i)\) pairs. We believe \(E(Y \mid x) = \beta_0 + \beta_1 x\). How do we estimate \(\beta_1\), test whether it is significant, and measure how well \(x\) explains \(Y\)?

๐Ÿ“ Definition: The Simple Linear Model

๐Ÿ“ Definition 11.1 โ€” Simple Linear Regression Model

\[Y = \beta_0 + \beta_1 x + \varepsilon\]

Symbol Meaning
\(Y\) Random response variable
\(x\) Fixed (non-random) predictor variable
\(\beta_0\) Unknown intercept parameter
\(\beta_1\) Unknown slope parameter
\(\varepsilon\) Random error term

Assumptions (required for inference):

  • \(E(\varepsilon) = 0\) → \(E(Y \mid x) = \beta_0 + \beta_1 x\)
  • \(V(\varepsilon) = \sigma^2\) (constant, independent of \(x\)) — homoscedasticity
  • \(\varepsilon \sim N(0, \sigma^2)\) for \(t\)-tests and CIs
  • Errors are independent across observations
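The error assumptions can be illustrated with a small simulation — a minimal sketch in Python (rather than the course's R) with an illustrative \(\sigma = 0.5\) that is not from the lecture:

```python
import random
import statistics

# Minimal sketch of the error assumptions: draw eps ~ N(0, sigma^2)
# (sigma = 0.5 is an illustrative value, not from the lecture)
random.seed(2026)
sigma = 0.5
eps = [random.gauss(0, sigma) for _ in range(100_000)]

print(round(statistics.mean(eps), 3))      # close to E(eps) = 0
print(round(statistics.variance(eps), 3))  # close to V(eps) = sigma^2 = 0.25
```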

🧮 The Method of Least Squares (§11.3)

Given \(n\) data pairs \((x_1, Y_1), \ldots, (x_n, Y_n)\), the predicted value under the model is \(\hat Y_i = \hat\beta_0 + \hat\beta_1 x_i\).

We minimise the Sum of Squares for Error (SSE):

\[\text{SSE} = \sum_{i=1}^n (Y_i - \hat Y_i)^2 = \sum_{i=1}^n [Y_i - (\hat\beta_0 + \hat\beta_1 x_i)]^2\]

Setting \(\partial\,\text{SSE}/\partial\hat\beta_0 = 0\) and \(\partial\,\text{SSE}/\partial\hat\beta_1 = 0\) and solving the normal equations gives the unique minimisers:

🧮 Least-Squares Estimators

Least-Squares Estimators (Wackerly §11.3)

\[\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar x\]

Computational shorthand:

\[S_{xx} = \sum(x_i - \bar x)^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}\]

\[S_{xy} = \sum(x_i - \bar x)(Y_i - \bar Y) = \sum x_i Y_i - \frac{(\sum x_i)(\sum Y_i)}{n}\]

\[S_{yy} = \sum(Y_i - \bar Y)^2 = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n}\]

Fitted line: \(\hat Y = \hat\beta_0 + \hat\beta_1 x\) always passes through \((\bar x, \bar Y)\).
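Both identities above — the definitional and computational forms of \(S_{xx}\), \(S_{xy}\), and the mean-point property of the fitted line — can be checked numerically. A quick sketch in Python on a small hypothetical data set:

```python
# Hypothetical data (n = 4) to check the shorthand identities
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 2.9, 4.2, 4.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Definitional and computational forms of S_xx and S_xy agree
Sxx_def = sum((xi - xbar) ** 2 for xi in x)
Sxx_cmp = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
Sxy_def = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
Sxy_cmp = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

b1 = Sxy_cmp / Sxx_cmp
b0 = ybar - b1 * xbar

# The fitted line passes through (xbar, ybar)
print(abs((b0 + b1 * xbar) - ybar) < 1e-12)  # True
```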

📌 Example 1: CAPM Beta by Hand

Scenario: Estimate market beta for a stock using \(n = 5\) weekly excess returns.

| \(x_i\) (Market) | \(Y_i\) (Stock) | \(x_i Y_i\) | \(x_i^2\) |
|---|---|---|---|
| −2 | 0 | 0 | 4 |
| −1 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |
| 2 | 3 | 6 | 4 |
| Σ = 0 | Σ = 5 | Σ = 7 | Σ = 10 |

\[S_{xx} = 10 - \frac{0^2}{5} = 10, \quad S_{xy} = 7 - \frac{(0)(5)}{5} = 7\]

📌 Example 1: Solution

\[\hat\beta_1 = \frac{7}{10} = 0.7, \quad \hat\beta_0 = \frac{5}{5} - 0.7 \times 0 = 1.0\]

Fitted line: \(\hat Y = 1.0 + 0.7x\)

Interpretation: For every 1% rise in the market, the stock is predicted to return an extra \(0.7\%\). The intercept \(\hat\beta_0 = 1.0\) suggests positive alpha — but is it significant?
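The hand computation above is easy to verify numerically — a quick check in Python using the same shorthand formulas:

```python
# Reproducing Example 1 with the shorthand formulas
x = [-2, -1, 0, 1, 2]   # market excess returns (%)
y = [0, 0, 1, 1, 3]     # stock excess returns (%)
n = len(x)

Sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n                  # 10
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n  # 7
b1 = Sxy / Sxx                     # 0.7
b0 = sum(y) / n - b1 * sum(x) / n  # 1.0
print(b1, b0)
```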

🧮 Properties of LS Estimators (§11.4)

Theorem 11.1 — Unbiasedness and Variances

Under \(E(\varepsilon) = 0\) and \(V(\varepsilon) = \sigma^2\):

\[E(\hat\beta_1) = \beta_1, \qquad V(\hat\beta_1) = \frac{\sigma^2}{S_{xx}}\]

\[E(\hat\beta_0) = \beta_0, \qquad V(\hat\beta_0) = \sigma^2 \cdot \frac{\sum x_i^2}{n \cdot S_{xx}}\]

Estimating \(\sigma^2\): The error variance \(\sigma^2\) is unknown, so we estimate it with:

\[S^2 = \frac{\text{SSE}}{n-2} \qquad \text{where} \qquad \text{SSE} = S_{yy} - \hat\beta_1 S_{xy}\]

\(S^2\) is an unbiased estimator of \(\sigma^2\) with \(n-2\) degrees of freedom (two degrees of freedom are spent estimating \(\beta_0\) and \(\beta_1\)).
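Unbiasedness is a statement about averages across repeated samples, which a small Monte Carlo sketch makes concrete. This Python snippet uses illustrative true values (\(\beta_0 = 1\), \(\beta_1 = 0.7\), \(\sigma = 1\) are assumptions, not from the text) on the Example 1 design:

```python
import random

# Monte Carlo sketch of Theorem 11.1: E(b1) = beta1 and E(S^2) = sigma^2
# (beta0 = 1, beta1 = 0.7, sigma = 1 are illustrative values)
random.seed(7)
beta0, beta1, sigma = 1.0, 0.7, 1.0
x = [-2, -1, 0, 1, 2]                # fixed design, as in Example 1
n = len(x)
Sxx = sum(xi ** 2 for xi in x)       # x sums to 0, so S_xx = 10

b1_draws, s2_draws = [], []
for _ in range(20_000):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    ybar = sum(y) / n
    Sxy = sum(xi * (yi - ybar) for xi, yi in zip(x, y))
    Syy = sum((yi - ybar) ** 2 for yi in y)
    b1 = Sxy / Sxx
    b1_draws.append(b1)
    s2_draws.append((Syy - b1 * Sxy) / (n - 2))  # S^2 = SSE/(n-2)

print(sum(b1_draws) / len(b1_draws))  # near beta1 = 0.7
print(sum(s2_draws) / len(s2_draws))  # near sigma^2 = 1.0
```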

📌 Example 2: Computing SSE and \(S^2\)

Continuing Example 1: \(S_{yy} = \sum Y_i^2 - (\sum Y_i)^2/n\)

\[\sum Y_i^2 = 0+0+1+1+9 = 11, \quad S_{yy} = 11 - \frac{25}{5} = 6\]

\[\text{SSE} = S_{yy} - \hat\beta_1 S_{xy} = 6 - 0.7 \times 7 = 6 - 4.9 = 1.1\]

\[S^2 = \frac{\text{SSE}}{n-2} = \frac{1.1}{3} = 0.367, \quad S = \sqrt{0.367} = 0.606\]
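A quick numeric check of these values (sketched in Python for a self-contained snippet):

```python
import math

# Checking Example 2: SSE = S_yy - b1*S_xy, S^2 = SSE/(n-2)
Syy, b1, Sxy, n = 6.0, 0.7, 7.0, 5
SSE = Syy - b1 * Sxy   # 6 - 4.9 = 1.1
S2 = SSE / (n - 2)     # about 0.367
S = math.sqrt(S2)      # about 0.606
print(round(SSE, 4), round(S2, 3), round(S, 3))
```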

📌 Example 2: Interpretation

Interpretation of \(S\):

\(S \approx 0.61\%\) is the estimated standard deviation of the stock return around the fitted regression line — a measure of residual risk not explained by market beta.

Rule of thumb:

  • Small \(S\) relative to \(\bar Y\) → good fit
  • Large \(S\) → substantial unexplained variation
  • Formally measured by \(r^2\) (see later)

🧮 \(t\)-Test for the Slope \(\beta_1\) (§11.5)

Test: \(H_0: \beta_1 = \beta_{10}\) (most commonly \(\beta_{10} = 0\))

\[T = \frac{\hat\beta_1 - \beta_{10}}{S\sqrt{1/S_{xx}}} = \frac{\hat\beta_1 - \beta_{10}}{S/\sqrt{S_{xx}}} \sim t_{n-2} \text{ under } H_0\]

Rejection regions at level \(\alpha\) (df \(= n-2\)):

| \(H_a\) | Rejection Region |
|---|---|
| \(\beta_1 > \beta_{10}\) | \(t > t_\alpha\) |
| \(\beta_1 < \beta_{10}\) | \(t < -t_\alpha\) |
| \(\beta_1 \neq \beta_{10}\) | \(|t| > t_{\alpha/2}\) |

Financial use: \(H_0: \beta_1 = 0\) tests whether \(x\) has any linear relationship with \(Y\).

📌 Example 3: Testing the Market Beta

Continuing Example 1: Test \(H_0: \beta_1 = 0\) vs \(H_a: \beta_1 \neq 0\) at \(\alpha = 0.05\).

\[S = 0.606, \quad S_{xx} = 10, \quad \hat\beta_1 = 0.7\]

\[T = \frac{0.7 - 0}{0.606/\sqrt{10}} = \frac{0.7}{0.1916} = 3.654\]

Critical value: \(t_{0.025} = 3.182\) with \(\nu = n-2 = 3\) df.

Since \(|T| = 3.654 > 3.182\) → Reject \(H_0\) ✅ — the market beta is significantly different from zero.
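The test statistic can be reproduced numerically — a Python sketch (note the slide's 3.654 comes from carrying the rounded \(S = 0.606\); the unrounded \(S\) gives 3.656):

```python
import math

# Example 3 check: t = b1 / (S / sqrt(Sxx)) with 3 df
b1, Sxx = 0.7, 10
S = math.sqrt((6 - 0.7 * 7) / 3)   # exact S from Example 2
t = b1 / (S / math.sqrt(Sxx))
print(round(t, 3))                 # about 3.656 (3.654 on the slide, which uses the rounded S = 0.606)

t_crit = 3.182                     # t_{0.025}, 3 df, from the t-table
print(abs(t) > t_crit)             # True -> reject H0
```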

📌 Example 3: Confidence Interval

95% Confidence Interval for \(\beta_1\):

\[\hat\beta_1 \pm t_{0.025} \cdot S/\sqrt{S_{xx}} \] \[= 0.7 \pm 3.182 \times 0.1916 \] \[= 0.7 \pm 0.610\]

\[\Rightarrow (0.090, \; 1.310)\]

Interpretation: We are 95% confident the stock’s market beta lies between \(0.09\) and \(1.31\). The wide interval reflects the tiny sample (\(n=5\)) — real CAPM estimation uses years of weekly data.
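The interval endpoints follow directly from the slide's rounded inputs — a quick Python check:

```python
import math

# 95% CI for beta1 with the slide's rounded inputs
b1, t025, S, Sxx = 0.7, 3.182, 0.606, 10
half = t025 * S / math.sqrt(Sxx)
lo, hi = b1 - half, b1 + half
print(round(lo, 3), round(hi, 3))  # about (0.090, 1.310)
```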

🧮 Coefficient of Determination \(r^2\)

Definition — \(r^2\) (Wackerly §11.6)

\[r^2 = 1 - \frac{\text{SSE}}{S_{yy}}\]

\[r^2 = \frac{S_{yy} - \text{SSE}}{S_{yy}} = \frac{\hat\beta_1 S_{xy}}{S_{yy}}\]

Interpretation: \(r^2\) is the proportion of total variation in \(Y\) explained by the linear relationship with \(x\).

  • \(r^2 = 1.0\) → Perfect fit
  • \(r^2 = 0.0\) → No linear fit
  • \(r^2 = 0.7\) → 70% explained

🧮 \(r^2\) Example

Recall Example 1:

\[r^2 = 1 - \frac{\text{SSE}}{S_{yy}} = 1 - \frac{1.1}{6} = 1 - 0.183 = 0.817\]

Interpretation: The market index explains 81.7% of the stock’s return variation.

  • 81.7% of variation is systematic risk (due to market movements)
  • 18.3% of variation is idiosyncratic risk (firm-specific)
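The risk decomposition above is just the \(r^2\) split — a one-line check in Python:

```python
# r^2 for Example 1: share of return variation explained by the market
SSE, Syy = 1.1, 6.0
r2 = 1 - SSE / Syy
print(round(r2, 3))      # 0.817 -> systematic share
print(round(1 - r2, 3))  # 0.183 -> idiosyncratic share
```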

🧮 Pearson Correlation Coefficient \(r\)

📝 Pearson \(r\) — Measuring Linear Association

\[r = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}\]

  • \(r \in [-1, 1]\)
  • \(r > 0\): positive linear relationship
  • \(r < 0\): negative linear relationship
  • \(r^2\) = coefficient of determination in simple regression

Testing \(H_0: \rho = 0\) (no linear association):

\[T = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2} \text{ under } H_0\]

This is algebraically equivalent to the \(t\)-test for \(H_0: \beta_1 = 0\) — same test, same p-value.
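The equivalence is easy to confirm on the Example 1 numbers — a Python sketch computing both statistics:

```python
import math

# Example 1 data: the t-statistic from r equals the slope t-statistic
Sxx, Sxy, Syy, n = 10.0, 7.0, 6.0, 5
r = Sxy / math.sqrt(Sxx * Syy)                       # Pearson correlation
t_rho = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

b1 = Sxy / Sxx
S = math.sqrt((Syy - b1 * Sxy) / (n - 2))
t_slope = b1 / (S / math.sqrt(Sxx))

print(round(t_rho, 4), round(t_slope, 4))  # identical, as the algebra promises
```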

🎮 Interactive: Fit a Regression Line

Adjust the true slope and noise to see how LS estimates, \(r^2\), and the \(t\)-statistic respond.

Red = LS fitted line. Grey dashed = true population line. Observe how \(r^2\) and \(t\) change with noise and \(n\).

๐Ÿค Think-Pair-Share

๐Ÿ’ฌ Activity (5 minutes)

Scenario: An ICTA analyst is studying the relationship between broadband price (AZN/month, \(x\)) and broadband penetration (%, \(Y\)) across 10 regions of Azerbaijan.

\(\sum x_i = 320\) \(\sum Y_i = 480\) \(n = 10\)
\(\sum x_i^2 = 11{,}200\) \(\sum x_i Y_i = 14{,}400\) \(\sum Y_i^2 = 24{,}600\)

Questions:

  1. Compute \(S_{xx}\), \(S_{xy}\), \(S_{yy}\) using the shorthand formulas.

  2. Find \(\hat\beta_1\) and \(\hat\beta_0\). Interpret \(\hat\beta_1\) in context.

  3. Compute SSE and \(S^2\). Then compute \(r^2\) and interpret it.

  4. Test \(H_0: \beta_1 = 0\) vs \(H_a: \beta_1 \neq 0\) at \(\alpha = 0.05\). What is your conclusion?

✅ Think-Pair-Share: Solution (1/5)

1. Sums of squares:

\[S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 11200 - \frac{320^2}{10}\]

\[= 11200 - 10240 = 960\]

\[S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} = 14400 - \frac{320 \times 480}{10}\]

\[= 14400 - 15360 = -960\]

\[S_{yy} = \sum y_i^2 - \frac{(\sum y_i)^2}{n} = 24600 - \frac{480^2}{10}\]

\[= 24600 - 23040 = 1560\]

✅ Think-Pair-Share: Solution (2/5)

2a. Slope estimate:

\[\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{-960}{960} = -1.0\]

Each 1 AZN increase in price → a 1 percentage-point decrease in penetration.

✅ Think-Pair-Share: Solution (3/5)

2b. Intercept estimate:

\[\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 48 - (-1)(32) = 80\]

Fitted model: \(\hat{y} = 80 - 1.0 x\)

✅ Think-Pair-Share: Solution (4/5)

3. Error variance and \(r^2\):

\[\text{SSE} = S_{yy} - \hat\beta_1 S_{xy}\]
\[= 1560 - (-1)(-960) = 600\]

\[S^2 = \frac{600}{8} = 75, \quad S = 8.66\]

\[r^2 = 1 - \frac{600}{1560} = 0.615\]

✅ Think-Pair-Share: Solution (5/5)

4. Hypothesis test: \(H_0: \beta_1 = 0\) at \(\alpha = 0.05\)

\[T = \frac{-1.0}{8.66/\sqrt{960}} = -3.578\]

\(t_{0.025, 8} = 2.306\)

\(|T| > 2.306\) → Reject \(H_0\) ✅ — price is a statistically significant (negative) predictor of penetration.
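The full solution can be verified end-to-end — a Python sketch reproducing every number from the summary statistics:

```python
import math

# Numeric check of the Think-Pair-Share solution
n = 10
sx, sy, sxx, sxy, syy = 320, 480, 11_200, 14_400, 24_600

Sxx = sxx - sx ** 2 / n        # 960
Sxy = sxy - sx * sy / n        # -960
Syy = syy - sy ** 2 / n        # 1560
b1 = Sxy / Sxx                 # -1.0
b0 = sy / n - b1 * sx / n      # 80.0
SSE = Syy - b1 * Sxy           # 600
S2 = SSE / (n - 2)             # 75.0
r2 = 1 - SSE / Syy             # about 0.615
t = b1 / (math.sqrt(S2) / math.sqrt(Sxx))  # about -3.578
print(b1, b0, SSE, S2, round(r2, 3), round(t, 3))
```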

💰 Case Study: CAPM Regression Table

CAPM Regression: MSFT ~ SPY (monthly % returns) | R² = 0.566 | S = 4.601%

| Parameter | Estimate | Std_Error | t_stat | p_value |
|---|---|---|---|---|
| (Intercept) β₀ (alpha) | 0.8517 | 0.7807 | 1.091 | 0.283 |
| SPY β₁ (market beta) | 1.0219 | 0.1535 | 6.659 | 0.000 |

💰 Case Study: CAPM Scatter Plot

# Scatter with regression line and CI band
# Assumes monthly_ret is a data frame of decimal monthly returns
# with columns SPY and MSFT
library(ggplot2)

fit <- lm(MSFT ~ SPY, data = monthly_ret)  # slope is invariant to the %-scaling below
s   <- summary(fit)

ggplot(monthly_ret, aes(x = SPY * 100, y = MSFT * 100)) +
  geom_point(colour = "steelblue", alpha = 0.7, size = 2.5) +
  geom_smooth(method = "lm", colour = "tomato",
              fill   = "tomato", alpha = 0.15, linewidth = 1.2) +
  geom_hline(yintercept = 0, colour = "#aaa", linewidth = 0.5) +
  geom_vline(xintercept = 0, colour = "#aaa", linewidth = 0.5) +
  annotate("text", x = -8, y = 12,
           label = paste0("β̂₁ = ", round(coef(fit)[2], 3),
                          "\nr² = ", round(s$r.squared, 3)),
           hjust = 0, size = 4.5, colour = "tomato") +
  labs(
    title    = "CAPM Beta: MSFT vs SPY (2021–2023)",
    subtitle = "Monthly excess returns | OLS regression",
    x = "S&P 500 monthly return (%)",
    y = "MSFT monthly return (%)"
  ) +
  theme_minimal(base_size = 12)

💰 Case Study: Key Findings

📊 MSFT CAPM Regression Results (2021–2023)

Estimated Parameters:

  • \(\hat\beta_1 \approx 1.02\) — MSFT moves roughly one-for-one with the market over this window

  • \(\hat\beta_0 \approx 0.85\%\)/month — positive point estimate of alpha, but statistically insignificant

  • \(S \approx 4.6\%\) — residual risk per month not explained by the market

Inference:

  • \(t_{\hat\beta_1} = 6.66\) → Reject \(H_0: \beta_1 = 0\) — market return significantly explains MSFT

  • \(t_{\hat\beta_0} = 1.09\) → Fail to reject \(H_0: \beta_0 = 0\) — no statistically significant alpha

  • \(r^2 \approx 0.57\) → market explains ~57% of MSFT’s monthly return variation

Financial Implications:

  1. Risk decomposition: ~57% systematic (market), ~43% idiosyncratic (firm-specific)

  2. Portfolio construction: \(\beta \approx 1\) means MSFT moves closely with market swings

  3. Active management: Alpha of ~0.85%/month is not statistically different from zero — consistent with market efficiency

๐Ÿ“ Quiz #1: LS Estimator Formula

A portfolio manager regresses fund returns (\(Y\)) on a factor (\(x\)) with \(n = 12\) monthly observations. She computes \(S_{xy} = 84\) and \(S_{xx} = 120\). What is \(\hat\beta_1\)?

  • \(\hat\beta_1 = S_{xy}/S_{xx} = 84/120 = 0.70\)
  • \(\hat\beta_1 = S_{xx}/S_{xy} = 120/84 = 1.429\)
  • \(\hat\beta_1 = \sqrt{S_{xy}/S_{xx}} = \sqrt{0.70} = 0.837\)
  • \(\hat\beta_1 = S_{xy} \cdot S_{xx} = 84 \times 120 = 10080\)

๐Ÿ“ Quiz #2: Degrees of Freedom for \(S^2\)

In simple linear regression with \(n = 25\) observations, the SSE is computed to be 48.0. What is the unbiased estimator \(S^2\) of the error variance \(\sigma^2\)?

  • \(S^2 = \text{SSE}/(n-2) = 48/(25-2) = 48/23 \approx 2.087\)
  • \(S^2 = 48/(25-1) = 48/24 = 2.000\)
  • \(S^2 = 48/25 = 1.920\)
  • \(S^2 = 48/(25-3) = 48/22 \approx 2.182\)

๐Ÿ“ Quiz #3: Interpreting \(r^2\)

A regression of quarterly GDP growth on broadband penetration gives \(r^2 = 0.63\). Which is the best interpretation?

  • 63% of the variation in quarterly GDP growth is explained by its linear relationship with broadband penetration
  • The correlation between GDP growth and penetration is \(0.63\)
  • GDP growth increases by 63% for each unit increase in penetration
  • The model predicts correctly 63% of the time

๐Ÿ“ Quiz #4: \(t\)-Test for Slope

For a CAPM regression with \(n = 36\) monthly returns, \(\hat\beta_1 = 1.25\), \(S = 0.042\), \(S_{xx} = 0.18\). What is the \(t\)-statistic for \(H_0: \beta_1 = 1\) (testing if beta equals 1)?

  • \(t = (1.25 - 1.00)/(0.042/\sqrt{0.18}) = 0.25/0.099 = 2.525\), df \(= 34\)
  • \(t = 1.25/(0.042/\sqrt{0.18}) = 12.63\)
  • \(t = (1.25 - 1.00)/0.042 = 5.952\)
  • \(t = 0.25 \times \sqrt{36}/0.042 = 35.7\)

๐Ÿ“ Summary

โœ… Key Takeaways

  • Model: \(Y = \beta_0 + \beta_1 x + \varepsilon\) with \(E(\varepsilon) = 0\), \(V(\varepsilon) = \sigma^2\); normality needed for \(t\)-inference

  • LS Estimators: \(\hat\beta_1 = S_{xy}/S_{xx}\), \(\hat\beta_0 = \bar Y - \hat\beta_1 \bar x\); both unbiased; \(V(\hat\beta_1) = \sigma^2/S_{xx}\)

  • Residual Variance: \(S^2 = \text{SSE}/(n-2)\) where \(\text{SSE} = S_{yy} - \hat\beta_1 S_{xy}\); uses \(n-2\) df

  • \(t\)-Test for slope: \(T = \hat\beta_1/(S/\sqrt{S_{xx}}) \sim t_{n-2}\); rejecting \(H_0: \beta_1 = 0\) means \(x\) is a significant linear predictor

  • Goodness of fit: \(r^2 = 1 - \text{SSE}/S_{yy}\) — proportion of variation in \(Y\) explained by \(x\); \(r = S_{xy}/\sqrt{S_{xx} S_{yy}}\) is Pearson correlation

📚 Practice Problems

📝 Homework Problems — Chapter 11 (§11.1–11.6)

Problem 1 (LS by hand): An analyst has 6 monthly observations: \(\sum x = 24\), \(\sum Y = 18\), \(\sum x^2 = 110\), \(\sum xY = 76\), \(\sum Y^2 = 62\), \(n = 6\). Compute \(\hat\beta_0\), \(\hat\beta_1\), SSE, \(S^2\), and \(r^2\).

Problem 2 (\(t\)-test for slope): Using the results of Problem 1, test \(H_0: \beta_1 = 0\) at \(\alpha = 0.05\). Construct a 95% CI for \(\beta_1\) and interpret it in a financial context of your choice.

Problem 3 (CAPM interpretation): A regression of stock excess returns on market excess returns gives \(\hat\beta_1 = 1.35\), SE\((\hat\beta_1) = 0.21\), \(r^2 = 0.82\), \(n = 48\). Is the stock significantly different from a “market-neutral” position (\(\beta_1 = 1\))? Test at \(\alpha = 0.05\).

Problem 4 (Elasticity): Regress broadband penetration (\(Y\)%) on GDP per capita (\(x\), USD thousands) for 20 countries: \(S_{xx} = 8{,}400\), \(S_{xy} = 1{,}260\), \(S_{yy} = 320\). Find \(\hat\beta_1\), \(r^2\), and test the slope. Interpret the elasticity result for a policy audience.

👋 Thank You!

📬 Contact Information:

Samir Orujov, PhD

Assistant Professor

School of Business, ADA University

📧 sorujov@ada.edu.az

🏢 Office: D312

⏰ Office Hours: By appointment

📅 Next Class:

Topic: Multiple Regression & Model Diagnostics (§11.11–11.14)

Reading: Chapter 11, Sections 11.11–11.14

Preparation: Review matrix notation basics; think about what “controlling for other variables” means

โฐ Reminders:

โœ… Complete Practice Problems 1โ€“4

โœ… Run the CAPM regression on a stock of your choice in R

โœ… Verify the equivalence of the \(t\)-test for \(\beta_1\) and the \(t\)-test for \(\rho\)

โœ… Work hard!

โ“ Questions?

๐Ÿ’ฌ Open Discussion

Key Topics for Discussion:

  • Why does the LS line always pass through \((\bar x, \bar Y)\)? What does this imply geometrically?

  • In CAPM, \(\beta_0\) represents Jensen’s alpha. What would it mean statistically if \(H_0: \beta_0 = 0\) is rejected?

  • The same \(t\)-statistic tests both \(H_0: \beta_1 = 0\) and \(H_0: \rho = 0\). How can two different null hypotheses lead to the same test?

  • You compute \(r^2 = 0.95\) for a regression of GDP on broadband penetration over time. Should you be impressed? What hidden problem might this signal?