Information Communication Technologies Agency, Statistics Unit
2025-12-06
Learning Objectives
By the end of this lecture, you will be able to:
Define the gamma distribution and identify its shape parameter (\(\alpha\)) and scale parameter (\(\beta\)), and understand how they affect distribution characteristics
Compute probabilities, expected values, and variances for gamma-distributed random variables, including exponential and chi-square special cases
Apply the exponential distribution to model waiting times and understand the memoryless property in reliability and financial risk contexts
Use the chi-square distribution (\(\chi^2\)) for statistical inference, hypothesis testing, and volatility modeling in finance
Solve real-world problems involving skewed distributions in insurance claims, component lifetimes, income distributions, and water demand forecasting
Overview
Topics Covered Today
Skewed Distributions: Understanding asymmetric data patterns and when they arise
Gamma Distribution: The general form with shape and scale parameters
Chi-Square Distribution: A special case with applications in hypothesis testing and variance estimation
Exponential Distribution: Modeling lifetimes and the memoryless property
Applications: Insurance claims, reliability engineering, water demand, income distributions, and financial risk modeling
Definition: Skewed Distribution
Concept: Skewed Distributions
A skewed distribution occurs when data in a chart lean either to the left or right side of the scale, resulting in a nonsymmetrical curve.
Key Characteristics:
Asymmetry: The left side is shaped differently than the right side
Tail behavior: One tail is longer than the other, indicating more extreme values on that side
Difference from normal: Unlike the Gaussian (normal) distribution, which is symmetric around the mean with zero skewness, skewed distributions have the mean, median, and mode at different locations
Real-world prevalence: Many phenomena exhibit skewness, including income distributions, insurance claims, component lifetimes, and asset returns
Financial Context: Income distributions are typically right-skewed (positively skewed) with a long tail extending toward higher incomes, while most people cluster at lower to moderate income levels.
Types of Skewness
Right-Skewed (Positive Skew)
Tail extends to the right
Mean > Median > Mode
Common in: income, insurance claims, asset returns
Examples:
Executive compensation
Insurance loss amounts
Real estate prices
Time to equipment failure
Left-Skewed (Negative Skew)
Tail extends to the left
Mean < Median < Mode
Common in: age at death, test scores with ceiling effects
Examples:
Human lifespan (peaks at 75-80 years)
Product quality ratings (most cluster at high ratings)
Time remaining until retirement
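The ordering Mean > Median > Mode for right-skewed data can be seen on a small sample. The sketch below uses made-up income figures (hypothetical values chosen only for illustration, not real data) and Python's standard `statistics` module.

```python
import statistics

# Hypothetical annual incomes in thousands (illustrative values, not real data).
incomes = [20, 22, 22, 25, 28, 30, 35, 40, 60, 120]

mean = statistics.mean(incomes)      # pulled upward by the single large value
median = statistics.median(incomes)  # middle of the sorted sample
mode = statistics.mode(incomes)      # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")

# In a right-skewed sample the long right tail drags the mean above the median,
# and the median sits above the mode.
is_right_skewed_ordering = mean > median > mode
print("Mean > Median > Mode:", is_right_skewed_ordering)
```

Reversing the pattern (a few unusually small values in an otherwise high-valued sample) would typically give the left-skewed ordering Mean < Median < Mode.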
Example 1: Human Lifespan Distribution
Problem: The average human life span chart skews left. If the chart shows values from 1 to 100 (representing years of life), explain the distribution characteristics.
Analysis:
The data shows that most people live to around 75 to 80 years old, which means:
Peak location: The distribution's peak (mode) is closer to the right of the chart (near 75-80 years)
Tail direction: The chart's tail is longer on its left side because the values around 75 and 80 are closer to 100 than to 1
Asymmetry: Comparatively few people die at young ages (thanks to modern medicine and sanitation), but infant mortality and premature deaths are spread over a wide range of ages, which stretches the left tail; the right tail is short because few people live far beyond the peak
Interpretation: This left-skewed pattern reflects that in developed countries, medical advances have pushed most deaths toward older ages, with exceptional cases of early death creating the left tail. In contrast, there's a biological upper limit on lifespan, creating a natural boundary on the right side.
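Definition: Gamma Probability Distribution
Definition 1: Gamma Distribution
A random variable \(Y\) is said to have a gamma distribution with parameters \(\alpha > 0\) and \(\beta > 0\) if and only if the probability density function (pdf) of \(Y\) is:
\[f(y) = \begin{cases}
\frac{y^{\alpha-1}e^{-y/\beta}}{\beta^{\alpha}\Gamma(\alpha)}, & 0 \leq y < \infty \\
0, & \text{elsewhere}
\end{cases}\]
where \(\Gamma(\alpha) = \int_0^{\infty} y^{\alpha-1}e^{-y}\,dy\) is the gamma function, which satisfies:
\(\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)\) for any \(\alpha > 1\) (recursive property)
\(\Gamma(n) = (n - 1)!\) for positive integers \(n\)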
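The two gamma-function properties can be checked numerically with `math.gamma` from Python's standard library. The sketch below also evaluates the density, assuming the standard shape-scale form \(f(y) = y^{\alpha-1}e^{-y/\beta} / (\beta^{\alpha}\Gamma(\alpha))\); the function name `gamma_pdf` and the test values are illustrative choices.

```python
import math

def gamma_pdf(y, alpha, beta):
    """Gamma density with shape alpha and scale beta.

    Returns 0 for y <= 0; the single point y = 0 does not affect probabilities.
    """
    if y <= 0:
        return 0.0
    return y ** (alpha - 1) * math.exp(-y / beta) / (beta ** alpha * math.gamma(alpha))

# Recursive property: Gamma(a) = (a - 1) * Gamma(a - 1) for a > 1.
a = 4.5
recursion_holds = math.isclose(math.gamma(a), (a - 1) * math.gamma(a - 1))

# Factorial property: Gamma(n) = (n - 1)! for positive integers n.
factorial_holds = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 11)
)

print("recursion holds:", recursion_holds)
print("factorial property holds:", factorial_holds)
print("f(1) for alpha = 1, beta = 2:", gamma_pdf(1.0, 1.0, 2.0))  # equals 0.5 * exp(-0.5)
```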
Understanding Gamma Parameters
Shape Parameter: \(\alpha\)
Effect on Distribution:
Controls the shape of the distribution
\(\alpha < 1\): J-shaped (decreasing from infinity at \(y = 0\))
\(\alpha = 1\): Exponential distribution
\(\alpha > 1\): Unimodal with peak shifting right as \(\alpha\) increases
Larger \(\alpha\) makes distribution more symmetric (approaches normal)
Financial Application: In reliability engineering, an integer \(\alpha\) can be read as the number of stages or phases a component passes through before failure.
Scale Parameter: \(\beta\)
Effect on Distribution:
Controls the scale or spread of the distribution
Larger \(\beta\) stretches distribution to the right
Smaller \(\beta\) compresses distribution toward zero
Does not affect shape, only the x-axis scaling
Financial Application: In insurance, \(\beta\) scales the claim amounts while maintaining the underlying claim frequency pattern represented by \(\alpha\).
Theorem: Mean and Variance of Gamma Distribution
Theorem 1: Expected Value and Variance
If \(Y\) has a gamma distribution with parameters \(\alpha\) and \(\beta\), then:
\[\boxed{\mu = E(Y) = \alpha \beta}\]
and
\[\boxed{\sigma^2 = V(Y) = \alpha \beta^2}\]
Derivation Insight: These formulas follow from integration by parts applied to the definition of expected value and variance using the gamma density function.
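As a sanity check on Theorem 1, the moments can be approximated by numerical integration and compared with \(\alpha\beta\) and \(\alpha\beta^2\). This is a rough sketch using a midpoint rule; the values \(\alpha = 2\), \(\beta = 3\), the integration limit, and the step count are illustrative assumptions.

```python
import math

def gamma_moment(k, alpha, beta, upper, steps=200_000):
    """Midpoint-rule approximation of E(Y^k) for a gamma density on [0, upper]."""
    h = upper / steps
    norm = beta ** alpha * math.gamma(alpha)  # normalising constant of the pdf
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y ** k * y ** (alpha - 1) * math.exp(-y / beta)
    return total * h / norm

alpha, beta = 2.0, 3.0   # illustrative parameter values
upper = 60 * beta        # the density is negligible beyond this point

mean_numeric = gamma_moment(1, alpha, beta, upper)
second_moment = gamma_moment(2, alpha, beta, upper)
var_numeric = second_moment - mean_numeric ** 2

print(f"numeric mean     = {mean_numeric:.4f}, alpha*beta    = {alpha * beta}")
print(f"numeric variance = {var_numeric:.4f}, alpha*beta**2 = {alpha * beta ** 2}")
```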
Important Note: Unless \(\alpha\) is a positive integer (the simplest case being \(\alpha = 1\), the exponential distribution), areas under the gamma density function have no closed-form expression. In practice we use:
Statistical software (R, Python)
Numerical integration methods
Tables for chi-square distribution (when applicable)
Online calculators/applets
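One way such software evaluates gamma probabilities is a series expansion of the lower incomplete gamma function. The sketch below is a minimal standard-library version, not a replacement for a tested statistics package; `gamma_cdf`, `erlang_cdf`, and the values \(\alpha = 2\), \(\beta = 2\), \(y = 5\) are illustrative assumptions. For integer \(\alpha\) the result is cross-checked against the closed form obtained by repeated integration by parts.

```python
import math

def gamma_cdf(y, alpha, beta):
    """P(Y <= y) for a gamma variable with shape alpha and scale beta.

    Uses the series  gamma(a, x) = x^a e^{-x} * sum_{n>=0} x^n / (a (a+1) ... (a+n)).
    """
    if y <= 0:
        return 0.0
    x = y / beta
    term = 1.0 / alpha
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (alpha + n)
        total += term
    # Work on the log scale for the prefactor x^a e^{-x} / Gamma(a).
    return total * math.exp(alpha * math.log(x) - x - math.lgamma(alpha))

def erlang_cdf(y, k, beta):
    """Closed-form P(Y <= y) when the shape k is a positive integer."""
    x = y / beta
    return 1.0 - math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(k))

p_series = gamma_cdf(5.0, 2.0, 2.0)
p_closed = erlang_cdf(5.0, 2, 2.0)
print(f"series: {p_series:.4f}, closed form: {p_closed:.4f}")
```

The series converges quickly for moderate \(y/\beta\); production code would usually switch to a continued fraction for large arguments.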
Interactive: Gamma Distribution Explorer
Explore Gamma Parameters: Adjust \(\alpha\) (shape) and \(\beta\) (scale) to see their effects on the distribution.
\[\boxed{P(Y < 5) \approx 0.7127 \text{ or } 71.27\%}\]
Interpretation: About 71% of the time, the computer responds within 5 seconds.
Think-Pair-Share: IT Infrastructure Planning
Student Engagement Activity (5 minutes)
Scenario: You are an IT manager at a financial services company. Server response times follow a gamma distribution with mean \(\mu = 3\) seconds and standard deviation \(\sigma = 2\) seconds. The company's service level agreement (SLA) requires that 90% of requests complete within 6 seconds.
Think (1 minute): Work individually
Calculate the parameters \(\alpha\) and \(\beta\) for this gamma distribution
Does the current system meet the SLA requirement? (Use the fact that \(P(Y < 6) \approx 0.92\) for these parameters)
What business impact occurs if the SLA is violated?
Pair (2-3 minutes): Discuss with a partner
Compare your parameter calculations
Discuss whether system upgrades are needed
Consider the trade-off between upgrade costs and SLA compliance
Share (1-2 minutes): Class discussion
Selected pairs share their recommendations
Discuss how modeling response times helps capacity planning and investment decisions
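For the parameter step of this activity, the mean and variance formulas \(\mu = \alpha\beta\) and \(\sigma^2 = \alpha\beta^2\) can be inverted to give \(\alpha = \mu^2/\sigma^2\) and \(\beta = \sigma^2/\mu\). A minimal sketch (the function name is a hypothetical helper, not part of any library):

```python
def gamma_params_from_moments(mean, sd):
    """Solve mean = alpha * beta and sd^2 = alpha * beta^2 for (alpha, beta)."""
    variance = sd ** 2
    alpha = mean ** 2 / variance
    beta = variance / mean
    return alpha, beta

# Scenario values: mean 3 seconds, standard deviation 2 seconds.
alpha, beta = gamma_params_from_moments(3.0, 2.0)
print(f"alpha = {alpha}, beta = {beta:.4f}")

# Round trip: the fitted parameters reproduce the stated mean and variance.
print("mean:", alpha * beta, "variance:", alpha * beta ** 2)
```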
Definition: Chi-Square Distribution
Definition 2: Chi-Square (\(\chi^2\)) Distribution
Let \(\nu\) be a positive integer. A random variable \(Y\) is said to have a chi-square distribution with \(\nu\) degrees of freedom if and only if \(Y\) is a gamma-distributed random variable with parameters \(\alpha = \nu/2\) and \(\beta = 2\).
Notation: We write \(Y \sim \chi^2_\nu\) to denote that \(Y\) has a chi-square distribution with \(\nu\) degrees of freedom.
The pdf becomes: \[f(y) = \begin{cases}
\frac{y^{\nu/2-1}e^{-y/2}}{2^{\nu/2}\Gamma(\nu/2)}, & 0 \leq y < \infty \\
0, & \text{elsewhere}
\end{cases}\]
Theorem 2: Mean and Variance of Chi-Square Distribution
If \(Y \sim \chi^2_\nu\), then: \(\boxed{\mu = E(Y) = \nu}\) and \(\boxed{\sigma^2 = V(Y) = 2\nu}\)
Relationship: Gamma to Chi-Square
Converting Gamma to Chi-Square
Key Result: If \(Y\) has a gamma distribution with \(\alpha = \frac{n}{2}\) for some positive integer \(n\) and scale parameter \(\beta\), then the rescaled variable satisfies
\[\frac{2Y}{\beta} \sim \chi^2_n\]
that is, \(2Y/\beta\) has a chi-square distribution with \(n\) degrees of freedom.
Why This Matters: Chi-square distributions have extensive tables and computational support, making them easier to work with than general gamma distributions.
Example 3: Converting Gamma to Chi-Square
Problem: If \(Y\) has a gamma distribution with \(\alpha = 1.5 = \frac{3}{2}\) and \(\beta = 4\), find \(P(Y < 3.5)\) using the chi-square distribution.
Solution:
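Since \(\alpha = \frac{3}{2} = \frac{n}{2}\) with \(n = 3\), we can use the relationship \(\frac{2Y}{\beta} = \frac{Y}{2} \sim \chi^2_3\), so that \(P(Y < 3.5) = P(\chi^2_3 < 1.75)\), which can be read from a chi-square table or computed with software.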
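The conversion in Example 3 can be checked numerically. With \(\alpha = 3/2\) and \(\beta = 4\), the key result gives \(2Y/\beta = Y/2 \sim \chi^2_3\), so \(P(Y < 3.5) = P(\chi^2_3 < 1.75)\). The sketch below evaluates the right-hand side with the closed-form \(\chi^2_3\) CDF and compares it with a direct midpoint-rule integration of the gamma density; both helper functions are illustrative, and the step count is an assumption.

```python
import math

def chi2_3_cdf(x):
    """Closed-form CDF of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def gamma_cdf_midpoint(y, alpha, beta, steps=100_000):
    """Midpoint-rule approximation of P(Y <= y) for a gamma density."""
    h = y / steps
    norm = beta ** alpha * math.gamma(alpha)
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += t ** (alpha - 1) * math.exp(-t / beta)
    return total * h / norm

alpha, beta, y = 1.5, 4.0, 3.5
x = 2 * y / beta                      # rescaled value, 1.75
p_chi2 = chi2_3_cdf(x)                # P(chi-square_3 < 1.75)
p_direct = gamma_cdf_midpoint(y, alpha, beta)

print(f"via chi-square: {p_chi2:.4f}, direct integration: {p_direct:.4f}")
```

Both routes give a probability of roughly 0.374.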
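Definition: Exponential Distribution
A random variable \(Y\) is said to have an exponential distribution with parameter \(\beta > 0\) if and only if the pdf of \(Y\) is \(f(y) = \frac{1}{\beta}e^{-y/\beta}\) for \(0 \leq y < \infty\), and \(0\) elsewhere.
Note: This is a special case of the gamma distribution with \(\alpha = 1\).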
Key Properties:
Memoryless property: The probability of an event occurring in the future is independent of how much time has already elapsed
Single parameter: The shape is fixed at \(\alpha = 1\), so \(\beta\) alone determines the distribution; it is both the mean and the standard deviation
Common applications: Time between arrivals, component lifetimes, time until failure
Theorem 3: Mean and Variance of Exponential Distribution
If \(Y\) has an exponential distribution with parameter \(\beta\), then: \(\boxed{\mu = E(Y) = \beta}\) and \(\boxed{\sigma^2 = V(Y) = \beta^2}\)
The Memoryless Property
Memoryless Property of Exponential Distribution
Definition: A random variable \(Y\) is memoryless if:
\[P(Y > a + b \mid Y > a) = P(Y > b) \; \text{for all}\; a > 0 \; \text{and} \; b > 0.\]
Interpretation:
If a component has already lasted \(a\) time units, the probability it lasts an additional \(b\) time units is the same as if it were brand new
The past does not affect future probabilities
Example: A fuse that hasn't blown after 100 hours is just as likely to last another 50 hours as a new fuse is to last 50 hours
Mathematical Proof: Using the definition of conditional probability:
\[P(Y > a + b \mid Y > a) = \frac{P(Y > a + b)}{P(Y > a)} = \frac{e^{-(a+b)/\beta}}{e^{-a/\beta}} = e^{-b/\beta} = P(Y > b)\]
Example 4: Memoryless Property Verification
Problem: Suppose that \(Y\) has an exponential probability density function with parameter \(\beta\). Show that if \(a > 0\) and \(b > 0\):
\[P(Y > a + b \mid Y > a) = P(Y > b)\]
Solution:
From the definition of conditional probability: \[P(Y > a + b \mid Y > a) = \frac{P(Y > a + b \cap Y > a)}{P(Y > a)}\]
Since \((Y > a + b) \cap (Y > a) = (Y > a + b)\): \[P(Y > a + b \mid Y > a) = \frac{P(Y > a + b)}{P(Y > a)}\]
Example 4: Solution (continued)
Computing the probabilities:
\[P(Y > a + b) = \int_{a+b}^{\infty} \frac{1}{\beta}e^{-y/\beta} \,dy = -e^{-y/\beta} \Big|_{a+b}^{\infty} = e^{-(a+b)/\beta}\]
Similarly: \[P(Y > a) = \int_{a}^{\infty} \frac{1}{\beta}e^{-y/\beta} \,dy = e^{-a/\beta}\]
Therefore: \[P(Y > a + b \mid Y > a) = \frac{e^{-(a+b)/\beta}}{e^{-a/\beta}} = e^{-b/\beta} = P(Y > b) \quad \blacksquare\]
Financial Interpretation: In modeling default times for credit risk, the memoryless property implies that a bond that hasn't defaulted so far is as likely to default in the next period as it was initially, which may not be realistic and motivates more complex models.
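To see what the memoryless property rules out, the conditional survival probability can be compared for an exponential lifetime and for a gamma lifetime with \(\alpha = 2\), whose survival function \(P(Y > y) = e^{-y/\beta}(1 + y/\beta)\) has a closed form. This is a small sketch; \(\beta = 100\) hours is a hypothetical value, and \(a = 100\), \(b = 50\) echo the fuse example.

```python
import math

def exp_survival(y, beta):
    """P(Y > y) for an exponential variable with mean beta."""
    return math.exp(-y / beta)

def gamma2_survival(y, beta):
    """P(Y > y) for a gamma variable with shape 2 and scale beta."""
    x = y / beta
    return math.exp(-x) * (1 + x)

beta = 100.0          # hypothetical scale parameter, in hours
a, b = 100.0, 50.0    # already survived a hours; want b more

# Exponential: conditional survival equals survival of a new component.
exp_cond = exp_survival(a + b, beta) / exp_survival(a, beta)
exp_fresh = exp_survival(b, beta)

# Gamma with alpha = 2: the used component is less likely to last b more hours.
gam_cond = gamma2_survival(a + b, beta) / gamma2_survival(a, beta)
gam_fresh = gamma2_survival(b, beta)

print(f"exponential: conditional {exp_cond:.4f} vs new {exp_fresh:.4f}")
print(f"gamma(2):    conditional {gam_cond:.4f} vs new {gam_fresh:.4f}")
```

The gamma case shows ageing: having survived 100 hours lowers the chance of lasting another 50, which is the kind of behavior a memoryless model cannot represent.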
Interactive: Exponential Distribution
Explore Exponential Distribution: Adjust \(\beta\) to see how it affects the distribution and memoryless property.
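Code
// Slider for the exponential parameter β (mean and scale)
viewof beta_exp = Inputs.range([1, 10], {value: 2, step: 0.5, label: "β (mean & scale):"})
// For an exponential distribution: mean = β, variance = β², standard deviation = β
mean_exp = beta_exp
variance_exp = beta_exp * beta_exp
sd_exp = beta_exp
// Display the current parameter and summary statistics
md`**Exponential Parameters:** β = ${beta_exp.toFixed(1)}

**Statistics:**
Mean = ${mean_exp.toFixed(2)}
Variance = ${variance_exp.toFixed(2)}
Std Dev = ${sd_exp.toFixed(2)}

**Note:** This is Gamma(α=1, β=${beta_exp.toFixed(1)})`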
Problem: The operator of a pumping station has observed that demand for water during early afternoon hours has an approximately exponential distribution with mean 100 cfs (cubic feet per second).
(a) Find the probability that the demand will exceed 200 cfs during the early afternoon on a randomly selected day.
(b) What water-pumping capacity should the station maintain during early afternoons so that the probability that demand will exceed capacity on a randomly selected day is only 0.01?
Solution (Part a):
Since the mean of an exponential random variable with parameter \(\beta\) equals \(\beta\), we have \(\beta = 100\).
The pdf is: \[f(y) = \begin{cases}
\frac{1}{100} e^{-y/100}, & 0 \leq y < \infty \\
0, & \text{elsewhere}
\end{cases}\]
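With \(\beta = 100\), both parts reduce to the exponential survival function \(P(Y > y) = e^{-y/\beta}\): part (a) evaluates it at \(y = 200\), and part (b) solves \(e^{-c/\beta} = 0.01\) for the capacity \(c = -\beta \ln(0.01)\). A minimal sketch of the arithmetic (variable names are illustrative):

```python
import math

beta = 100.0  # mean demand in cfs

# Part (a): probability that demand exceeds 200 cfs.
p_exceed_200 = math.exp(-200.0 / beta)

# Part (b): capacity c such that P(Y > c) = 0.01.
target = 0.01
capacity = -beta * math.log(target)

print(f"P(Y > 200) = {p_exceed_200:.4f}")
print(f"required capacity = {capacity:.1f} cfs")
```

Part (a) gives \(e^{-2} \approx 0.1353\), and part (b) gives \(100 \ln 100 \approx 460.5\) cfs, which rounds up to 461 cfs.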
Business Recommendation: The pumping station should maintain a capacity of at least 461 cfs to ensure that demand exceeds capacity on only 1% of days (approximately 3-4 days per year).
Cost-Benefit Analysis: This capacity provides high reliability (99% service level) while avoiding over-investment in excessive capacity that would rarely be needed. The station manager can balance the cost of additional capacity against the cost of water shortages.
Case Study: Insurance Claim Amounts (Real Data)
Fire Insurance Loss Modeling
Context: Insurance companies model loss amounts using gamma distributions because losses are naturally right-skewed with a long tail of catastrophic events. We analyze Danish fire insurance losses from 1980-1990.
Key Questions:
What are the optimal shape (\(\alpha\)) and scale (\(\beta\)) parameters using MLE?
What proportion of losses exceed certain thresholds (e.g., 10M, 20M DKK)?
How well does the gamma model fit, and what are its limitations?
Data Source
We analyze Danish fire insurance losses (1980-1990) - a classic actuarial dataset.
Source: R package fitdistrplus - danishuni dataset
Sample Size: 2,167 individual fire insurance claims
Data Type: Loss amounts in millions of Danish Krone (DKK)
Application: Widely used for demonstrating actuarial modeling, extreme value analysis, and heavy-tailed distributions
Case Study: Data Loading and Parameter Estimation
Code
# Load required libraries
library(tidyverse)
library(fitdistrplus)  # For MLE fitting

# Load real insurance loss data from fitdistrplus package
# Danish fire insurance losses (1980-1990): 2,167 claims
# This is a classic actuarial dataset for gamma distribution modeling
data(danishuni)

# Extract loss amounts (in millions of Danish Krone)
claims <- danishuni$Loss

# Verify we have sufficient data
if (length(claims) < 100) {
  stop(paste("Error: Only", length(claims), "claims loaded. Need at least 100."))
}
cat("Data source: Danish fire insurance losses (1980-1990)\n")
Data source: Danish fire insurance losses (1980-1990)
Code
cat("Original values in millions of Danish Krone\n")
Reserve Estimation: The gamma model with \(\alpha \approx 1.3\) captures realistic fire insurance loss patterns: most claims are moderate, with a long tail for major fires requiring substantial reserves.
Quiz #1: Gamma Distribution Parameters
For a gamma distribution with \(\alpha = 3\) and \(\beta = 5\), what is the mean?
\(\mu = 15\)
\(\mu = 8\)
\(\mu = 75\)
\(\mu = 3\)
Quiz #2: Chi-Square Relationship
A random variable \(Y\) has a gamma distribution with \(\alpha = 4\) and \(\beta = 2\). What distribution does \(\frac{2Y}{\beta} = Y\) follow?
\(\chi^2\) with 8 degrees of freedom
\(\chi^2\) with 4 degrees of freedom
\(\chi^2\) with 2 degrees of freedom
Exponential with parameter 2
Quiz #3: Exponential Distribution
Which property uniquely characterizes the exponential distribution among continuous distributions?
The memoryless property: P(Y > a + b | Y > a) = P(Y > b)
It has mean equal to variance
It is symmetric around its mean
It has bounded support
Quiz #4: Variance Formula
If a gamma-distributed random variable has \(\alpha = 2\) and variance \(\sigma^2 = 32\), what is the scale parameter \(\beta\)?
\(\beta = 4\)
\(\beta = 16\)
\(\beta = 8\)
\(\beta = 2\)
Summary
Key Takeaways
The gamma distribution is a flexible two-parameter family (\(\alpha\) and \(\beta\)) that models right-skewed, non-negative continuous data such as waiting times, claim amounts, and component lifetimes
Special cases include the exponential distribution (\(\alpha = 1\)) and chi-square distribution (\(\alpha = \nu/2\), \(\beta = 2\)), each with specific applications in reliability, queuing, and statistical inference
The exponential distribution possesses the unique memoryless property, making it suitable for modeling random arrivals and component failures where past history doesn't affect future probabilities
Mean and variance formulas (\(\mu = \alpha\beta\) and \(\sigma^2 = \alpha\beta^2\)) allow parameter estimation from sample data using method of moments or maximum likelihood
Financial applications include insurance loss modeling, credit risk assessment, reliability engineering, and income distribution analysis, where the gamma family captures realistic skewness and tail behavior
Practice Problems
Homework Problems
Problem 1 (Insurance Claims): An insurance company models claim amounts using a gamma distribution with mean $15,000 and variance $112,500,000. Find: (a) the parameters \(\alpha\) and \(\beta\); (b) the probability a claim exceeds $30,000; (c) the 95th percentile of claim amounts for reserve planning.
Problem 2 (System Reliability): The lifetime of a critical server component follows an exponential distribution with mean 5000 hours. (a) What is the probability the component fails before 3000 hours? (b) Given it has already operated for 2000 hours, what is the probability it operates for at least 4000 additional hours? (c) Compare this to a non-memoryless distribution's behavior.
Problem 3 (Hypothesis Testing): In testing whether a sample variance differs from a population variance, we use the chi-square distribution. If a sample of size \(n = 25\) has variance \(s^2 = 36\) and we're testing against a hypothesized \(\sigma^2 = 25\), calculate the chi-square test statistic \(\chi^2 = \frac{(n-1)s^2}{\sigma^2}\) and find the probability of observing a value this extreme or more.
Problem 4 (Water Resources): Daily water consumption in a district follows a gamma distribution with \(\alpha = 3\) and \(\beta = 200\) thousand gallons. What capacity should be maintained to ensure demand is met 99% of days?
Topic: Sampling Distributions and the Central Limit Theorem
Reading: Chapter 8 (textbook sections on sampling distributions)
Preparation: Review properties of sums of random variables and convolution
Reminders:
Complete Practice Problems 1-4
Explore online gamma/chi-square calculators
Review integration by parts technique
Work hard!
Questions?
Open Discussion (5 minutes)
Key Topics for Discussion:
How do insurance companies use gamma distributions to set premiums that balance competitiveness with profitability and solvency requirements?
What are the limitations of the memoryless property assumption in financial modeling, and when might alternative distributions (Weibull, log-normal) be more appropriate?
How does the chi-square distribution connect to hypothesis testing for variance and goodness-of-fit tests in econometrics?
In reliability engineering, how do gamma distributions with different shape parameters model different failure mechanisms (wear-out vs. random failure)?