Expectation and Variance
2025-08-04
Expectation and variance are two fundamental concepts in probability theory that describe the central tendency and spread of a random variable's distribution.
Expected Value (Mean)
The expected value, also known as the mean or expectation, represents the average value of a random variable over many trials.
For Discrete Random Variables
$$E[X] = \mu_X = \sum_x x \cdot p_X(x)$$
where:
- $p_X(x)$ is the probability mass function (PMF)
- The sum is taken over all possible values of $X$
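The sum above maps directly to code. Below is a minimal sketch in Python; the PMF values are purely illustrative and not taken from the text:

```python
# Minimal sketch: expected value of a discrete random variable from its PMF.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # p_X(x) for each possible value x (example values)

# E[X] = sum over x of x * p_X(x)
expected_value = sum(x * p for x, p in pmf.items())
print(expected_value)  # 0*0.2 + 1*0.5 + 2*0.3 = 1.1
```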
Properties:
- Linearity and Homogeneity: $E[aX + b] = aE[X] + b$
- For two random variables: $E[X + Y] = E[X] + E[Y]$
- For independent random variables: $E[XY] = E[X]E[Y]$
Proofs of Key Properties
Linearity: For $a, b \in \mathbb{R}$,
$$E[aX + b] = \sum_x (ax + b) \cdot p_X(x) = a \sum_x x \cdot p_X(x) + b \sum_x p_X(x) = aE[X] + b$$
where the last equality uses $\sum_x p_X(x) = 1$.
Additivity:
$$E[X + Y] = \sum_x \sum_y (x + y) \cdot p_{X,Y}(x, y) = \sum_x \sum_y x \cdot p_{X,Y}(x, y) + \sum_x \sum_y y \cdot p_{X,Y}(x, y) = E[X] + E[Y]$$
Product for Independent Variables: If $X$ and $Y$ are independent, then $p_{X,Y}(x, y) = p_X(x) p_Y(y)$, so:
$$E[XY] = \sum_x \sum_y xy \cdot p_{X,Y}(x, y) = \sum_x \sum_y xy \cdot p_X(x) p_Y(y) = \left(\sum_x x\, p_X(x)\right)\left(\sum_y y\, p_Y(y)\right) = E[X]E[Y]$$
For Continuous Random Variables
$$E[X] = \mu_X = \int_{-\infty}^{\infty} x \cdot f_X(x)\, dx$$
where:
- $f_X(x)$ is the probability density function (PDF)
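As a sketch, the integral can be evaluated numerically. The example below assumes an exponential density with rate $\lambda = 2$ (an arbitrary choice, for which the exact mean is $1/\lambda = 0.5$) and uses `scipy.integrate.quad`:

```python
import numpy as np
from scipy.integrate import quad

# Sketch: E[X] = integral of x * f_X(x) dx, computed numerically.
lam = 2.0
pdf = lambda x: lam * np.exp(-lam * x)  # exponential density on [0, inf)

mean, _ = quad(lambda x: x * pdf(x), 0, np.inf)
print(mean)  # ≈ 0.5, matching 1/λ
```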
Variance
Variance measures how much the values of a random variable deviate from its mean.
Definition
$$V(X) = \sigma_X^2 = E[(X - \mu_X)^2] = E[X^2] - (E[X])^2$$
For Discrete Random Variables
$$V(X) = \sum_x (x - \mu_X)^2 \cdot p_X(x)$$
For Continuous Random Variables
$$V(X) = \int_{-\infty}^{\infty} (x - \mu_X)^2 \cdot f_X(x)\, dx$$
Standard Deviation
The standard deviation is the square root of the variance:
$$\sigma_X = \sqrt{V(X)}$$
Properties of Variance
- $V(X) \geq 0$
- $V(a) = 0$ for any constant $a$
- $V(aX) = a^2 V(X)$
- $V(X + a) = V(X)$
- For independent random variables: $V(X + Y) = V(X) + V(Y)$
Proofs of Variance Properties
Scaling: For $a \in \mathbb{R}$,
$$V(aX) = E[(aX - E[aX])^2] = E[(aX - aE[X])^2] = E[a^2(X - E[X])^2] = a^2 E[(X - E[X])^2] = a^2 V(X)$$
Shift Invariance:
$$V(X + a) = E[(X + a - E[X + a])^2] = E[(X + a - E[X] - a)^2] = E[(X - E[X])^2] = V(X)$$
Additivity for Independent Variables: If $X$ and $Y$ are independent:
$$\begin{aligned}
V(X + Y) &= E[(X + Y)^2] - (E[X + Y])^2 \\
&= E[X^2 + 2XY + Y^2] - (E[X] + E[Y])^2 \\
&= E[X^2] + 2E[X]E[Y] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2 \\
&= (E[X^2] - E[X]^2) + (E[Y^2] - E[Y]^2) \\
&= V(X) + V(Y)
\end{aligned}$$
where the third line uses $E[XY] = E[X]E[Y]$, which holds by independence.
Examples
Example 1: Discrete Case (Die Roll)
For a fair six-sided die:
- PMF: $p_X(x) = \frac{1}{6}$ for $x \in \{1, 2, 3, 4, 5, 6\}$
Expected Value:
$$E[X] = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5$$
Variance:
$$E[X^2] = \sum_{x=1}^{6} x^2 \cdot \frac{1}{6} = \frac{1+4+9+16+25+36}{6} = \frac{91}{6}$$
$$V(X) = E[X^2] - (E[X])^2 = \frac{91}{6} - (3.5)^2 = \frac{91}{6} - \frac{49}{4} = \frac{182 - 147}{12} = \frac{35}{12} \approx 2.92$$
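These numbers are easy to verify with exact rational arithmetic, for example in a short Python sketch:

```python
from fractions import Fraction

# Check the die-roll calculations above with exact fractions.
values = range(1, 7)
p = Fraction(1, 6)

E_X = sum(x * p for x in values)       # 21/6 = 7/2
E_X2 = sum(x**2 * p for x in values)   # 91/6
var = E_X2 - E_X**2                    # 35/12

print(E_X, E_X2, var)   # 7/2 91/6 35/12
print(float(var))       # ≈ 2.9167
```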
Example 2: Continuous Case (Normal Distribution)
For $X \sim N(\mu, \sigma^2)$:
- PDF: $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Expected Value: $E[X] = \mu$
Variance: $V(X) = \sigma^2$
Example 3: Continuous Case (Uniform Distribution)
For $X \sim U(a, b)$:
- PDF: $f_X(x) = \frac{1}{b-a}$ for $a \leq x \leq b$
Expected Value:
$$E[X] = \int_a^b x \cdot \frac{1}{b-a}\, dx = \frac{a+b}{2}$$
Variance:
$$V(X) = \int_a^b \left(x - \frac{a+b}{2}\right)^2 \cdot \frac{1}{b-a}\, dx = \frac{(b-a)^2}{12}$$
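As a quick cross-check, `scipy.stats.uniform` reproduces both formulas (the values $a = 2$, $b = 5$ are arbitrary):

```python
from scipy.stats import uniform

# Cross-check the U(a, b) mean and variance formulas with scipy.
a, b = 2.0, 5.0
X = uniform(loc=a, scale=b - a)   # scipy parameterizes U(a, b) as loc=a, scale=b-a

print(X.mean(), (a + b) / 2)        # 3.5  3.5
print(X.var(), (b - a) ** 2 / 12)   # 0.75 0.75
```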
Expectation of Functions of Random Variables
When we apply a function to a random variable, we obtain a new random variable. Computing the expectation of this new random variable is a fundamental problem in probability theory.
Law of the Unconscious Statistician (LOTUS)
The core principle for computing expectations of functions of random variables is the Law of the Unconscious Statistician (LOTUS). This law states that to compute E[g(X)], we don't need to first find the distribution of g(X). Instead, we can work directly with the original distribution of X.
Computation Formula
For a function $g: \mathbb{R} \to \mathbb{R}$ and random variable $X$, the expectation of $g(X)$ is:
$$E[g(X)] = \begin{cases} \sum_x g(x) \cdot p_X(x) & \text{(discrete)} \\ \int_{-\infty}^{\infty} g(x) \cdot f_X(x)\, dx & \text{(continuous)} \end{cases}$$
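A minimal sketch of LOTUS in Python: $E[g(X)]$ is computed directly from the PMF of $X$, never forming the distribution of $g(X)$. The PMF and the choice $g(x) = x^2$ are illustrative:

```python
# LOTUS sketch: E[g(X)] from the PMF of X, without deriving the law of g(X).
pmf = {-1: 0.25, 0: 0.5, 2: 0.25}   # p_X(x), example values
g = lambda x: x ** 2                # the function applied to X

E_gX = sum(g(x) * p for x, p in pmf.items())
print(E_gX)  # (-1)^2*0.25 + 0^2*0.5 + 2^2*0.25 = 1.25
```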
Important Properties
- Linearity: $E[a \cdot g(X) + b \cdot h(X)] = aE[g(X)] + bE[h(X)]$
- Monotonicity: If $g(x) \leq h(x)$ for all $x$, then $E[g(X)] \leq E[h(X)]$
Application Examples
Example 1: Expectation of the Square Function. For any random variable $X$, computing $E[X^2]$:
- Discrete case: $E[X^2] = \sum_x x^2 \cdot p_X(x)$
- Continuous case: $E[X^2] = \int_{-\infty}^{\infty} x^2 \cdot f_X(x)\, dx$
This result is crucial for calculating variance: $V(X) = E[X^2] - (E[X])^2$
Example 2: Expectation of the Exponential Function. For any random variable $X$, computing $E[e^{tX}]$:
- Discrete case: $E[e^{tX}] = \sum_x e^{tx} \cdot p_X(x)$
- Continuous case: $E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} \cdot f_X(x)\, dx$
This is the definition of the moment generating function, which has wide applications in probability theory.
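As an illustrative sketch, the MGF of a fair die (an example PMF) can be evaluated pointwise:

```python
import numpy as np

# Sketch: evaluating the moment generating function E[e^{tX}] numerically
# for a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

def mgf(t):
    return sum(np.exp(t * x) * p for x, p in pmf.items())

print(mgf(0.0))   # ≈ 1, since E[e^{0·X}] = E[1] = 1
print(mgf(0.1))   # ≈ 1.44
```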
Numerical Estimation Methods
When functions are complex or distributions are non-standard, analytical solutions may be difficult to obtain. In such cases, we can use Taylor series approximation for numerical estimation.
Taylor Series Approximation Method
For a random variable $X$ with mean $\mu$ and variance $\sigma^2$, the expectation and variance of $f(X)$ can be approximated using Taylor expansion.
Approximation Derivation for Expectation:
Perform a second-order Taylor expansion of $f(X)$ around $\mu$:
$$f(X) = f(\mu) + f'(\mu)(X - \mu) + \frac{f''(\mu)}{2}(X - \mu)^2 + R_2$$
where $R_2$ is the remainder term.
Take expectation of both sides:
$$E[f(X)] = E[f(\mu)] + E[f'(\mu)(X - \mu)] + E\left[\frac{f''(\mu)}{2}(X - \mu)^2\right] + E[R_2]$$
Since $f(\mu)$, $f'(\mu)$, and $f''(\mu)$ are constants:
$$E[f(X)] = f(\mu) + f'(\mu)E[X - \mu] + \frac{f''(\mu)}{2}E[(X - \mu)^2] + E[R_2]$$
Using $E[X - \mu] = 0$ and $E[(X - \mu)^2] = \sigma^2$, and ignoring higher-order remainder terms:
$$E[f(X)] \approx f(\mu) + \frac{f''(\mu)}{2}\sigma^2$$
Approximation Derivation for Variance:
Use a first-order Taylor expansion (usually sufficient for variance calculation):
$$f(X) \approx f(\mu) + f'(\mu)(X - \mu)$$
Since $f(\mu)$ is constant, it doesn't affect the variance:
$$V[f(X)] \approx V[f'(\mu)(X - \mu)]$$
Constant factors can be factored out:
$$V[f(X)] \approx [f'(\mu)]^2\, V[X - \mu]$$
Since $V[X - \mu] = V[X] = \sigma^2$:
$$V[f(X)] \approx [f'(\mu)]^2 \sigma^2$$
Summary Formulas:
$$E[f(X)] \approx f(\mu) + \frac{f''(\mu)}{2}\sigma^2, \qquad V[f(X)] \approx (f'(\mu))^2 \sigma^2$$
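The sketch below compares these two approximation formulas against a Monte Carlo estimate for the illustrative choice $f(x) = e^x$ with $X \sim N(\mu, \sigma^2)$; the values of $\mu$ and $\sigma$ are arbitrary:

```python
import numpy as np

# Sketch: Taylor approximations of E[f(X)] and V[f(X)] versus Monte Carlo,
# for f(x) = exp(x) and X ~ N(mu, sigma^2) with example parameters.
mu, sigma = 1.0, 0.2
f, fp, fpp = np.exp, np.exp, np.exp   # f, f', f'' are all exp here

# Taylor approximations from the summary formulas
E_approx = f(mu) + 0.5 * fpp(mu) * sigma**2
V_approx = fp(mu) ** 2 * sigma**2

# Monte Carlo reference
rng = np.random.default_rng(0)
samples = f(rng.normal(mu, sigma, size=1_000_000))

print(E_approx, samples.mean())   # ≈ 2.773 vs ≈ 2.773
print(V_approx, samples.var())    # ≈ 0.296 vs ≈ 0.314 (first-order variance approximation)
```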
Approximation Accuracy Notes
- The expectation approximation uses a second-order expansion, providing higher accuracy
- The variance approximation uses a first-order expansion; for strongly nonlinear functions, higher-order terms may be needed
- When $f(X)$ is a linear function, the approximation is exact
- The more concentrated the distribution of $X$ (smaller $\sigma^2$), the better the approximation
Covariance and Correlation
When working with multiple random variables, we often want to measure their relationship.
Covariance
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y]$$
Correlation Coefficient
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
Properties:
- $-1 \leq \rho_{X,Y} \leq 1$
- $\rho = 1$: Perfect positive linear relationship
- $\rho = -1$: Perfect negative linear relationship
- $\rho = 0$: No linear relationship (though a non-linear relationship may still exist)
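A short sketch estimating both quantities from simulated data; the data-generating model $Y = 2X + \varepsilon$ is an arbitrary illustration:

```python
import numpy as np

# Sketch: sample covariance and correlation for two linearly related variables.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(scale=0.5, size=10_000)   # example linear relationship plus noise

cov_xy = np.cov(x, y)[0, 1]        # sample estimate of Cov(X, Y)
rho_xy = np.corrcoef(x, y)[0, 1]   # sample estimate of the correlation coefficient

print(cov_xy)   # ≈ 2, since Cov(X, 2X + ε) = 2·Var(X) = 2
print(rho_xy)   # ≈ 0.97, a strong positive linear relationship
```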
Common Distributions and Their Moments
Distribution | Expected Value | Variance |
---|---|---|
Bernoulli($p$) | $p$ | $p(1-p)$ |
Binomial($n, p$) | $np$ | $np(1-p)$ |
Poisson($\lambda$) | $\lambda$ | $\lambda$ |
Uniform($a, b$) | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
Normal($\mu, \sigma^2$) | $\mu$ | $\sigma^2$ |
Exponential($\lambda$) | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
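A few rows of the table can be cross-checked with `scipy.stats` (the parameter values below are arbitrary examples):

```python
from scipy import stats

# Sketch: moments of common distributions via scipy's frozen distributions.
n, p, lam, mu, sigma = 10, 0.3, 4.0, 1.0, 2.0

print(stats.bernoulli(p).mean(), stats.bernoulli(p).var())          # p, p(1-p)
print(stats.binom(n, p).mean(), stats.binom(n, p).var())            # np, np(1-p)
print(stats.poisson(lam).mean(), stats.poisson(lam).var())          # λ, λ
print(stats.norm(mu, sigma).mean(), stats.norm(mu, sigma).var())    # μ, σ²
print(stats.expon(scale=1/lam).mean(), stats.expon(scale=1/lam).var())  # 1/λ, 1/λ²
```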
Important Theorems
Law of Large Numbers
For i.i.d. random variables $X_1, X_2, \ldots, X_n$ with mean $\mu$:
$$\frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{P} \mu \quad \text{as } n \to \infty$$
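A minimal simulation sketch: the running sample mean of exponential draws (an example distribution with $\mu = 0.5$) drifts toward $\mu$ as $n$ grows:

```python
import numpy as np

# Sketch: law of large numbers via the running mean of i.i.d. draws.
rng = np.random.default_rng(0)
samples = rng.exponential(scale=0.5, size=100_000)   # mean 0.5

running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)
for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])   # approaches 0.5 as n grows
```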
Central Limit Theorem
For i.i.d. random variables with mean $\mu$ and variance $\sigma^2$:
$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty$$
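A sketch of the CLT in action: standardized sums of Uniform(0, 1) draws (chosen as an example, with $\mu = 1/2$ and $\sigma^2 = 1/12$) behave approximately like $N(0, 1)$:

```python
import numpy as np

# Sketch: standardized sums of i.i.d. Uniform(0, 1) variables are close to N(0, 1).
rng = np.random.default_rng(0)
n, reps = 50, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)

sums = rng.uniform(0, 1, size=(reps, n)).sum(axis=1)
z = (sums - n * mu) / (sigma * np.sqrt(n))

print(z.mean(), z.std())           # ≈ 0 and ≈ 1
print(np.mean(np.abs(z) < 1.96))   # ≈ 0.95, matching the standard normal
```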
Expectation with Multiple Random Variables
When working with functions of multiple random variables, we need to understand how to compute their expectations.
Expectation of Functions of Multiple Variables
For a function $g(X, Y)$ of two random variables, the expectation is computed using the joint distribution:
$$E[g(X, Y)] = \begin{cases} \sum_x \sum_y g(x, y) \cdot p_{X,Y}(x, y) & \text{(discrete)} \\ \iint_{\mathbb{R}^2} g(x, y) \cdot f_{X,Y}(x, y)\, dx\, dy & \text{(continuous)} \end{cases}$$
Key Properties
From this definition, we derive important properties:
- Linearity: $E[X + Y] = E[X] + E[Y]$ (always holds)
- Products: $E[XY] = E[X]E[Y]$ (holds when $X$ and $Y$ are independent, but not in general), as the sketch below illustrates
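A small sketch over an arbitrary joint PMF makes the contrast concrete: the sum identity holds regardless, while the product identity fails here because the variables are dependent:

```python
# Sketch: E[X + Y] vs E[X] + E[Y], and E[XY] vs E[X]E[Y], on an example joint PMF.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}  # p_{X,Y}(x, y)

E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())
E_sum = sum((x + y) * p for (x, y), p in joint.items())
E_prod = sum(x * y * p for (x, y), p in joint.items())

print(E_sum, E_X + E_Y)   # 1.1 1.1  -> linearity always holds
print(E_prod, E_X * E_Y)  # 0.4 0.3  -> differ here because X and Y are dependent
```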
Computing Expectations from Joint Distributions
Geometric Interpretation for Continuous Case
For a joint probability density function $f(x, y)$, computing $E[X]$ involves integrating over the entire plane:
$$E[X] = \iint_{\mathbb{R}^2} x \cdot f(x, y)\, dx\, dy$$
This can be understood geometrically as finding the "center of mass" in the $x$-direction of the 3D surface formed by the joint density.
The computation can be done in two equivalent ways:
- Direct integration: Integrate $x \cdot f(x, y)$ over the entire plane
- Using marginal density: First find $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$, then compute $E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\, dx$
The second approach works because:
$$E[X] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \cdot f(x, y)\, dy\, dx = \int_{-\infty}^{\infty} x \left(\int_{-\infty}^{\infty} f(x, y)\, dy\right) dx = \int_{-\infty}^{\infty} x \cdot f_X(x)\, dx$$
Connection to Discrete Case
Similarly, for discrete random variables:
$$E[X] = \sum_x \sum_y x \cdot p_{X,Y}(x, y) = \sum_x x \left(\sum_y p_{X,Y}(x, y)\right) = \sum_x x \cdot p_X(x)$$
This shows that whether we work with joint distributions directly or first compute marginal distributions, we arrive at the same expectation.
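A brief sketch of this equivalence in the discrete case, using an arbitrary joint PMF:

```python
# Sketch: E[X] computed from the joint PMF versus from the marginal PMF.
joint = {(1, 0): 0.2, (1, 1): 0.3, (2, 0): 0.1, (2, 1): 0.4}  # p_{X,Y}(x, y), example values

# Directly from the joint distribution
E_X_joint = sum(x * p for (x, y), p in joint.items())

# Via the marginal p_X(x) = sum over y of p_{X,Y}(x, y)
marginal_x = {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + p
E_X_marginal = sum(x * p for x, p in marginal_x.items())

print(E_X_joint, E_X_marginal)  # both 1.5
```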
Conditional Expectation
The conditional expectation of $Y$ given $X = x$ is:
$$E[Y \mid X = x] = \begin{cases} \sum_y y \cdot p_{Y|X}(y \mid x) & \text{(discrete)} \\ \int_{-\infty}^{\infty} y \cdot f_{Y|X}(y \mid x)\, dy & \text{(continuous)} \end{cases}$$
This leads to the law of total expectation:
$$E[Y] = E[E[Y \mid X]]$$
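A minimal sketch verifying the law of total expectation on an arbitrary discrete joint PMF:

```python
# Sketch: E[Y] computed directly versus via E[E[Y | X]] (tower rule).
joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.4, (1, 2): 0.2}  # p_{X,Y}(x, y), example values

# Direct computation of E[Y]
E_Y = sum(y * p for (x, y), p in joint.items())

# E[E[Y | X]]: for each x, compute E[Y | X = x], then average over p_X(x)
p_X = {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, 0.0) + p

E_Y_given_X = {
    x: sum(y * p for (xx, y), p in joint.items() if xx == x) / p_X[x]
    for x in p_X
}
tower = sum(E_Y_given_X[x] * p_X[x] for x in p_X)

print(E_Y, tower)  # both 1.5 (up to floating-point rounding)
```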