Expectation and Variance
2025-08-04
Expectation and variance are two fundamental concepts in probability theory that describe the central tendency and spread of a random variable's distribution.
Expected Value (Mean)
The expected value, also known as the mean or expectation, represents the average value of a random variable over many trials.
For Discrete Random Variables
$$E[X] = \mu_X = \sum_x x \cdot p_X(x)$$
where:
- $p_X(x)$ is the probability mass function (PMF)
- The sum is taken over all possible values of $X$
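The sum above maps directly to code. Below is a minimal sketch in Python; the PMF values are purely illustrative and not taken from the text:

```python
# Minimal sketch: expected value of a discrete random variable from its PMF.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # p_X(x) for each possible value x (example values)

# E[X] = sum over x of x * p_X(x)
expected_value = sum(x * p for x, p in pmf.items())
print(expected_value)  # 0*0.2 + 1*0.5 + 2*0.3 = 1.1
```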
Properties:
- Linearity and Homogeneity: $E[aX + b] = aE[X] + b$
- For two random variables: $E[X + Y] = E[X] + E[Y]$
- For independent random variables: $E[XY] = E[X]E[Y]$
Proofs of Key Properties
Linearity: For $a, b \in \mathbb{R}$,
$$E[aX + b] = \sum_x (ax + b) \cdot p_X(x) = a \sum_x x \cdot p_X(x) + b \sum_x p_X(x) = aE[X] + b$$
where the last equality uses $\sum_x p_X(x) = 1$.
Additivity:
$$E[X + Y] = \sum_x \sum_y (x + y) \cdot p_{X,Y}(x, y) = \sum_x \sum_y x \cdot p_{X,Y}(x, y) + \sum_x \sum_y y \cdot p_{X,Y}(x, y) = E[X] + E[Y]$$
Product for Independent Variables: If $X$ and $Y$ are independent, then $p_{X,Y}(x, y) = p_X(x) p_Y(y)$, so:
$$E[XY] = \sum_x \sum_y xy \cdot p_{X,Y}(x, y) = \sum_x \sum_y xy \cdot p_X(x) p_Y(y) = \left(\sum_x x\, p_X(x)\right)\left(\sum_y y\, p_Y(y)\right) = E[X]E[Y]$$
For Continuous Random Variables
$$E[X] = \mu_X = \int_{-\infty}^{\infty} x \cdot f_X(x)\, dx$$
where:
- $f_X(x)$ is the probability density function (PDF)
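As a sketch, the integral can be evaluated numerically. The example below assumes an exponential density with rate $\lambda = 2$ (an arbitrary choice, for which the exact mean is $1/\lambda = 0.5$) and uses `scipy.integrate.quad`:

```python
import numpy as np
from scipy.integrate import quad

# Sketch: E[X] = integral of x * f_X(x) dx, computed numerically.
lam = 2.0
pdf = lambda x: lam * np.exp(-lam * x)  # exponential density on [0, inf)

mean, _ = quad(lambda x: x * pdf(x), 0, np.inf)
print(mean)  # ≈ 0.5, matching 1/λ
```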
Variance
Variance measures how much the values of a random variable deviate from its mean.
Definition
$$V(X) = \sigma_X^2 = E[(X - \mu_X)^2] = E[X^2] - (E[X])^2$$
For Discrete Random Variables
$$V(X) = \sum_x (x - \mu_X)^2 \cdot p_X(x)$$
For Continuous Random Variables
$$V(X) = \int_{-\infty}^{\infty} (x - \mu_X)^2 \cdot f_X(x)\, dx$$
Standard Deviation
The standard deviation is the square root of the variance:
$$\sigma_X = \sqrt{V(X)}$$
Properties of Variance
- $V(X) \geq 0$
- $V(a) = 0$ for any constant $a$
- $V(aX) = a^2 V(X)$
- $V(X + a) = V(X)$
- For independent random variables: $V(X + Y) = V(X) + V(Y)$
Proofs of Variance Properties
Scaling: For $a \in \mathbb{R}$,
$$V(aX) = E[(aX - E[aX])^2] = E[(aX - aE[X])^2] = E[a^2(X - E[X])^2] = a^2 E[(X - E[X])^2] = a^2 V(X)$$
Shift Invariance:
$$V(X + a) = E[(X + a - E[X + a])^2] = E[(X + a - E[X] - a)^2] = E[(X - E[X])^2] = V(X)$$
Additivity for Independent Variables: If $X$ and $Y$ are independent:
$$\begin{aligned}
V(X + Y) &= E[(X + Y)^2] - (E[X + Y])^2 \\
&= E[X^2 + 2XY + Y^2] - (E[X] + E[Y])^2 \\
&= E[X^2] + 2E[X]E[Y] + E[Y^2] - E[X]^2 - 2E[X]E[Y] - E[Y]^2 \\
&= (E[X^2] - E[X]^2) + (E[Y^2] - E[Y]^2) \\
&= V(X) + V(Y)
\end{aligned}$$
where the third line uses $E[XY] = E[X]E[Y]$, which holds by independence.
Examples
Example 1: Discrete Case (Die Roll)
For a fair six-sided die:
- PMF: $p_X(x) = \frac{1}{6}$ for $x \in \{1, 2, 3, 4, 5, 6\}$
Expected Value:
$$E[X] = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5$$
Variance:
$$E[X^2] = \sum_{x=1}^{6} x^2 \cdot \frac{1}{6} = \frac{1+4+9+16+25+36}{6} = \frac{91}{6}$$
$$V(X) = E[X^2] - (E[X])^2 = \frac{91}{6} - (3.5)^2 = \frac{91}{6} - \frac{49}{4} = \frac{182 - 147}{12} = \frac{35}{12} \approx 2.92$$
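These numbers are easy to verify with exact rational arithmetic, for example in a short Python sketch:

```python
from fractions import Fraction

# Check the die-roll calculations above with exact fractions.
values = range(1, 7)
p = Fraction(1, 6)

E_X = sum(x * p for x in values)       # 21/6 = 7/2
E_X2 = sum(x**2 * p for x in values)   # 91/6
var = E_X2 - E_X**2                    # 35/12

print(E_X, E_X2, var)   # 7/2 91/6 35/12
print(float(var))       # ≈ 2.9167
```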
Example 2: Continuous Case (Normal Distribution)
For $X \sim N(\mu, \sigma^2)$:
- PDF: $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Expected Value: $E[X] = \mu$
Variance: $V(X) = \sigma^2$
Example 3: Continuous Case (Uniform Distribution)
For $X \sim U(a, b)$:
- PDF: $f_X(x) = \frac{1}{b-a}$ for $a \leq x \leq b$
Expected Value:
$$E[X] = \int_a^b x \cdot \frac{1}{b-a}\, dx = \frac{a+b}{2}$$
Variance:
$$V(X) = \int_a^b \left(x - \frac{a+b}{2}\right)^2 \cdot \frac{1}{b-a}\, dx = \frac{(b-a)^2}{12}$$
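As a quick cross-check, `scipy.stats.uniform` reproduces both formulas (the values $a = 2$, $b = 5$ are arbitrary):

```python
from scipy.stats import uniform

# Cross-check the U(a, b) mean and variance formulas with scipy.
a, b = 2.0, 5.0
X = uniform(loc=a, scale=b - a)   # scipy parameterizes U(a, b) as loc=a, scale=b-a

print(X.mean(), (a + b) / 2)        # 3.5  3.5
print(X.var(), (b - a) ** 2 / 12)   # 0.75 0.75
```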
Expectation of Functions of Random Variables
When we apply a function to a random variable, we obtain a new random variable. Computing the expectation of this new random variable is a fundamental problem in probability theory.
Law of the Unconscious Statistician (LOTUS)
The core principle for computing expectations of functions of random variables is the Law of the Unconscious Statistician (LOTUS). This law states that to compute E[g(X)], we don't need to first find the distribution of g(X). Instead, we can work directly with the original distribution of X.
Computation Formula
For a function $g: \mathbb{R} \to \mathbb{R}$ and random variable $X$, the expectation of $g(X)$ is:
$$E[g(X)] = \begin{cases} \sum_x g(x) \cdot p_X(x) & \text{(discrete)} \\ \int_{-\infty}^{\infty} g(x) \cdot f_X(x)\, dx & \text{(continuous)} \end{cases}$$
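A minimal sketch of LOTUS in Python: $E[g(X)]$ is computed directly from the PMF of $X$, never forming the distribution of $g(X)$. The PMF and the choice $g(x) = x^2$ are illustrative:

```python
# LOTUS sketch: E[g(X)] from the PMF of X, without deriving the law of g(X).
pmf = {-1: 0.25, 0: 0.5, 2: 0.25}   # p_X(x), example values
g = lambda x: x ** 2                # the function applied to X

E_gX = sum(g(x) * p for x, p in pmf.items())
print(E_gX)  # (-1)^2*0.25 + 0^2*0.5 + 2^2*0.25 = 1.25
```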
Important Properties
- Linearity: $E[a \cdot g(X) + b \cdot h(X)] = aE[g(X)] + bE[h(X)]$
- Monotonicity: If $g(x) \leq h(x)$ for all $x$, then $E[g(X)] \leq E[h(X)]$
Application Examples
Example 1: Expectation of the Square Function. For any random variable $X$, computing $E[X^2]$:
- Discrete case: $E[X^2] = \sum_x x^2 \cdot p_X(x)$
- Continuous case: $E[X^2] = \int_{-\infty}^{\infty} x^2 \cdot f_X(x)\, dx$
This result is crucial for calculating variance: $V(X) = E[X^2] - (E[X])^2$
Example 2: Expectation of the Exponential Function. For any random variable $X$, computing $E[e^{tX}]$:
- Discrete case: $E[e^{tX}] = \sum_x e^{tx} \cdot p_X(x)$
- Continuous case: $E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} \cdot f_X(x)\, dx$
This is the definition of the moment generating function, which has wide applications in probability theory.
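As an illustrative sketch, the MGF of a fair die (an example PMF) can be evaluated pointwise:

```python
import numpy as np

# Sketch: evaluating the moment generating function E[e^{tX}] numerically
# for a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

def mgf(t):
    return sum(np.exp(t * x) * p for x, p in pmf.items())

print(mgf(0.0))   # ≈ 1, since E[e^{0·X}] = E[1] = 1
print(mgf(0.1))   # ≈ 1.44
```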
Numerical Estimation Methods
When functions are complex or distributions are non-standard, analytical solutions may be difficult to obtain. In such cases, we can use Taylor series approximation for numerical estimation.
Taylor Series Approximation Method
For a random variable $X$ with mean $\mu$ and variance $\sigma^2$, the expectation and variance of $f(X)$ can be approximated using Taylor expansion.
Approximation Derivation for Expectation:
Perform a second-order Taylor expansion of $f(X)$ around $\mu$:
$$f(X) = f(\mu) + f'(\mu)(X - \mu) + \frac{f''(\mu)}{2}(X - \mu)^2 + R_2$$
where $R_2$ is the remainder term.
Take expectation of both sides:
$$E[f(X)] = E[f(\mu)] + E[f'(\mu)(X - \mu)] + E\left[\frac{f''(\mu)}{2}(X - \mu)^2\right] + E[R_2]$$
Since $f(\mu)$, $f'(\mu)$, and $f''(\mu)$ are constants:
$$E[f(X)] = f(\mu) + f'(\mu)E[X - \mu] + \frac{f''(\mu)}{2}E[(X - \mu)^2] + E[R_2]$$
Using $E[X - \mu] = 0$ and $E[(X - \mu)^2] = \sigma^2$, and ignoring higher-order remainder terms:
$$E[f(X)] \approx f(\mu) + \frac{f''(\mu)}{2}\sigma^2$$
Approximation Derivation for Variance:
Use a first-order Taylor expansion (usually sufficient for variance calculation):
$$f(X) \approx f(\mu) + f'(\mu)(X - \mu)$$
Since $f(\mu)$ is constant, it doesn't affect the variance:
$$V[f(X)] \approx V[f'(\mu)(X - \mu)]$$
Constant factors can be factored out:
$$V[f(X)] \approx [f'(\mu)]^2\, V[X - \mu]$$
Since $V[X - \mu] = V[X] = \sigma^2$:
$$V[f(X)] \approx [f'(\mu)]^2 \sigma^2$$
Summary Formulas:
$$E[f(X)] \approx f(\mu) + \frac{f''(\mu)}{2}\sigma^2, \qquad V[f(X)] \approx (f'(\mu))^2 \sigma^2$$
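The sketch below compares these two approximation formulas against a Monte Carlo estimate for the illustrative choice $f(x) = e^x$ with $X \sim N(\mu, \sigma^2)$; the values of $\mu$ and $\sigma$ are arbitrary:

```python
import numpy as np

# Sketch: Taylor approximations of E[f(X)] and V[f(X)] versus Monte Carlo,
# for f(x) = exp(x) and X ~ N(mu, sigma^2) with example parameters.
mu, sigma = 1.0, 0.2
f, fp, fpp = np.exp, np.exp, np.exp   # f, f', f'' are all exp here

# Taylor approximations from the summary formulas
E_approx = f(mu) + 0.5 * fpp(mu) * sigma**2
V_approx = fp(mu) ** 2 * sigma**2

# Monte Carlo reference
rng = np.random.default_rng(0)
samples = f(rng.normal(mu, sigma, size=1_000_000))

print(E_approx, samples.mean())   # ≈ 2.773 vs ≈ 2.773
print(V_approx, samples.var())    # ≈ 0.296 vs ≈ 0.314 (first-order variance approximation)
```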
Approximation Accuracy Notes
- The expectation approximation uses a second-order expansion, providing higher accuracy
- The variance approximation uses a first-order expansion; for strongly nonlinear functions, higher-order terms may be needed
- When $f(X)$ is a linear function, the approximation is exact
- The more concentrated the distribution of $X$ (smaller $\sigma^2$), the better the approximation
Covariance and Correlation
When working with multiple random variables, we often want to measure their relationship.
Covariance
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y]$$
Correlation Coefficient
$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
Properties:
- $-1 \leq \rho_{X,Y} \leq 1$
- $\rho = 1$: Perfect positive linear relationship
- $\rho = -1$: Perfect negative linear relationship
- $\rho = 0$: No linear relationship (though a non-linear relationship may still exist)
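A short sketch estimating both quantities from simulated data; the data-generating model $Y = 2X + \varepsilon$ is an arbitrary illustration:

```python
import numpy as np

# Sketch: sample covariance and correlation for two linearly related variables.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(scale=0.5, size=10_000)   # example linear relationship plus noise

cov_xy = np.cov(x, y)[0, 1]        # sample estimate of Cov(X, Y)
rho_xy = np.corrcoef(x, y)[0, 1]   # sample estimate of the correlation coefficient

print(cov_xy)   # ≈ 2, since Cov(X, 2X + ε) = 2·Var(X) = 2
print(rho_xy)   # ≈ 0.97, a strong positive linear relationship
```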
Common Distributions and Their Moments
Distribution | Expected Value | Variance |
---|---|---|
Bernoulli($p$) | $p$ | $p(1-p)$ |
Binomial($n, p$) | $np$ | $np(1-p)$ |
Poisson($\lambda$) | $\lambda$ | $\lambda$ |
Uniform($a, b$) | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ |
Normal($\mu, \sigma^2$) | $\mu$ | $\sigma^2$ |
Exponential($\lambda$) | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ |
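A few rows of the table can be cross-checked with `scipy.stats` (the parameter values below are arbitrary examples):

```python
from scipy import stats

# Sketch: moments of common distributions via scipy's frozen distributions.
n, p, lam, mu, sigma = 10, 0.3, 4.0, 1.0, 2.0

print(stats.bernoulli(p).mean(), stats.bernoulli(p).var())          # p, p(1-p)
print(stats.binom(n, p).mean(), stats.binom(n, p).var())            # np, np(1-p)
print(stats.poisson(lam).mean(), stats.poisson(lam).var())          # λ, λ
print(stats.norm(mu, sigma).mean(), stats.norm(mu, sigma).var())    # μ, σ²
print(stats.expon(scale=1/lam).mean(), stats.expon(scale=1/lam).var())  # 1/λ, 1/λ²
```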
Important Theorems
Law of Large Numbers
For i.i.d. random variables $X_1, X_2, \ldots, X_n$ with mean $\mu$:
$$\frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{P} \mu \quad \text{as } n \to \infty$$
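A minimal simulation sketch: the running sample mean of exponential draws (an example distribution with $\mu = 0.5$) drifts toward $\mu$ as $n$ grows:

```python
import numpy as np

# Sketch: law of large numbers via the running mean of i.i.d. draws.
rng = np.random.default_rng(0)
samples = rng.exponential(scale=0.5, size=100_000)   # mean 0.5

running_mean = np.cumsum(samples) / np.arange(1, samples.size + 1)
for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])   # approaches 0.5 as n grows
```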
Central Limit Theorem
For i.i.d. random variables with mean $\mu$ and variance $\sigma^2$:
$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{D} N(0, 1) \quad \text{as } n \to \infty$$
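A sketch of the CLT in action: standardized sums of Uniform(0, 1) draws (chosen as an example, with $\mu = 1/2$ and $\sigma^2 = 1/12$) behave approximately like $N(0, 1)$:

```python
import numpy as np

# Sketch: standardized sums of i.i.d. Uniform(0, 1) variables are close to N(0, 1).
rng = np.random.default_rng(0)
n, reps = 50, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)

sums = rng.uniform(0, 1, size=(reps, n)).sum(axis=1)
z = (sums - n * mu) / (sigma * np.sqrt(n))

print(z.mean(), z.std())           # ≈ 0 and ≈ 1
print(np.mean(np.abs(z) < 1.96))   # ≈ 0.95, matching the standard normal
```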
Expectation with Multiple Random Variables
When working with functions of multiple random variables, we need to understand how to compute their expectations.
Expectation of Functions of Multiple Variables
For a function $g(X, Y)$ of two random variables, the expectation is computed using the joint distribution:
$$E[g(X, Y)] = \begin{cases} \sum_x \sum_y g(x, y) \cdot p_{X,Y}(x, y) & \text{(discrete)} \\ \iint_{\mathbb{R}^2} g(x, y) \cdot f_{X,Y}(x, y)\, dx\, dy & \text{(continuous)} \end{cases}$$
Key Properties
From this definition, we derive important properties:
- Linearity: $E[X + Y] = E[X] + E[Y]$ (always holds)
- Products: $E[XY] = E[X]E[Y]$ (holds when $X$ and $Y$ are independent, but not in general), as the sketch below illustrates
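A small sketch over an arbitrary joint PMF makes the contrast concrete: the sum identity holds regardless, while the product identity fails here because the variables are dependent:

```python
# Sketch: E[X + Y] vs E[X] + E[Y], and E[XY] vs E[X]E[Y], on an example joint PMF.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}  # p_{X,Y}(x, y)

E_X = sum(x * p for (x, y), p in joint.items())
E_Y = sum(y * p for (x, y), p in joint.items())
E_sum = sum((x + y) * p for (x, y), p in joint.items())
E_prod = sum(x * y * p for (x, y), p in joint.items())

print(E_sum, E_X + E_Y)   # 1.1 1.1  -> linearity always holds
print(E_prod, E_X * E_Y)  # 0.4 0.3  -> differ here because X and Y are dependent
```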
Computing Expectations from Joint Distributions
Geometric Interpretation for Continuous Case
For a joint probability density function $f(x, y)$, computing $E[X]$ involves integrating over the entire plane:
$$E[X] = \iint_{\mathbb{R}^2} x \cdot f(x, y)\, dx\, dy$$
This can be understood geometrically as finding the "center of mass" in the $x$-direction of the 3D surface formed by the joint density.
The computation can be done in two equivalent ways:
- Direct integration: Integrate $x \cdot f(x, y)$ over the entire plane
- Using marginal density: First find $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$, then compute $E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\, dx$
The second approach works because:
$$E[X] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \cdot f(x, y)\, dy\, dx = \int_{-\infty}^{\infty} x \left(\int_{-\infty}^{\infty} f(x, y)\, dy\right) dx = \int_{-\infty}^{\infty} x \cdot f_X(x)\, dx$$
Connection to Discrete Case
Similarly, for discrete random variables:
$$E[X] = \sum_x \sum_y x \cdot p_{X,Y}(x, y) = \sum_x x \left(\sum_y p_{X,Y}(x, y)\right) = \sum_x x \cdot p_X(x)$$
This shows that whether we work with joint distributions directly or first compute marginal distributions, we arrive at the same expectation.
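A brief sketch of this equivalence in the discrete case, using an arbitrary joint PMF:

```python
# Sketch: E[X] computed from the joint PMF versus from the marginal PMF.
joint = {(1, 0): 0.2, (1, 1): 0.3, (2, 0): 0.1, (2, 1): 0.4}  # p_{X,Y}(x, y), example values

# Directly from the joint distribution
E_X_joint = sum(x * p for (x, y), p in joint.items())

# Via the marginal p_X(x) = sum over y of p_{X,Y}(x, y)
marginal_x = {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + p
E_X_marginal = sum(x * p for x, p in marginal_x.items())

print(E_X_joint, E_X_marginal)  # both 1.5
```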
Conditional Expectation
The conditional expectation of $Y$ given $X = x$ is:
$$E[Y \mid X = x] = \begin{cases} \sum_y y \cdot p_{Y|X}(y \mid x) & \text{(discrete)} \\ \int_{-\infty}^{\infty} y \cdot f_{Y|X}(y \mid x)\, dy & \text{(continuous)} \end{cases}$$
This leads to the law of total expectation:
$$E[Y] = E[E[Y \mid X]]$$
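A minimal sketch verifying the law of total expectation on an arbitrary discrete joint PMF:

```python
# Sketch: E[Y] computed directly versus via E[E[Y | X]] (tower rule).
joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.4, (1, 2): 0.2}  # p_{X,Y}(x, y), example values

# Direct computation of E[Y]
E_Y = sum(y * p for (x, y), p in joint.items())

# E[E[Y | X]]: for each x, compute E[Y | X = x], then average over p_X(x)
p_X = {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, 0.0) + p

E_Y_given_X = {
    x: sum(y * p for (xx, y), p in joint.items() if xx == x) / p_X[x]
    for x in p_X
}
tower = sum(E_Y_given_X[x] * p_X[x] for x in p_X)

print(E_Y, tower)  # both 1.5 (up to floating-point rounding)
```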