Probability Distributions
2025-08-04
Probability distributions are mathematical functions that describe the likelihood of different outcomes for a random variable. They provide a complete description of the probability structure of random phenomena and are fundamental to statistical analysis and machine learning.
Overview
Probability distributions can be classified based on the nature of the random variable: discrete (countable outcomes), continuous (uncountable outcomes within intervals), or mixed (combinations). Each distribution is characterized by its support (possible values), probability function (PMF for discrete, PDF for continuous), cumulative distribution function, parameters, and moments.
A probability distribution is a function or rule that assigns probabilities to the outcomes of a random experiment or, more generally, to the events in a sample space. Let X be a random variable, then the probability distribution of X is defined by its probability mass function (PMF) for discrete variables or probability density function (PDF) for continuous variables.
Concretely, for a discrete random variable $X$ the distribution is given by the PMF $f$:
$$P(X = x) = f(x)$$
For a continuous random variable, every single point has probability zero, so the distribution is described through the cumulative distribution function (CDF) $F$, which is the integral of the PDF $f$:
$$P(X \le x) = F(x) = \int_{-\infty}^{x} f(t)\,dt$$
The PMF and PDF must satisfy non-negativity and normalization:
- For discrete variables: $f(x) \ge 0$ and $\sum_x P(X = x) = 1$
- For continuous variables: $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$
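The normalization conditions can be checked numerically. The sketch below is only an illustration: the fair die, the density f(x) = 2x on [0, 1], and the helper `midpoint_integral` are assumptions chosen for this example and do not appear in the text above.

```python
def die_pmf(x):
    """PMF of a fair six-sided die (illustrative discrete example)."""
    return 1 / 6 if x in range(1, 7) else 0.0


def linear_pdf(x):
    """PDF f(x) = 2x on [0, 1], zero elsewhere (illustrative continuous example)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Discrete case: the PMF values over the support should sum to 1.
pmf_total = sum(die_pmf(x) for x in range(1, 7))

# Continuous case: the PDF should integrate to 1 over its support.
pdf_total = midpoint_integral(linear_pdf, 0.0, 1.0)

print(pmf_total, pdf_total)
```

Both totals should agree with 1 up to floating-point error.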
Discrete Probability Distributions
Bernoulli Distribution
Models a single trial with two possible outcomes (success/failure)
Parameters: p (probability of success), where 0≤p≤1
Support: x∈{0,1}
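PMF: $P(X = x) = p^x (1-p)^{1-x}$
Moment Calculations:
For the expected value:
$$E[X] = \sum_{x=0}^{1} x \cdot P(X = x) = 0 \cdot (1-p) + 1 \cdot p = p$$
For the second moment:
$$E[X^2] = \sum_{x=0}^{1} x^2 \cdot P(X = x) = 0^2 \cdot (1-p) + 1^2 \cdot p = p$$
Therefore, the variance is:
$$V(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p)$$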
Applications: Coin flips, binary outcomes, indicator variables
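The Bernoulli moment calculation above can be reproduced by summing over the two support points. This is a minimal sketch; the function name `bernoulli_pmf` and the value p = 0.3 are illustrative choices.

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p) ** (1 - x)


p = 0.3  # illustrative success probability
support = (0, 1)

mean = sum(x * bernoulli_pmf(x, p) for x in support)
second_moment = sum(x**2 * bernoulli_pmf(x, p) for x in support)
variance = second_moment - mean**2

print(mean, variance)  # should match p and p * (1 - p)
```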
Binomial Distribution
Models the number of successes in n independent Bernoulli trials
Parameters: n (number of trials), p (success probability)
Support: x∈{0,1,2,...,n}
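PMF: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$
Moment Calculations:
The expected value can be derived using the linearity of expectation. Since $X = \sum_{i=1}^{n} X_i$ where $X_i \sim \text{Bernoulli}(p)$:
$$E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} p = np$$
For the variance, since the $X_i$ are independent:
$$V(X) = V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} V(X_i) = \sum_{i=1}^{n} p(1-p) = np(1-p)$$
Alternatively, we can compute directly:
$$E[X] = \sum_{x=0}^{n} x \binom{n}{x} p^x (1-p)^{n-x} = np \sum_{x=1}^{n} \binom{n-1}{x-1} p^{x-1} (1-p)^{n-x} = np$$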
Applications: Quality control, survey sampling, clinical trials
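As a quick check of the binomial formulas, the mean and variance can be computed by direct summation over the support. The sketch below uses only the standard library; `binomial_pmf` and the parameter values n = 10, p = 0.3 are illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x) for x in {0, ..., n}."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3  # illustrative parameters
support = range(n + 1)

total = sum(binomial_pmf(x, n, p) for x in support)
mean = sum(x * binomial_pmf(x, n, p) for x in support)
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in support)

print(total, mean, variance)  # compare with 1, n*p, n*p*(1-p)
```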
Hypergeometric Distribution
Models the number of successes in n draws without replacement from a finite population
Parameters: N (population size), K (number of success states), n (number of draws)
Support: x∈{max(0,n−(N−K)),…,min(n,K)}
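PMF: $P(X = x) = \dfrac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$
Moment Calculations:
For the expected value, we use indicator variables. Let $I_j = 1$ if the $j$-th draw is a success, 0 otherwise. Then $X = \sum_{j=1}^{n} I_j$.
The probability that any particular draw is a success is $P(I_j = 1) = \frac{K}{N}$, so:
$$E[X] = E\left[\sum_{j=1}^{n} I_j\right] = \sum_{j=1}^{n} E[I_j] = \sum_{j=1}^{n} \frac{K}{N} = n\frac{K}{N}$$
For the variance, we need to account for the dependence between draws:
$$V(X) = V\left(\sum_{j=1}^{n} I_j\right) = \sum_{j=1}^{n} V(I_j) + 2\sum_{j<k} \mathrm{Cov}(I_j, I_k)$$
Since $V(I_j) = \frac{K}{N}\left(1 - \frac{K}{N}\right)$ and $\mathrm{Cov}(I_j, I_k) = -\frac{K(N-K)}{N^2(N-1)}$ for $j \ne k$:
$$
\begin{aligned}
V(X) &= n\frac{K}{N}\left(1 - \frac{K}{N}\right) + n(n-1)\left(-\frac{K(N-K)}{N^2(N-1)}\right) \\
&= n\frac{K}{N}\cdot\frac{N-K}{N} - n(n-1)\frac{K(N-K)}{N^2(N-1)} \\
&= n\frac{K(N-K)}{N^2}\left(1 - \frac{n-1}{N-1}\right) \\
&= n\frac{K(N-K)}{N^2}\left(\frac{N-n}{N-1}\right)
\end{aligned}
$$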
Applications: Sampling without replacement, quality control, ecological studies
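The closed-form hypergeometric mean and variance can be compared against direct summation over the support. This is a sketch under illustrative assumptions: the name `hypergeom_pmf` and the values N = 50, K = 20, n = 10 are chosen only for the example.

```python
from math import comb


def hypergeom_pmf(x, N, K, n):
    """P(X = x) = C(K, x) C(N - K, n - x) / C(N, n)."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


N, K, n = 50, 20, 10  # illustrative population size, successes, draws
support = range(max(0, n - (N - K)), min(n, K) + 1)

mean = sum(x * hypergeom_pmf(x, N, K, n) for x in support)
variance = sum((x - mean) ** 2 * hypergeom_pmf(x, N, K, n) for x in support)

# Closed forms from the derivation, including the finite-population factor.
expected_mean = n * K / N
expected_variance = n * K * (N - K) / N**2 * (N - n) / (N - 1)

print(mean, expected_mean)
print(variance, expected_variance)
```

The factor (N - n)/(N - 1) is what separates this variance from the binomial variance with p = K/N.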
Poisson Distribution
Models the number of events occurring in a fixed interval
Parameters: λ (rate parameter), where λ>0
Support: x∈{0,1,2,...}
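PMF: $P(X = x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$
Moment Calculations:
For the expected value:
$$E[X] = \sum_{x=0}^{\infty} x \cdot \frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=1}^{\infty} \frac{\lambda^x}{(x-1)!} = e^{-\lambda}\lambda\sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}$$
Let $k = x - 1$:
$$E[X] = e^{-\lambda}\lambda\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda}\lambda e^{\lambda} = \lambda$$
For the second moment:
$$E[X^2] = \sum_{x=0}^{\infty} x^2 \cdot \frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=1}^{\infty} x \cdot \frac{\lambda^x}{(x-1)!}$$
Let $k = x - 1$:
$$
\begin{aligned}
E[X^2] &= e^{-\lambda}\sum_{k=0}^{\infty} (k+1) \cdot \frac{\lambda^{k+1}}{k!} = e^{-\lambda}\lambda\sum_{k=0}^{\infty} (k+1) \cdot \frac{\lambda^k}{k!} \\
&= e^{-\lambda}\lambda\left(\sum_{k=0}^{\infty} k \cdot \frac{\lambda^k}{k!} + \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}\right) \\
&= e^{-\lambda}\lambda\left(\lambda e^{\lambda} + e^{\lambda}\right) = \lambda(\lambda + 1)
\end{aligned}
$$
Therefore:
$$V(X) = E[X^2] - (E[X])^2 = \lambda(\lambda + 1) - \lambda^2 = \lambda$$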
Properties: The Poisson distribution is the limit of Binomial(n, p) as n→∞, p→0 with np=λ.
Applications: Call centers, traffic flow, radioactive decay, rare events
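The Poisson limit of the binomial distribution can be illustrated by comparing the two PMFs as n grows with np fixed. In this sketch, λ = 3, the comparison range 0..20, and the helper `max_gap` are all illustrative assumptions.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """Binomial PMF; returns 0 outside the support {0, ..., n}."""
    if x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)


lam = 3.0  # illustrative rate, held fixed while n grows


def max_gap(n):
    """Largest pointwise PMF difference on 0..20 between Binomial(n, lam/n) and Poisson(lam)."""
    p = lam / n
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(21))


gaps = [max_gap(n) for n in (10, 100, 1000)]
print(gaps)  # the gap should shrink as n increases
```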
Continuous Probability Distributions
Normal (Gaussian) Distribution
The most important continuous distribution in statistics
Parameters: $\mu$ (mean), $\sigma^2$ (variance)
Support: $x \in (-\infty, \infty)$
PDF: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Moment Calculations:
For the standard normal distribution Z∼N(0,1):
The expected value is:
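$$E[Z] = \int_{-\infty}^{\infty} z \cdot \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz = 0$$
This follows because the integrand is an odd function and the integral converges.
For the variance:
$$E[Z^2] = \int_{-\infty}^{\infty} z^2 \cdot \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\,dz$$
Using integration by parts with $u = z$, $dv = z e^{-z^2/2}\,dz$:
$$E[Z^2] = \frac{1}{\sqrt{2\pi}}\left[-z e^{-z^2/2}\right]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,dz = 0 + 1 = 1$$
Therefore, $V(Z) = E[Z^2] - (E[Z])^2 = 1 - 0 = 1$.
For the general normal distribution $X = \mu + \sigma Z$:
$$E[X] = E[\mu + \sigma Z] = \mu + \sigma E[Z] = \mu$$
$$V(X) = V(\mu + \sigma Z) = \sigma^2 V(Z) = \sigma^2$$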
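The standard normal moments can also be approximated by numerical integration, then scaled to a general normal variable. This is a rough sketch: the truncation to [-10, 10], the midpoint rule, and the values μ = 1.5, σ = 2 are assumptions made for the example.

```python
import math


def std_normal_pdf(z):
    """PDF of N(0, 1)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def midpoint_integral(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# The tails beyond |z| = 10 contribute a negligible amount.
mean_z = midpoint_integral(lambda z: z * std_normal_pdf(z), -10.0, 10.0)
second_z = midpoint_integral(lambda z: z * z * std_normal_pdf(z), -10.0, 10.0)
var_z = second_z - mean_z**2

# Location-scale transform X = mu + sigma * Z.
mu, sigma = 1.5, 2.0  # illustrative parameters
mean_x = mu + sigma * mean_z
var_x = sigma**2 * var_z

print(mean_z, var_z, mean_x, var_x)
```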
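Properties: The Central Limit Theorem states that suitably standardized sums of independent, identically distributed random variables with finite variance converge in distribution to a normal distribution. Linear combinations of independent (more generally, jointly) normal variables are normal.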
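Additivity Property: If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ are independent, then:
$$X + Y \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$$
Let $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ be independent normal random variables.
We can write $X = \mu_1 + \sigma_1 Z_1$ and $Y = \mu_2 + \sigma_2 Z_2$, where $Z_1, Z_2 \sim N(0, 1)$ are independent standard normal variables.
Then:
$$X + Y = (\mu_1 + \mu_2) + \sigma_1 Z_1 + \sigma_2 Z_2$$
Since $Z_1$ and $Z_2$ are independent, the linear combination $\sigma_1 Z_1 + \sigma_2 Z_2$ is also normally distributed with:
- Mean: $E[\sigma_1 Z_1 + \sigma_2 Z_2] = \sigma_1 \cdot 0 + \sigma_2 \cdot 0 = 0$
- Variance: $V(\sigma_1 Z_1 + \sigma_2 Z_2) = \sigma_1^2 \cdot 1 + \sigma_2^2 \cdot 1 = \sigma_1^2 + \sigma_2^2$
Therefore:
$$\sigma_1 Z_1 + \sigma_2 Z_2 \sim N(0,\ \sigma_1^2 + \sigma_2^2)$$
And:
$$X + Y = (\mu_1 + \mu_2) + (\sigma_1 Z_1 + \sigma_2 Z_2) \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$$
The normality of the sum can be justified rigorously with moment generating functions (MGFs). The MGF of $X \sim N(\mu, \sigma^2)$ is:
$$M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$$
For independent $X$ and $Y$:
$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t) = e^{\mu_1 t + \frac{1}{2}\sigma_1^2 t^2} \cdot e^{\mu_2 t + \frac{1}{2}\sigma_2^2 t^2} = e^{(\mu_1 + \mu_2)t + \frac{1}{2}(\sigma_1^2 + \sigma_2^2)t^2}$$
This is the MGF of $N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$, proving the result.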
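The additivity property can be sanity-checked by simulation. The sketch below draws independent normal samples with the standard library and compares the sample mean and variance of the sum with the predicted values. The seed, sample size, and parameter values are arbitrary illustrative choices, and a simulation only supports the result, it does not prove it.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

mu1, sigma1 = 1.0, 2.0   # X ~ N(1, 4); random.gauss takes the standard deviation
mu2, sigma2 = -2.0, 3.0  # Y ~ N(-2, 9)
n_samples = 100_000

# Each sum uses two independent draws, one for X and one for Y.
sums = [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(n_samples)]

sample_mean = sum(sums) / n_samples
sample_var = sum((s - sample_mean) ** 2 for s in sums) / n_samples

predicted_mean = mu1 + mu2
predicted_var = sigma1**2 + sigma2**2

print(sample_mean, predicted_mean)
print(sample_var, predicted_var)
```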
Applications: Natural phenomena, measurement errors, statistical inference
Exponential Distribution
Models time between events in a Poisson process
Parameters: λ (rate parameter), where λ>0
Support: x∈[0,∞)
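PDF: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
Moment Calculations:
For the expected value:
$$E[X] = \int_0^{\infty} x \lambda e^{-\lambda x}\,dx$$
Using integration by parts with $u = x$, $dv = \lambda e^{-\lambda x}\,dx$:
$$E[X] = \left[-x e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx = 0 + \left[-\frac{1}{\lambda} e^{-\lambda x}\right]_0^{\infty} = \frac{1}{\lambda}$$
For the second moment:
$$E[X^2] = \int_0^{\infty} x^2 \lambda e^{-\lambda x}\,dx$$
Using integration by parts with $u = x^2$, $dv = \lambda e^{-\lambda x}\,dx$:
$$E[X^2] = \left[-x^2 e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} 2x e^{-\lambda x}\,dx = 0 + \frac{2}{\lambda}\int_0^{\infty} x \lambda e^{-\lambda x}\,dx = \frac{2}{\lambda} \cdot \frac{1}{\lambda} = \frac{2}{\lambda^2}$$
Therefore:
$$V(X) = E[X^2] - (E[X])^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$$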
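The exponential moments can be approximated numerically as a check on the integration by parts. This sketch assumes λ = 0.5 and truncates the integral at x = 100, where the remaining tail is negligible; both choices, and the helper `midpoint_integral`, are illustrative.

```python
import math


def exponential_pdf(x, lam):
    """f(x) = lam * exp(-lam * x) for x >= 0, zero otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


lam = 0.5      # illustrative rate
upper = 100.0  # truncation point for the infinite integral

mean = midpoint_integral(lambda x: x * exponential_pdf(x, lam), 0.0, upper)
second_moment = midpoint_integral(lambda x: x * x * exponential_pdf(x, lam), 0.0, upper)
variance = second_moment - mean**2

print(mean, 1 / lam)
print(variance, 1 / lam**2)
```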
Properties: Memoryless property: P(X>s+t∣X>s)=P(X>t)
Applications: Reliability engineering, queuing theory, survival analysis
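The memoryless property follows from the survival function P(X > x) = e^(-λx), obtained by integrating the PDF from x to infinity. The sketch below evaluates both sides of the identity for a few values; the function name `survival` and the values of λ, s, and t are illustrative.

```python
import math


def survival(x, lam):
    """P(X > x) = exp(-lam * x) for an Exponential(lam) variable, x >= 0."""
    return math.exp(-lam * x)


lam = 0.5  # illustrative rate
pairs = [(1.0, 2.0), (2.0, 3.0), (5.0, 0.5)]  # illustrative (s, t) values

results = []
for s, t in pairs:
    # P(X > s + t | X > s) = P(X > s + t) / P(X > s)
    conditional = survival(s + t, lam) / survival(s, lam)
    unconditional = survival(t, lam)
    results.append((conditional, unconditional))

for conditional, unconditional in results:
    print(conditional, unconditional)  # the two columns should agree
```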
Gamma Distribution (optional)
Generalizes exponential distribution, models waiting times
Parameters: α (shape), β (rate), both >0
Support: x∈[0,∞)
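PDF: $f(x) = \dfrac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$ for $x \ge 0$
Moment Calculations:
The moment generating function is:
$$
\begin{aligned}
M_X(t) = E[e^{tX}] &= \int_0^{\infty} e^{tx}\, \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\,dx \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha-1} e^{-(\beta - t)x}\,dx \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha)}{(\beta - t)^{\alpha}} = \left(\frac{\beta}{\beta - t}\right)^{\alpha} \quad \text{for } t < \beta
\end{aligned}
$$
Using the MGF to find moments:
$$E[X] = M_X'(0) = \alpha\beta^{\alpha}(\beta - t)^{-\alpha-1}\Big|_{t=0} = \alpha\beta^{\alpha}\beta^{-\alpha-1} = \frac{\alpha}{\beta}$$
$$E[X^2] = M_X''(0) = \alpha(\alpha+1)\beta^{\alpha}(\beta - t)^{-\alpha-2}\Big|_{t=0} = \frac{\alpha(\alpha+1)}{\beta^2}$$
Therefore:
$$V(X) = E[X^2] - (E[X])^2 = \frac{\alpha(\alpha+1)}{\beta^2} - \frac{\alpha^2}{\beta^2} = \frac{\alpha}{\beta^2}$$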
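Properties: When α is a positive integer, a Gamma(α, β) variable is distributed as the sum of α independent Exponential(β) variables; the case α=1 is the exponential distribution itself.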
Applications: Bayesian statistics, rainfall modeling, insurance
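The MGF-based moment calculation for the Gamma distribution can be mimicked numerically by differentiating the closed-form MGF at t = 0 with central finite differences. This is only a sketch: the step size h, the parameters α = 3, β = 2, and the name `gamma_mgf` are illustrative assumptions, and finite differences are approximate.

```python
def gamma_mgf(t, alpha, beta):
    """M_X(t) = (beta / (beta - t))^alpha, valid only for t < beta."""
    if t >= beta:
        raise ValueError("the MGF is defined only for t < beta")
    return (beta / (beta - t)) ** alpha


alpha, beta = 3.0, 2.0  # illustrative shape and rate
h = 1e-4                # finite-difference step (a tuning choice)

m_plus = gamma_mgf(h, alpha, beta)
m_zero = gamma_mgf(0.0, alpha, beta)
m_minus = gamma_mgf(-h, alpha, beta)

# Central differences approximate M'(0) and M''(0).
first_moment = (m_plus - m_minus) / (2 * h)
second_moment = (m_plus - 2 * m_zero + m_minus) / h**2
variance = second_moment - first_moment**2

print(first_moment, alpha / beta)
print(variance, alpha / beta**2)
```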
Logistic Distribution (optional)
Models growth curves and underlies binary choice models
Parameters: μ (location), s (scale), where s>0
Support: x∈(−∞,∞)
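PDF: $f(x) = \dfrac{e^{-(x-\mu)/s}}{s\left(1 + e^{-(x-\mu)/s}\right)^2}$
Moment Calculations:
The cumulative distribution function is:
$$F(x) = \frac{1}{1 + e^{-(x-\mu)/s}}$$
For the standard logistic distribution where $\mu = 0$ and $s = 1$:
$$f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}$$
The expected value can be found using symmetry:
$$E[X] = \int_{-\infty}^{\infty} x \cdot \frac{e^{-x}}{(1 + e^{-x})^2}\,dx$$
Let $u = -x$, then:
$$E[X] = \int_{\infty}^{-\infty} (-u) \cdot \frac{e^{u}}{(1 + e^{u})^2}\,(-du) = \int_{-\infty}^{\infty} (-u) \cdot \frac{e^{u}}{(1 + e^{u})^2}\,du$$
Using the identity $\dfrac{e^{u}}{(1 + e^{u})^2} = \dfrac{e^{-u}}{(1 + e^{-u})^2}$:
$$E[X] = -\int_{-\infty}^{\infty} u \cdot \frac{e^{-u}}{(1 + e^{-u})^2}\,du = -E[X]$$
Therefore, $E[X] = 0$.
For the variance:
$$E[X^2] = \int_{-\infty}^{\infty} x^2 \cdot \frac{e^{-x}}{(1 + e^{-x})^2}\,dx$$
Using the substitution $u = \dfrac{1}{1 + e^{-x}}$, which gives $x = \ln\left(\dfrac{u}{1-u}\right)$ and $dx = \dfrac{du}{u(1-u)}$:
$$E[X^2] = \int_0^1 \left[\ln\left(\frac{u}{1-u}\right)\right]^2 du$$
This integral evaluates to $\dfrac{\pi^2}{3}$, so $V(X) = \dfrac{\pi^2}{3}$.
For the general logistic distribution $X = \mu + sZ$ where $Z \sim \text{Logistic}(0, 1)$:
$$E[X] = \mu + sE[Z] = \mu$$
$$V(X) = s^2 V(Z) = \frac{s^2\pi^2}{3}$$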
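The value of the integral over (0, 1) in the variance derivation can be checked numerically. The sketch below uses a midpoint rule, which never evaluates the integrand at the endpoints where the logarithm diverges; the number of subintervals and the scale s = 2 are illustrative choices, and the result is only approximate.

```python
import math


def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def logit_squared(u):
    """Integrand [ln(u / (1 - u))]^2 from the substitution u = F(x)."""
    return math.log(u / (1.0 - u)) ** 2


second_moment = midpoint_integral(logit_squared, 0.0, 1.0)
exact = math.pi**2 / 3

s = 2.0  # illustrative scale parameter
variance_general = s**2 * second_moment

print(second_moment, exact)
print(variance_general, s**2 * exact)
```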
Properties: Similar shape to the normal distribution but with heavier tails. The difference of two independent Gumbel-distributed variables with a common scale parameter follows a logistic distribution.
Applications: Logistic regression, choice modeling, growth curves
For more details on random variables and their properties, see Random Variable.
For expectation and variance calculations, see Expectation and Variance.