Random Variable
2025-08-04
Definition of Random Variable
To some extent, probability theory and statistics are two sides of the same coin. Probability theory provides the mathematical framework for modeling random phenomena, while statistics provides the tools for analyzing data collected from those phenomena. Random variables are the bridge between the two fields: they let us quantify the outcomes of random processes and analyze them statistically.
A random variable is a variable whose possible values are numerical outcomes of a random phenomenon. There are two main types of random variables: discrete and continuous.
Discrete Random Variables: These are random variables that can take on a countable number of values. For example, the number of heads in 10 coin flips is a discrete random variable.
Continuous Random Variables: These are random variables that can take any value in an interval of real numbers, so the set of possible values is uncountable. For example, the time it takes for a computer to solve a problem is a continuous random variable.
Formally, a random variable is a measurable function that maps outcomes of a random process to real numbers. This mapping allows us to assign probabilities to different outcomes and analyze them statistically using existing mathematical tools.
Let $\Omega$ be the sample space of a random process, and let $X: \Omega \to \mathbb{R}$ be a random variable. The function $X$ assigns a real number to each outcome in $\Omega$. The probability distribution of a random variable describes how the probabilities are distributed over the possible values of the random variable.
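As a small illustration of a random variable as a function on a sample space, the sketch below enumerates the outcomes of two fair coin flips and defines X as the number of heads. The two-flip setup and the names `omega`, `X`, and `distribution` are chosen here for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair coin flips.
omega = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: maps an outcome to its number of heads."""
    return outcome.count("H")

# All outcomes are equally likely, so P(X = x) is the share of outcomes mapped to x.
counts = Counter(X(w) for w in omega)
distribution = {x: Fraction(n, len(omega)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```

The distribution of X is obtained purely from the mapping and the probabilities on the sample space, which is the idea behind the formal definition.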
Probability Functions
For discrete random variables, we use the Probability Mass Function (PMF):
$p_X(x) = P(X = x)$
Properties:
- $p_X(x) \ge 0$ for all $x$
- $\sum_x p_X(x) = 1$
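The two PMF properties can be checked directly for the coin-flip example mentioned earlier. The sketch below assumes fair, independent flips, so the number of heads in 10 flips follows a binomial distribution; the function name `heads_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb

def heads_pmf(x, n=10):
    """p_X(x) = P(X = x) for X = number of heads in n fair coin flips."""
    if not 0 <= x <= n:
        return Fraction(0)  # values outside the support have probability 0
    return Fraction(comb(n, x), 2 ** n)

pmf = {x: heads_pmf(x) for x in range(11)}

# Property 1: every probability is non-negative.
print(all(p >= 0 for p in pmf.values()))
# Property 2: the probabilities sum to exactly 1.
print(sum(pmf.values()) == 1)
print(pmf[5])  # 63/256
```

Exact fractions are used instead of floats so that the sum-to-one check holds without rounding error.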
For continuous random variables, we use the Probability Density Function (PDF):
$f_X(x)$, where $P(a \le X \le b) = \int_a^b f_X(x)\,dx$
Properties:
- $f_X(x) \ge 0$ for all $x$
- $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
Cumulative Distribution Function (CDF)
The CDF is defined for both discrete and continuous random variables:
$F_X(x) = P(X \le x)$
For discrete: $F_X(x) = \sum_{t \le x} p_X(t)$
For continuous: $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
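For the discrete case, the CDF is just a running sum of the PMF. The sketch below builds it for the number of heads in 10 fair coin flips (a binomial model, assumed here for illustration); the names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction
from math import comb

N_FLIPS = 10
# PMF of the number of heads in N_FLIPS fair coin flips.
pmf = {k: Fraction(comb(N_FLIPS, k), 2 ** N_FLIPS) for k in range(N_FLIPS + 1)}

def cdf(x):
    """F_X(x) = P(X <= x): sum the PMF over all support points t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

print(cdf(4), cdf(4.5), cdf(5), cdf(10))  # 193/512 193/512 319/512 1
```

Because `cdf(4)` and `cdf(4.5)` agree, the function is flat between support points and only jumps at the integers 0 through 10.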
In summary, random variables are functions that map outcomes of a random process (elements of the sample space) to real numbers, allowing us to analyze and quantify the behavior of random phenomena.
A more rigorous definition is possible by introducing measure theory and probability spaces. For optional further reading, see: Random Variable - stackexchange.
For more details on expectation and variance calculations, see Expectation and Variance.
Some Examples
Below are some examples of random variables in different contexts: discrete, continuous, and mixed.
Discrete Random Variable
Consider a simple example of rolling a fair six-sided die. The sample space $\Omega$ consists of the outcomes $\{1,2,3,4,5,6\}$. We can define a random variable $X$ that maps each outcome to its value. For example, if we roll a die and get a 3, then $X(\omega) = 3$. The probability distribution of this random variable is uniform, meaning each outcome has an equal probability of $\frac{1}{6}$.
PMF: $p_X(x) = \frac{1}{6}$ for $x \in \{1,2,3,4,5,6\}$
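One way to see the uniform PMF at work is to simulate many rolls and compare the observed frequencies with 1/6. The sketch below uses Python's standard `random` module; the seed and the number of rolls are arbitrary choices made here for reproducibility.

```python
import random
from collections import Counter

random.seed(0)      # fixed seed so repeated runs give the same rolls
n_rolls = 60_000    # arbitrary sample size, large enough to smooth out noise

# Simulate the die: each roll is an integer from 1 to 6, equally likely.
rolls = [random.randint(1, 6) for _ in range(n_rolls)]

# Empirical frequency of each face, to compare with the PMF value 1/6.
freq = {face: count / n_rolls for face, count in sorted(Counter(rolls).items())}

for face, f in freq.items():
    print(face, round(f, 4), "vs", round(1 / 6, 4))
```

The empirical frequencies fluctuate around 1/6 and get closer as `n_rolls` grows.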
For detailed calculations of expected value and variance, see Expectation and Variance.
Continuous Random Variable
Consider a continuous random variable that represents the amount of rainfall in a city over a month. The sample space Ω could be the set of all non-negative real numbers, representing the amount of rainfall in millimeters. We can define a random variable Y that maps each outcome to the amount of rainfall. For example, if we measure 50 mm of rainfall in a month, then Y(ω)=50. The probability distribution of this random variable could be modeled using a normal distribution, where the mean represents the average rainfall and the standard deviation represents the variability in rainfall.
Suppose the rainfall follows a normal distribution with mean μ=100 mm and standard deviation σ=30 mm. The PDF is:
$f_Y(y) = \frac{1}{30\sqrt{2\pi}} \, e^{-\frac{(y-100)^2}{2 \cdot 30^2}}$
Probability of specific ranges:
- P(70≤Y≤130)=P(μ−σ≤Y≤μ+σ)≈0.6827 (68.27%)
- P(40≤Y≤160)=P(μ−2σ≤Y≤μ+2σ)≈0.9545 (95.45%)
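These interval probabilities can be reproduced with the standard library alone, using the identity that the normal CDF equals 0.5 · (1 + erf((y − μ) / (σ√2))). The helper names `normal_cdf` and `prob_between` below are hypothetical.

```python
from math import erf, sqrt

MU, SIGMA = 100.0, 30.0  # rainfall parameters from the text, in mm

def normal_cdf(y, mu=MU, sigma=SIGMA):
    """CDF of a normal distribution, written in terms of the error function."""
    return 0.5 * (1.0 + erf((y - mu) / (sigma * sqrt(2.0))))

def prob_between(a, b):
    """P(a <= Y <= b) = F_Y(b) - F_Y(a) for a continuous random variable Y."""
    return normal_cdf(b) - normal_cdf(a)

print(round(prob_between(70, 130), 4))  # 0.6827
print(round(prob_between(40, 160), 4))  # 0.9545
```

Both values depend only on how many standard deviations the interval spans, not on the particular mean and standard deviation.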
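CDF: $F_Y(y) = \int_{-\infty}^{y} f_Y(t)\,dt$. Note that a normal model assigns a small positive probability to negative rainfall (here $P(Y < 0) \approx 0.0004$). If that matters, a normal distribution truncated at 0 can be used instead, with the density renormalized accordingly.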
Mixed Random Variable
Mixed random variables have a distribution with both a discrete part and a continuous part: some individual values carry positive probability, while the remaining probability is spread over an interval by a density. A related situation arises when a single random outcome combines discrete and continuous quantities, such as the number of customers arriving at a store in a day (a non-negative integer) together with their arrival times (non-negative real numbers).
Consider such an outcome $Z$ for a store. The sample space $\Omega$ consists of a customer count together with one arrival time per customer, and $Z$ maps each outcome to the count and the arrival times. For example, if 5 customers arrive at different times throughout the day, we can write $Z(\omega) = (5, t_1, t_2, t_3, t_4, t_5)$, where $t_i$ is the arrival time of customer $i$. Strictly speaking, $Z$ is then a random vector rather than a real-valued random variable. Its distribution combines a discrete distribution for the number of customers with a continuous distribution for the arrival times.
Comparison: Discrete vs Continuous Random Variables
| Aspect | Discrete Random Variables | Continuous Random Variables |
|---|---|---|
| Values | Countable (finite or infinite) | Uncountable (interval) |
| Probability Function | PMF: $p_X(x) = P(X = x)$ | PDF: $f_X(x)$, where $P(a \le X \le b) = \int_a^b f_X(x)\,dx$ |
| Individual Points | P(X=x)>0 for specific x | P(X=x)=0 for any specific x |
| CDF | Step function | Continuous function |
| Examples | Coin flips, dice rolls, counts | Time, distance, temperature |
| Expected Value | $E[X] = \sum_x x \cdot p_X(x)$ | $E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\,dx$ |
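
The expected-value row of the table can be made concrete with the two examples used earlier: the fair die (a sum) and the normal rainfall model with mean 100 mm and standard deviation 30 mm (an integral). The sketch below approximates the integral with a simple midpoint rule over ±10 standard deviations; the helper names and the integration range are choices made here, not part of the original text.

```python
from fractions import Fraction
from math import exp, pi, sqrt

# Discrete case: E[X] = sum over x of x * p_X(x) for a fair die.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}
e_die = sum(x * p for x, p in die_pmf.items())

def rain_pdf(y, mu=100.0, sigma=30.0):
    """Normal density with the rainfall parameters from the text."""
    return exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def midpoint_integral(g, a, b, steps=6000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# Continuous case: E[Y] = integral of y * f_Y(y) dy, truncated to mu +/- 10 sigma.
e_rain = midpoint_integral(lambda y: y * rain_pdf(y), -200.0, 400.0)

print(e_die)             # 7/2
print(round(e_rain, 3))
```

The discrete result is exact, while the continuous one is a numerical approximation that should land very close to the model mean of 100.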
For information about covariance and correlation between random variables, see Expectation and Variance.
Joint Random Variables
When working with multiple random variables simultaneously, we need to understand their joint behavior and relationships.
Joint random variables describe the behavior of two or more random variables defined on the same probability space. For two random variables X and Y, their joint distribution specifies the probability of X taking value x and Y taking value y simultaneously.
Joint Probability Functions
For discrete random variables, we use the Joint Probability Mass Function:
$p_{X,Y}(x,y) = P(X = x, Y = y)$
Properties:
- $p_{X,Y}(x,y) \ge 0$ for all $x, y$
- $\sum_x \sum_y p_{X,Y}(x,y) = 1$
For continuous random variables, we use the Joint Probability Density Function:
$f_{X,Y}(x,y)$, where $P(a \le X \le b,\ c \le Y \le d) = \int_a^b \int_c^d f_{X,Y}(x,y)\,dy\,dx$
Properties:
- $f_{X,Y}(x,y) \ge 0$ for all $x, y$
- $\iint_{\mathbb{R}^2} f_{X,Y}(x,y)\,dx\,dy = 1$
Marginal Distributions
The marginal distribution of one variable can be obtained from the joint distribution:
For discrete:
- $p_X(x) = \sum_y p_{X,Y}(x,y)$
- $p_Y(y) = \sum_x p_{X,Y}(x,y)$
For continuous:
- $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dy$
- $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\,dx$
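In the discrete case, marginalizing means summing the joint PMF over the variable that is not of interest. The sketch below does this for a small made-up joint table on {0, 1} × {0, 1}; the probabilities and the helper name `marginal` are hypothetical and serve only to show the mechanics.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF p_{X,Y}(x, y), keyed by (x, y); values sum to 1.
joint = {
    (0, 0): Fraction(4, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(3, 10),
}

def marginal(joint_pmf, axis):
    """Sum the joint PMF over the other variable (axis 0 keeps X, axis 1 keeps Y)."""
    out = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for pair, p in joint_pmf.items():
        out[pair[axis]] += p
    return dict(out)

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)
print(p_x)
print(p_y)
```

Each marginal is itself a valid PMF, so its values again sum to 1.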
Independence
Random variables X and Y are independent if:
$p_{X,Y}(x,y) = p_X(x) \cdot p_Y(y)$ for all $x, y$ (discrete)
$f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y)$ for all $x, y$ (continuous)
This means the joint distribution factors into the product of marginal distributions.
Consider rolling two fair six-sided dice. Let X be the outcome of the first die and Y be the outcome of the second die.
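Joint PMF: $p_{X,Y}(x,y) = \frac{1}{36}$ for $x, y \in \{1,2,3,4,5,6\}$
Marginal PMFs:
- $p_X(x) = \sum_{y=1}^{6} p_{X,Y}(x,y) = \frac{1}{6}$
- $p_Y(y) = \sum_{x=1}^{6} p_{X,Y}(x,y) = \frac{1}{6}$
Since $p_{X,Y}(x,y) = p_X(x) \cdot p_Y(y)$ for all $x, y$, the dice rolls are independent.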
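The two-dice calculation can be verified mechanically: build the joint PMF, sum out each variable to get the marginals, and test the factorization for every pair. The variable names below are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint PMF of two fair dice: every ordered pair has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Marginals: sum the joint PMF over the other die.
p_x = {x: sum(joint[(x, y)] for y in faces) for x in faces}
p_y = {y: sum(joint[(x, y)] for x in faces) for y in faces}

# Independence: the joint PMF must equal the product of marginals everywhere.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x, y in joint)

print(p_x[1], p_y[1], independent)  # 1/6 1/6 True
```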
Consider the relationship between height H and weight W of adults. These are typically not independent.
The joint PDF $f_{H,W}(h,w)$ describes how height and weight are distributed together in the population.
- The marginal density $f_H(h) = \int_0^{\infty} f_{H,W}(h,w)\,dw$ gives the distribution of heights regardless of weight
- The marginal density $f_W(w) = \int_0^{\infty} f_{H,W}(h,w)\,dh$ gives the distribution of weights regardless of height
Since height and weight are correlated, they are not independent, so $f_{H,W}(h,w) \ne f_H(h) \cdot f_W(w)$ in general.
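No height and weight data appear in this article, so the sketch below uses a discrete stand-in to show what a failed factorization looks like: X is the first of two fair dice and S is the sum of both. This pair is an assumption chosen for illustration, not the height and weight example itself.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Joint PMF of (X, S): X is the first die, S = X + Y is the sum of both dice.
joint = defaultdict(Fraction)
for x, y in product(faces, faces):
    joint[(x, x + y)] += Fraction(1, 36)

# Marginals of X and S, obtained by summing the joint PMF.
p_x = defaultdict(Fraction)
p_s = defaultdict(Fraction)
for (x, s), p in joint.items():
    p_x[x] += p
    p_s[s] += p

# Collect every (x, s) where the joint PMF differs from the product of marginals.
violations = [
    (x, s)
    for x in list(p_x)
    for s in list(p_s)
    if joint.get((x, s), Fraction(0)) != p_x[x] * p_s[s]
]
independent = not violations

print(independent)                                         # False
print(joint.get((1, 12), Fraction(0)), p_x[1] * p_s[12])   # 0 1/216
```

A first roll of 1 makes a sum of 12 impossible, yet the product of the marginals is positive, so the joint distribution does not factor. Correlated continuous variables such as height and weight fail the factorization in the same way.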
For more details on computing expectations with joint random variables, see Expectation and Variance.