# Lecture 20 - Random Variables

The mathematical foundation for random numbers is the random variable: a variable that takes different values on different occasions.

1. A probability distribution determines what values a random variable gets, and how often
2. Discrete versus continuous random variables
3. We usually assume that the random variables we want are independent and identically distributed (iid or IID).
• Why is this assumption reasonable?
• Think about where we get the distributions from.
4. Most random variables we use have underlying distributions that are non-negative.
• Why?
5. Random variables are normally written as capitals.

#### Random variable concepts

1. Cumulative distribution function (cdf or CDF):
• F(x) = P(X <= x)
• F(-infinity) = 0 (for the non-negative random variables assumed here, F(0) = 0)
• F(infinity) = 1
2. Probability density function (pdf or PDF):
• Continuous distributions only
• F(x) = \int_0^x f(x') dx'
• Mean: \mu = E(X) = \int_0^\infty x' f(x') dx'
• Variance: \sigma^2 = E( (X - E(X))^2 ) = \int_0^\infty (x' - E(X))^2 f(x') dx'
3. Probability mass function (pmf or PMF):
• Discrete distributions only: x takes a value from a countable set {x_1, x_2, x_3, ...}, which may be finite or countably infinite
• P(X = x_i) = p_i
• \sum_i p_i = 1
• F(x) = \sum_{x_i <= x} p_i
• Mean: E(X) = \sum_i x_i p_i
• Variance: \sigma^2 = \sum_i (x_i - E(X))^2 p_i
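The discrete mean and variance formulas translate directly into code. A small sketch (the function names `pmf_mean` and `pmf_var` are mine), which for a fair six-sided die should give E(X) = 3.5 and variance 35/12:

```c
/* Mean of a discrete distribution: E(X) = sum_i x_i * p_i */
double pmf_mean( const double *x, const double *p, int n ){
    double mu = 0.0;
    for( int i = 0; i < n; i++ ) mu += x[i] * p[i];
    return mu;
}

/* Variance: sigma^2 = sum_i (x_i - E(X))^2 * p_i */
double pmf_var( const double *x, const double *p, int n ){
    double mu = pmf_mean( x, p, n ), v = 0.0;
    for( int i = 0; i < n; i++ ) v += ( x[i] - mu ) * ( x[i] - mu ) * p[i];
    return v;
}
```

For a fair die, call these with x = {1, ..., 6} and p_i = 1/6 for every i.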

In discrete event simulation we model quantities like interarrival time or service time as IID random variables.

As a result we need to generate random variates with these distributions from random number generators.

• Random number generators normally produce random floating point numbers uniformly distributed between 0 and 1. (Technically, in the range [0.0, 1.0).)
• We need to know how to transform this output into the distribution we want.

#### Random variables with arbitrary distributions

A random variable is uniquely defined by its CDF, F(x).

• 0 <= F(x) <= 1
• Suppose we want to express this random variable as a transformation Y = g(X) of a random variable X uniformly distributed on [0,1).
• For increasing g, P(Y <= y) = F_Y(y) = P( X <= g^-1(y) ) = F_X( g^-1(y) ) = g^-1(y), since F_X(u) = u for a uniform X.
• Choose g = F^-1 (equivalently g^-1 = F). Then P(Y <= y) = F(y), so Y has the desired CDF. This is the inverse transform method.

## The most important distribution for performance evaluation - The exponential distribution

### The Poisson process

Examples

• Judgements given in criminal and civil courts
• Number of soldiers killed by horse kicks in the Prussian cavalry.

#### Example - Radioactive Decay

Ingredients

1. A huge number of radioactive atoms, independent of one another
2. Each has the same probability of decay

Bring in a Geiger counter

• Measure for a fixed time t
• K: the number of clicks heard; P(K = k) = ( (pt)^k / k! ) exp( -pt )
• The number observed in one interval is independent of the number observed in another interval
• This is called a Poisson distribution
• Measure the time between clicks - the inter-arrival time
• T: Probability density (experimentally realised as a histogram) of the time between clicks is f(t) = p exp(-pt)
• Each measured time is independent of every other time
• How are these two results related?
• P( next click between t and t+dt seconds ) is
• P( zero clicks between 0 and t ) = exp( -pt ) AND
• P( one click between t and t+dt ) = p dt (to first order in dt)
• Amount of CDF between t and t+dt is therefore p exp( -pt ) dt
• Therefore F(t+dt) - F(t) = p exp( -pt ) dt
• f(t) = lim_{dt -> 0} ( F(t+dt) - F(t) ) / dt = p exp( -pt )

#### Definitions

• radioactive atom = user
• Geiger counter = mouse
• the above = open system

#### Conclusion

For an open system

```c
Event* new_arrival_event( double time ){
    Event* evt = malloc( sizeof( Event ) );
    evt->type = ARRIVAL;
    evt->time = time + rand_exp( <arrival rate> );
    return evt;
}
```

We have to determine how to create a random number with an exponential distribution, and it had better be efficient because we do it a lot.

#### Closing Message

Despite the somewhat facetious tone of the above notes there is actually quite a lot of evidence that Poisson/Exponential distributions occur frequently in the sort of thing likely to provide requests to systems we evaluate.

• The number of phone calls at a call centre per minute.
• Under an assumption of homogeneity, the number of times a web server is accessed per minute.
• The number of mutations in a given stretch of DNA after a certain amount of radiation.
• The arrival of "customers" in a queue.
• The number of telephone calls arriving at a switchboard, or at an automatic phone-switching system.
• The long-term behavior of the number of web page requests arriving at a server, except for unusual circumstances such as coordinated denial of service attacks or flash crowds. Such a model assumes homogeneity as well as weak stationarity.

## Other Distributions

The exponential distribution pretty well takes care of arrival events in open systems. What about

• arrival events in closed systems, aka think time?
• service times?
• others?

These are normally modelled based on distributions that are abstractions of behaviour observed in logs or traces. Here are a few that turn up from time to time.

### Bernoulli & Binomial Distributions

Systems that are easily categorized into two distinct homogeneous classes.

#### Examples

Think times

• Easy versus hard problems
• Need to access manual versus no need to access manual

Service times

• Slow CPU versus fast CPU
• Local versus remote

#### Bernoulli

Coin-flipping is the natural analogue, but it covers any binary choice made at (biased!) random.

• P(heads) = p
• P(tails) = 1-p

How do you sample a Bernoulli distribution in practice?

• Get a random number, u, uniformly distributed on [0,L)
• If u < Lp then heads, else tails.
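In C, with the uniform range normalised to L = 1, the recipe above might look like this (function names mine):

```c
#include <stdlib.h>

/* Returns 1 ("heads") with probability p, else 0 ("tails"),
 * given a uniform [0,1) variate u. */
int bernoulli_from_uniform( double u, double p ){
    return u < p;
}

int bernoulli( double p ){
    double u = rand() / ( RAND_MAX + 1.0 );  /* uniform on [0,1) */
    return bernoulli_from_uniform( u, p );
}
```

Splitting out the deterministic core makes the bias easy to test: any u below p is heads, anything at or above it is tails.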

#### Binomial

Make N binary choices at random, all identical.

• P(k heads in N tosses) = (N choose k) p^k (1 - p)^(N-k)

How do you sample a binomial distribution in practice?

1. Small N: sample a Bernoulli N times,
• linear in N for each sample
2. Medium N: divide the range [0,L) into N+1 parts proportional to P(k heads in N tosses) and do binary search with u
• linear in N to set up
• logarithmic in N for each sample
3. Large N: converge to another distribution
• Poisson if Np is constant as N -> infinity
• Gaussian if p is constant as N -> infinity