CS457 - System Performance Evaluation - Winter 2010
Public Service Announcements
Lecture 20 - Random Variables
The mathematical foundation for random numbers is the random variable:
a variable that takes different values on different occasions.
- A probability distribution determines what values a random variable
takes, and how often
- Discrete versus continuous random variables
- We usually assume that the random variables we want are independent and
identically distributed (iid or IID).
- Why is this assumption reasonable?
- Think about where we get the distributions from.
- Most random variables we use have underlying distributions that are
non-negative.
- Random variables are normally written as capitals.
Random variable concepts
- Cumulative distribution function (cdf or CDF):
- F(x) = P(X <= x)
- F(0) = 0
- F(infinity) = 1
- Probability density function (pdf or PDF):
- Continuous distributions only
- F(x) = \int_0^x f(x') dx'
- Mean: \mu = E(X) = \int_0^\infty x' f(x') dx'
- Variance: \sigma^2 = E((X-E(X))^2) = \int_0^\infty (x'-E(X))^2 f(x') dx'
- Probability mass function (pmf or PMF):
- Discrete distributions only: x takes a value from a countable set
{x_1, x_2, x_3, ...}, which may be finite or infinite
- P(X = x_i) = p_i
- \sum_i p_i = 1
- F(x) = \sum_{x_i <= x} p_i
- Mean: E(X) = \sum_i x_i p_i
- Variance: \sigma^2 = \sum_i (x_i - E(X))^2 p_i
In discrete event simulation we model quantities like interarrival time or
service time as IID random variables.
As a result we need to generate random variables from random number
generators.
- Random number generators normally produce random floating point numbers
uniformly distributed between 0 and 1. (Technically, in the range
[0.0..1.0).)
- We need to know how to transform this output into the distribution we
want.
Random variables with arbitrary distributions
A random variable is uniquely defined by its CDF, F(x).
- 0 <= F(x) <= 1
- Suppose we want to express this random variable as a transformation Y =
g(X) of a random variable X uniformly distributed on [0,1).
- P(Y <= y) = F_Y(y) = P(X <= g^-1(y)) = F_X(g^-1(y)) = g^-1(y), since
F_X(u) = u for a uniform X.
- Choose g = F^-1, i.e. Y = F^-1(X). Then P(Y <= y) = F(y), so Y has
exactly the CDF we want. This is the inverse transform method.
The most important distribution for performance evaluation - The
exponential distribution
The Poisson process
Examples
- Judgements given in criminal and civil courts
- Number of soldiers killed by horse kicks in the Prussian cavalry.
Example - Radioactive Decay
Ingredients
- A huge number of radioactive atoms, independent of one another
- Each has the same probability of decay
Bring in a Geiger counter
- Measure for a fixed time
- K: Probability of hearing k clicks in a fixed time t is
P(k) = (1/k!) (pt)^k exp( -pt )
- The number observed in one interval is independent of the number
observed in another interval
- This is called a Poisson distribution
- Measure the time between clicks - the inter-arrival time
- T: Probability density (experimentally realised as a histogram) of
the time between clicks is f(t) = p exp(-pt)
- Each measured time is independent of every other time
- How are these two results related?
- P( next click between t and t+dt seconds ) is
- P( zero clicks between 0 and t ) = exp( -pt ) AND
- P( one click between t and t+dt ) = p dt exp( -p dt )
- Amount of CDF between t and t+dt is p exp( -p (t+dt) ) dt
- Therefore F(t+dt) - F(t) = p exp( -p (t+dt) ) dt
- f(t) = lim_{dt -> 0} (F(t+dt) - F(t)) / dt = p exp( -pt )
Definitions
- radioactive atom = user
- Geiger counter = mouse
- the above = open system
Conclusion
For an open system
Event* new_arrival_event( double time ){
    Event* evt = malloc( sizeof( Event ) );
    evt->type = ARRIVAL;
    evt->time = time + rand_exp( <arrival rate> );
    return evt;
}
We have to determine how to create a random number with an exponential
distribution, and it had better be efficient because we do it a lot.
Closing Message
Despite the somewhat facetious tone of the above notes there is actually
quite a lot of evidence that Poisson/Exponential distributions occur
frequently in the sort of thing likely to provide requests to systems we
evaluate.
- The number of phone calls at a call centre per minute.
- Under an assumption of homogeneity, the number of times a web server is
accessed per minute.
- The number of mutations in a given stretch of DNA after a certain
amount of radiation.
- The arrival of "customers" in a queue.
- The number of telephone calls arriving at a switchboard, or at an
automatic phone-switching system.
- The long-term behavior of the number of web page requests arriving at a
server, except for unusual circumstances such as coordinated denial of
service attacks or flash crowds. Such a model assumes homogeneity as well
as weak stationarity.
Other Distributions
The exponential distribution pretty well takes care of arrival events in
open systems. What about
- arrival events in closed systems, aka think time?
- service times?
- others?
These are normally modelled based on distributions that are abstractions
of behaviour observed in logs or traces. Here are a few that turn up from
time to time.
Bernoulli & Binomial Distributions
Systems that are easily categorized into two distinct homogeneous
classes.
Examples
Think times
- Easy versus hard problems
- Need to access manual versus no need to access manual
Service times
- Slow CPU versus fast CPU
- Threaded versus unthreaded
- Local versus remote
Bernoulli
Coin-flipping is the natural analogue, but it covers any binary choice
made at (biased!) random.
- P(heads) = p
- P(tails) = 1-p
How do you sample a Bernoulli distribution in practice?
- Get a random number, u, uniformly distributed on [0,L)
- If u < Lp then heads, else tails.
Binomial
Make N binary choices at random, all identical.
- P(k heads in N tosses) = (N choose k) p^k (1 - p)^(N-k)
How do you sample a binomial distribution in practice?
- Small N: sample a Bernoulli N times,
- linear in N for each sample
- Medium N: divide the range [0,L) into N+1 parts with lengths
proportional to P(k) = (N choose k) p^k (1-p)^(N-k), and do binary
search with u
- linear in N to set up
- logarithmic in N for each sample
- Large N: converge to another distribution
- Poisson if Np is constant as N -> infinity
- Gaussian if p is constant as N -> infinity