CS457 - System Performance Evaluation - Winter 2010
Public Service Announcements
- Graduate school
- Added books to References
Lecture 10 - Exploratory Data Analysis
Monitors
The collection of probes inserted to generate traces.
- Probes can be software (usual) or hardware (if necessary)
- Example of software: a profiler
- Example of hardware: a line analyser
- Probes can be internal, external
- Example of internal: mail log
- Example of external: snoop
- Probes can be sample driven or event driven
- Sample driven: every so often the probe collects part of the state
of the system
- The picture is an evolving system.
- Event driven: every time a particular event occurs the probe
collects features of the system.
- The picture is a sequence of events
- Often include capability to analyse and display results
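The difference between the two kinds of probe can be sketched on a toy system. Everything here is hypothetical and for illustration only: `ToyQueue`, the scripted `schedule`, and the integer clock are assumptions, not part of any real monitor.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Hypothetical system under observation: one queue of request ids."""
    waiting: list = field(default_factory=list)
    listeners: list = field(default_factory=list)

    def arrive(self, t, req):
        self.waiting.append(req)
        for probe in self.listeners:      # event-driven probes fire here
            probe(t, "arrive", req)

    def depart(self, t):
        req = self.waiting.pop(0)
        for probe in self.listeners:
            probe(t, "depart", req)


def run(schedule, sample_every, end_time):
    """Replay a scripted schedule and collect both kinds of trace.

    schedule is a list of (time, action, request id) sorted by time.
    """
    system = ToyQueue()
    event_trace = []
    system.listeners.append(
        lambda t, kind, req: event_trace.append((t, kind, req)))
    sample_trace = []
    pending = list(schedule)
    for t in range(end_time + 1):
        while pending and pending[0][0] == t:
            _, action, req = pending.pop(0)
            if action == "arrive":
                system.arrive(t, req)
            else:
                system.depart(t)
        if t % sample_every == 0:         # sample-driven probe: periodic snapshot
            sample_trace.append((t, len(system.waiting)))
    return event_trace, sample_trace


schedule = [(1, "arrive", "r1"), (2, "arrive", "r2"),
            (3, "depart", None), (7, "depart", None)]
event_trace, sample_trace = run(schedule, sample_every=4, end_time=8)
print(event_trace)    # a sequence of events
print(sample_trace)   # snapshots of an evolving system
```

With this schedule the sampled trace never sees the moment when two requests are queued, while the event trace records every change but grows with the event rate.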
Measurement
What other reasons might you have for wanting to measure performance?
- Identify heavily used components
- hardware or software modules
- Locate bottlenecks
- Characterize workloads
- Validation of simulations or models
What do we try to measure?
- Arrival and departure times of requests, to get
- response time
- number of requests in the system
- Processor activity, such as
- number of processes in system
- system versus application time
- Other resource activity, particularly NICs
- In modern systems communication is often a bottleneck
- Failures
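As a minimal sketch of how arrival and departure times yield the other two quantities, assume each request is recorded as an (arrival, departure) pair. The request names and times below are made up for illustration.

```python
def response_times(requests):
    """Response time of each request: departure time minus arrival time."""
    return {rid: dep - arr for rid, (arr, dep) in requests.items()}


def number_in_system(requests):
    """Return (time, count) after every arrival or departure."""
    events = []
    for arr, dep in requests.values():
        events.append((arr, +1))
        events.append((dep, -1))
    events.sort()  # at equal times, departures (-1) sort before arrivals (+1)
    count = 0
    timeline = []
    for t, delta in events:
        count += delta
        timeline.append((t, count))
    return timeline


# Hypothetical requests: id -> (arrival time, departure time)
requests = {"a": (0.0, 3.0), "b": (1.0, 2.0), "c": (2.5, 6.0)}
print(response_times(requests))
print(number_in_system(requests))
```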
Levels of Measurement
- Application code
- Profiling tools
- Operating system
- Kernel
- Hardware
The Big Question
How much does the introduction of monitoring software influence the
performance of the application?
- Event-driven monitors
- Sampling monitors
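One rough way to approach the question for an event-driven monitor is to time the same work with and without the probe attached. This is only a sketch: `workload` is a hypothetical stand-in for the application, and a single timing run is noisy, so a real measurement would repeat it many times.

```python
import time


def workload(n, probe=None):
    """Hypothetical application work; calls the probe once per step if given."""
    total = 0
    for i in range(n):
        total += i * i
        if probe is not None:
            probe(i)          # event-driven probe: one call per event
    return total


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


events = []
plain_result, plain_secs = timed(workload, 100_000)
probed_result, probed_secs = timed(workload, 100_000, probe=events.append)
print(f"without probe: {plain_secs:.4f} s, with probe: {probed_secs:.4f} s")
```

The probe must not change what the application computes, only how long it takes, which is why the two results are compared as well as the two times.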
Logs
Traces
What I have called a log is usually called a trace, which is
- a time-ordered sequence of events from a real system, as opposed to a
simulation
- useful for assessing performance and generating workloads
The nature of logs
- big: 10 Mbytes is a small log; 1 Tbyte is a big log
- cryptic
- always have a time stamp
Tools for looking at logs
Exploratory Data Analysis (EDA)
Data is most conveniently handled in the form of records
independent variable 1 (IV1), IV2, IV3, ..., dependent (measured) variable 1 (DV1), DV2, DV3, ...
Typical independent variables
- sequence number
- time
- request type
- request arguments
- source of request
Typical dependent variables
- response time
- requests in the system
- fail/succeed
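A record of this kind might be represented as follows. The log format, field names, and sample lines are all assumptions made for the sketch; real logs will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Independent variables
    seq: int
    time: float
    request_type: str
    source: str
    # Dependent (measured) variables
    response_time: float
    in_system: int
    succeeded: bool


def parse_line(line):
    """Parse one whitespace-separated line of the hypothetical log format."""
    seq, t, rtype, source, resp, n, status = line.split()
    return Record(int(seq), float(t), rtype, source,
                  float(resp), int(n), status == "ok")


# Made-up log lines: seq time type source response_time in_system status
log = """\
1 0.00 GET hostA 0.12 1 ok
2 0.05 PUT hostB 0.40 2 ok
3 0.30 GET hostA 0.10 1 fail
"""
records = [parse_line(line) for line in log.splitlines() if line.strip()]
for r in records:
    print(r)
```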
Rule 1
Use a subset of the data if there will not be more available.
- Computers produce enough log data to overwhelm most EDA programs.
Rule 2
Use your eyes
- Your visual system is a better pattern finder than any computer
Rule 3
Split the data
- You see an indistinct pattern.
- Select the data in which the pattern exists: the pattern is now
distinct
- Ask how the selected data differs from the unselected data
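The three steps can be sketched in a few lines. The records, the 0.5 second threshold, and the choice to compare request types are all hypothetical; in practice the selection comes from what you see in a plot.

```python
from collections import Counter


def split(records, predicate):
    """Split records into (selected, unselected) by a predicate."""
    selected = [r for r in records if predicate(r)]
    rest = [r for r in records if not predicate(r)]
    return selected, rest


# Made-up records for illustration
records = [
    {"type": "GET", "source": "hostA", "response_time": 0.11},
    {"type": "GET", "source": "hostB", "response_time": 0.09},
    {"type": "PUT", "source": "hostA", "response_time": 0.95},
    {"type": "GET", "source": "hostA", "response_time": 0.12},
    {"type": "PUT", "source": "hostB", "response_time": 1.10},
    {"type": "GET", "source": "hostB", "response_time": 0.10},
]

# Select the data in which the pattern (slow responses) exists.
slow, fast = split(records, lambda r: r["response_time"] > 0.5)

# Ask how the selected data differs from the unselected data.
slow_types = Counter(r["type"] for r in slow)
fast_types = Counter(r["type"] for r in fast)
print("slow:", dict(slow_types), "fast:", dict(fast_types))
```

Here the slow requests turn out to be exactly the PUT requests, which suggests request type as the next variable to examine.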
Potentially Useful Displays of Data
- Bar charts & box plots
- discrete independent variables
- Box plots are usually more informative
- Histograms & scatter plots
- Histograms better for very dense data
- Scatter plots better with highly selected data
- Three-dimensional plots
- Summaries of partitioned data
- Many statistics to select:
- mean, median, percentile points,
- measures of variability, such as
- standard deviation, range
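A summary of partitioned data might be computed like this, using only the Python standard library. The records and the choice of request type as the partitioning variable are assumptions for the sketch.

```python
from collections import defaultdict
from statistics import mean, median, pstdev, quantiles


def summarize(values):
    """Summary statistics for one partition (needs at least two values)."""
    ordered = sorted(values)
    return {
        "n": len(ordered),
        "mean": mean(ordered),
        "median": median(ordered),
        # 90th percentile: last of the nine decile cut points
        "p90": quantiles(ordered, n=10, method="inclusive")[-1],
        "stdev": pstdev(ordered),
        "range": ordered[-1] - ordered[0],
    }


def summarize_by(records, key, value):
    """Partition records by records[key] and summarize records[value]."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r[value])
    return {k: summarize(v) for k, v in groups.items()}


# Made-up response times in milliseconds
records = [
    {"type": "GET", "ms": 10}, {"type": "PUT", "ms": 100},
    {"type": "GET", "ms": 14}, {"type": "PUT", "ms": 200},
    {"type": "GET", "ms": 12},
]
summary = summarize_by(records, "type", "ms")
for rtype, stats in summary.items():
    print(rtype, stats)
```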
Important: Look for data that doesn't belong, then look
back at it and its environment in the log.
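A small sketch of that advice: flag values that sit far from the rest, then print the neighbouring log lines. The median-absolute-deviation rule and the factor `k=5.0` are assumptions chosen for the example, not a method prescribed here, and the log lines are made up; spotting the odd point by eye in a plot is just as valid.

```python
from statistics import median


def outlier_indices(values, k=5.0):
    """Indices of values more than k median absolute deviations from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - med) > k * mad]


def context(log_lines, index, width=1):
    """The log line at index plus `width` lines on either side."""
    lo = max(0, index - width)
    return log_lines[lo:index + width + 1]


# Made-up log: one line per request, response time in milliseconds
log_lines = [
    "t=0.0 GET hostA 100",
    "t=0.4 GET hostB 120",
    "t=0.9 GET hostA 110",
    "t=1.3 BACKUP start 90",
    "t=1.4 GET hostA 2500",
    "t=2.0 GET hostB 100",
    "t=2.6 GET hostA 130",
]
times = [int(line.split()[-1]) for line in log_lines]
suspects = outlier_indices(times)
for i in suspects:
    print("\n".join(context(log_lines, i)))
```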