CS457 - System Performance Evaluation - Winter 2010
Public Service Announcement
- Assignment Mark Weighting : 15%, 25%, 20%, 40%
- Assignment 2 preview
Lecture 9 - Measuring Performance
Consider the following scenario
- The boss calls: The performance of our system is lousy.
- What do you respond?
- Now you know the details.
- Look at the logs
- What is a log?
- A list of events that have occurred in the system, and when
they occurred.
- System logs
- Application logs
- Look in the log for the symptoms your boss noticed.
- You should be able to find the events that triggered your
boss's anxiety
- Check to see if they are still occurring
- See what other things are going on when they occur
- 2/3 of performance problems are understood at this point.
- For example, you notice that they occur every time a backup
starts and move the backup to a different time.
- This is just general purpose problem solving, a skill that you
learned on your mother's knee.
And if the problem isn't understood yet?
- Turn on one or more logs that were turned off
- If all the logs are running all the time the system is overwhelmed
with data
- Turning on extra logs slows down the system a little, but not much
- As long as you can do it when the system is live you're
okay.
- 2/3 of the remaining performance problems are understood when you
look at the extra logs.
And if the problem still isn't understood?
- You now know whether the problem is where the boss thinks it is: i.e.,
inside a particular application.
- Turn on the monitoring (debugging) code in the application, and
look at the logs it produces
- Turning on monitoring code slows the application a little more, but
still not too much
- As long as you can turn it on without needing to restart the
application.
- If the application is well-designed 2/3 of the remaining problems
are understood at this point.
- All this time you have been examining the application while it is `in
service', because that's where the problem was noticed.
Now you are going to start doing things that you can't try while the
application is in service.
- Prepare a clean system with the same hardware resources, operating
system, and only the application.
- Use one of your logs to prepare a set of input requests.
- Feed the set of requests to the system using a load generator, and
look for the problem in the output logs
- What do you do if it has disappeared?
- Add instrumentation to the application by altering its source code.
- If the problem is in the application this will allow you to see
it.
- But the problem may be in the interaction with the operating
system.
- Find a more-instrumented version of the operating system
- Run the application under it. Look at the logs.
- But the problem may be in the way hardware responds to operating
system requests.
- Find a hardware monitor and put it on the system in the critical place
- Run the application. Compare the logs of the hardware monitor to
those of the operating system
If you have identified the problem in steps 6-8 you will almost certainly
want to test performance using new synthetic workloads.
Once you think the problem is solved you work your way back up this list
until you get to 1, which changes to `You call the boss.'
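The replay step (prepare a set of input requests from a log, then feed them to the system with a load generator) might look roughly like this. It is a minimal single-threaded sketch: `replay`, `handle`, `speedup` and the three-request trace are hypothetical names and data, and `handle` stands in for whatever actually submits a request to the system under test.

```python
import time

def replay(trace, handle, speedup=1.0):
    """Feed (arrival_time, request) pairs to `handle`, preserving their spacing.

    Returns one (request, start_offset, response_time) tuple per request.
    Single-threaded: a slow response delays the requests behind it, which a
    production load generator would avoid by issuing requests concurrently.
    """
    results = []
    first_arrival = trace[0][0]
    began = time.perf_counter()
    for arrival, request in trace:
        due = (arrival - first_arrival) / speedup
        delay = due - (time.perf_counter() - began)
        if delay > 0:
            time.sleep(delay)              # wait until the request is due
        start = time.perf_counter()
        handle(request)
        finish = time.perf_counter()
        results.append((request, start - began, finish - start))
    return results

def handle(request):
    """Stand-in for the system under test (hypothetical)."""
    time.sleep(0.001)

# Arrival times in seconds, as they might be extracted from a log (made up).
trace = [(0.0, "GET /a"), (0.5, "GET /b"), (1.0, "GET /a")]
results = replay(trace, handle, speedup=50)   # replay 50 times faster
for request, offset, response in results:
    print(f"{request}: started at {offset:.3f}s, took {response:.4f}s")
```

The output of a run like this is itself a log, so the same tools used on the live system apply to it.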
Traces
What I have called a log is usually called a trace, which is
- a time-ordered sequence of events from a real system, as opposed to a
simulation
- useful for assessing performance and generating workloads
Practical Issues
The nature of logs
- big: 10 Mbytes is a small log; 1 Tbyte is a big log
- cryptic
- always have a time stamp
Tools for looking at logs
- proprietary system administration tools,
- which are usually system-specific
- which are expensive
- tr, grep, sed, cut, awk
- First use them interactively, then
- make shell scripts for reusing them: stick to sh
- some people like perl
- the output is usually still too big for a spreadsheet
- exploratory data analysis tools
- real data analysis tools
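A filter-and-count pass of the kind one would do with grep, cut and awk, written here in Python only so that the sketches in these notes share one language. The point it illustrates is reading one line at a time, so the size of the log does not matter. The line format (an ISO timestamp at the start of each line) and the sample entries are assumptions.

```python
import io
from collections import Counter

def events_per_minute(lines, keyword):
    """Count lines containing `keyword`, grouped by the minute of their timestamp.

    Assumes every line starts with a timestamp like 2010-02-01T02:00:05.
    """
    counts = Counter()
    for line in lines:               # streamed, never loaded whole into memory
        if keyword in line:
            counts[line[:16]] += 1   # first 16 characters: YYYY-MM-DDTHH:MM
    return counts

# io.StringIO stands in for an open log file (hypothetical contents).
log = io.StringIO(
    "2010-02-01T02:00:05 web timeout\n"
    "2010-02-01T02:00:40 web timeout\n"
    "2010-02-01T02:00:41 web ok\n"
    "2010-02-01T02:01:10 web timeout\n"
)
counts = events_per_minute(log, "timeout")
for minute, n in sorted(counts.items()):
    print(minute, n)
```

The result is small enough for a spreadsheet even when the log is not.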
Monitors
The code inserted to generate traces.
- often include the capability to analyse and display results
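A minimal sketch of a monitor in this sense: code inserted into the application that emits trace events, can be switched on without a restart, and can summarise what it recorded. The `Monitor` class, its `watch` decorator and the `lookup` function are hypothetical, written for illustration only.

```python
import time
from functools import wraps

class Monitor:
    """Records enter/exit events for watched functions while enabled."""

    def __init__(self):
        self.enabled = False   # off by default; can be flipped at run time
        self.trace = []        # list of (timestamp, event, function name)

    def watch(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not self.enabled:
                return func(*args, **kwargs)
            self.trace.append((time.perf_counter(), "enter", func.__name__))
            try:
                return func(*args, **kwargs)
            finally:
                self.trace.append((time.perf_counter(), "exit", func.__name__))
        return wrapper

    def summary(self):
        """Total time spent in each watched function (assumes no recursion)."""
        opened, totals = {}, {}
        for stamp, event, name in self.trace:
            if event == "enter":
                opened[name] = stamp
            else:
                totals[name] = totals.get(name, 0.0) + stamp - opened.pop(name)
        return totals

monitor = Monitor()

@monitor.watch
def lookup(key):
    return key.upper()

lookup("a")              # monitoring off: nothing is recorded
monitor.enabled = True   # switched on while the program keeps running
lookup("b")
print([(event, name) for _, event, name in monitor.trace])
print(monitor.summary())
```

Even when disabled, the wrapper costs one extra test per call, which is a small instance of the overhead question raised under The Big Question.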
Measurement
What other reasons might you have for wanting to measure performance?
- Identify heavily used components
- hardware or software modules
- Locate bottlenecks
- Characterize workloads
- Validation of simulations or models
What do we try to measure?
- Arrival and departure times of requests
- Processor activity, which should be correlated with service times
- Other resource activity, particularly NICs
- Failures
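Arrival and departure times alone already give the usual summary numbers. The sketch below assumes a single server that is busy whenever at least one request is in the system; the four (arrival, departure) pairs are made-up values in seconds, and `summarize` is a hypothetical helper.

```python
def summarize(requests):
    """Mean response time, throughput and utilization from (arrival, departure) pairs.

    Assumes one server that is busy whenever any request is present, and an
    observation period running from the first arrival to the last departure.
    """
    requests = sorted(requests)
    period = max(d for _, d in requests) - requests[0][0]
    mean_response = sum(d - a for a, d in requests) / len(requests)

    # Busy time is the length of the union of the [arrival, departure] intervals.
    busy = 0.0
    cur_start, cur_end = requests[0]
    for arrival, departure in requests[1:]:
        if arrival <= cur_end:            # overlaps the current busy period
            cur_end = max(cur_end, departure)
        else:                             # idle gap: close the busy period
            busy += cur_end - cur_start
            cur_start, cur_end = arrival, departure
    busy += cur_end - cur_start

    return {
        "mean_response": mean_response,
        "throughput": len(requests) / period,
        "utilization": busy / period,
    }

stats = summarize([(0.0, 2.0), (1.0, 3.0), (5.0, 6.0), (8.0, 10.0)])
print(stats)
```

Here the response times are 2, 2, 1 and 2 seconds (mean 1.75), four requests complete in 10 seconds (throughput 0.4 per second), and the server is busy for 6 of the 10 seconds (utilization 0.6). Comparing that utilization with measured processor activity is one way to check the correlation mentioned above.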
Levels of Measurement
- Application code
- Profiling tools
- Operating system
- Kernel
- Hardware
The Big Question
How much does the introduction of monitoring software influence the
performance of the application?
- Event-driven monitors
- Sampling monitors
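The difference between the two kinds of monitor can be sketched on a made-up timeline. An event-driven monitor writes a record at every state change, so its cost grows with the event rate but its answer is exact. A sampling monitor looks at the state at fixed intervals, so its cost is set by the sampling rate but its answer is only an estimate. The busy intervals, the 10-second period and both function names are assumptions for illustration.

```python
def event_driven(busy_intervals, period):
    """Record every busy/idle transition; return (utilization, records written)."""
    records = []
    for start, end in busy_intervals:
        records.append((start, "busy"))
        records.append((end, "idle"))
    busy = sum(idle_t - busy_t
               for (busy_t, _), (idle_t, _) in zip(records[0::2], records[1::2]))
    return busy / period, len(records)

def sampling(busy_intervals, period, interval):
    """Check the state once per `interval`; return (estimated utilization, samples)."""
    samples = int(period / interval)
    hits = 0
    for i in range(samples):
        t = (i + 0.5) * interval          # sample in the middle of each interval
        if any(start <= t < end for start, end in busy_intervals):
            hits += 1
    return hits / samples, samples

busy_intervals = [(0, 3), (5, 6), (8, 10)]   # seconds during which the server is busy
exact = event_driven(busy_intervals, period=10)
coarse = sampling(busy_intervals, period=10, interval=2.5)
fine = sampling(busy_intervals, period=10, interval=1.0)
print("event-driven:", exact)
print("sampling every 2.5s:", coarse)
print("sampling every 1.0s:", fine)
```

On this timeline the event-driven monitor writes 6 records and reports 0.6. Sampling every 2.5 seconds takes 4 samples and reports 0.5; sampling every second takes 10 samples and happens to report 0.6. With thousands of events per second the event-driven record count grows accordingly, while the sampling monitor's count stays fixed.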