CS452 - Real-Time Programming - Spring 2015
Lecture 29 - Pathologies III.
Public Service Announcements
- Train Control II demo on Wednesday, 11 July.
- Lecture menu:
- Turning on the power: Monday/Tuesday next week.
- The exam has three start times.
- 20.30, August 4
- 09.30, August 5
- 20.30, August 5
Each end time is 26.5 hours after its start time.
Questions asked from 20.30, 4 August to 22.00, 4 August will be
answered on the newsgroup, whether they arrive by e-mail or on
the newsgroup.
Pathologies
1. Deadlock
One or more tasks will never run again.
2. Livelock (Deadly Embrace)
Definition
Two or more tasks are READY. For each task, the state of other
tasks prevents progress being made regardless of which task is
ACTIVE.
3. Critical Races
Symptoms of a critical race
- As long as the running time of the program is short everything
is fine. When the running time is long the program malfunctions,
and the time to malfunction is random.
- The program functions perfectly with the caches turned off/on,
and crashes when they are turned on/off. Debugging I/O is
another source of such behaviour. (If you are using busy-wait
I/O please remember that there are incompatibilities between
polled and interrupt-driven I/O.)
- The program fails after a minor change in priorities.
- The program works well with one set of trains, fails miserably
with a different set.
In all these cases changing the order of execution either
creates or reveals a bug. It is usually straightforward to remove
the bug; what's hard is finding it.
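A minimal sketch of how execution order alone can reveal a bug. It is a simulation, not kernel code: the two "tasks", the `run` function and the schedule lists are all hypothetical. Each task increments a shared counter in two steps (read, then write), and only the interleaving differs between the two runs.

```python
# Hypothetical illustration: two tasks each increment a shared counter
# using a read step followed by a write step, so the update is not atomic.

def run(schedule):
    """Run two increment tasks under a given interleaving.

    schedule is a sequence of task ids. The first time a task appears it
    reads the shared counter; the second time it writes back read + 1.
    """
    shared = {"count": 0}
    local = {}  # value each task read in its first step
    for task in schedule:
        if task not in local:
            local[task] = shared["count"]       # step 1: read
        else:
            shared["count"] = local[task] + 1   # step 2: write
    return shared["count"]


serial = run(["A", "A", "B", "B"])       # A finishes before B starts
interleaved = run(["A", "B", "A", "B"])  # B reads before A writes
print(serial, interleaved)               # prints "2 1"
```

The code is identical in both runs; one update is lost only because the order of execution changed.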
A common bug that shares some symptoms is something uninitialized.
You can have success run after run, and then suddenly it doesn't
run: the program runs when the group before you has left the system
in an initialized state, and doesn't run when it hasn't. I do not
consider this to be a critical race because there is no change
in execution order.
Examples
- Sensitive areas in the code: usually two instructions that should be
atomic. Most should be fixed by reprogramming.
- Execution order that is assumed to be invariant, but isn't, such as
querying the name server without thinking that the task about which you
are querying might not yet have registered.
- Order of actions on the track, which changes as the performance of trains
changes. One example would be switching a turn-out just before or just
after the safety threshold for an oncoming train. This example
demonstrates how touchy timing can be in a real-time system. Another might be
assuming that one train's sensor report will come before or after, but
not simultaneously with, another train's report.
- Changing task priorities to improve performance.
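The name server example can be sketched as follows. This is a toy model under stated assumptions: `NameServer`, `register_as`, `who_is`, `lookup_with_retry` and the name `clock_server` are all hypothetical and are not the API of any particular kernel. The point is only that a client which runs before the server has registered gets no answer, and that the client must be written to expect this.

```python
# Hypothetical name server; names and functions are illustrative only.

class NameServer:
    def __init__(self):
        self.table = {}

    def register_as(self, name, tid):
        self.table[name] = tid

    def who_is(self, name):
        # Returns None when the name has not been registered yet.
        return self.table.get(name)


def lookup_with_retry(ns, name, run_other_tasks, max_tries=10):
    """Query until the name appears, letting other tasks run in between."""
    for _ in range(max_tries):
        tid = ns.who_is(name)
        if tid is not None:
            return tid
        run_other_tasks()  # stand-in for giving up the processor
    raise RuntimeError(f"{name!r} never registered")


ns = NameServer()

# The client happens to run first: a single query finds nothing.
first = ns.who_is("clock_server")

# The server registers only when it finally gets to run.
pending = [lambda: ns.register_as("clock_server", 7)]

def run_other_tasks():
    if pending:
        pending.pop(0)()

tid = lookup_with_retry(ns, "clock_server", run_other_tasks)
print(first, tid)  # prints "None 7"
```

Retrying is one possible remedy. Another is to have the name server hold the query until the registration arrives; which is better depends on the rest of the design.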
Solutions
- Explicit synchronization
- but you then have to know the orders in which things are permitted
to occur
- e.g. Try listing all the orders in which events can occur in your
system
- and then notice that just arriving in the same order is often
not enough
- Inevitably, almost all explicit synchronization that's put
into a program is unnecessary, and harms the program rather
than helping it.
- Gating is a technique of global synchronization
- which can be provided by a detective/coordinator
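To get a feel for "listing all the orders in which events can occur", here is a small sketch that enumerates every global order of events that keeps each task's own events in sequence. The function name `interleavings` and the event names are hypothetical, chosen only for illustration.

```python
def interleavings(sequences):
    """Yield every global order of events that keeps each task's events
    in that task's own order. sequences is a list of lists."""
    if all(len(seq) == 0 for seq in sequences):
        yield []
        return
    for i, seq in enumerate(sequences):
        if seq:
            # Take the next event of task i, then interleave what is left.
            rest = sequences[:i] + [seq[1:]] + sequences[i + 1:]
            for tail in interleavings(rest):
                yield [seq[0]] + tail


train_a = ["A hits S1", "A hits S2"]
train_b = ["B hits S3", "B hits S4"]
orders = list(interleavings([train_a, train_b]))
print(len(orders))  # prints 6
for order in orders:
    print(order)
```

Two trains with two events each already give 6 orders, and three events each give 20. The list grows quickly, and it still says nothing about the time between events, which is why arriving in the same order is often not enough.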
4. Performance
Changes in performance of one task with respect to another often
give rise to critical races
The hardest problem to solve
- You just don't know what is possible
- Ask a question like:
- Is my kernel code at the limit of what is possible in terms of
performance?
- We can compare the performance of message passing, etc., because
any two kernels are pretty much the same.
- Compare a lot of kernels and you should be able to find a lower
limit
- Can't do the same thing for train applications
Symptoms of performance problems
In general it's hard to figure out when a problem is caused by
bad performance and when it's possible to remedy performance.
(Sometimes there's no solution but buying a more capable processor
or more memory.) Here are some symptoms I have seen more than
once.
- Percentage of time used by the idle task falling significantly below
90%.
- Buffers filling up. In a healthy system they are almost always empty.
- Negative, zero or close to zero delay arguments.
- Unintended close calls on the track.
- Stale sensor data.
- Low priority tasks falling behind.
Most often performance problems build gradually during execution.
The symptoms above often indicate that the system will fail in
the immediate future.
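Two of these symptoms are easy to check mechanically. The sketch below is an assumption-laden illustration: the function names, the tick counters and the thresholds (`idle_threshold`, `min_delay`) are hypothetical, and how far below 90% counts as "significantly" is a judgement call left to the caller.

```python
def idle_fraction(idle_ticks, total_ticks):
    """Fraction of elapsed ticks spent in the idle task."""
    if total_ticks <= 0:
        raise ValueError("total_ticks must be positive")
    return idle_ticks / total_ticks


def check_symptoms(idle_ticks, total_ticks, wake_tick, now_tick,
                   idle_threshold=0.90, min_delay=1):
    """Return a list of warnings for two of the symptoms listed above."""
    warnings = []
    if idle_fraction(idle_ticks, total_ticks) < idle_threshold:
        warnings.append("idle time below threshold")
    # The value a task would pass as its delay argument.
    delay = wake_tick - now_tick
    if delay < min_delay:
        warnings.append("delay argument too small")
    return warnings


healthy = check_symptoms(idle_ticks=950, total_ticks=1000,
                         wake_tick=120, now_tick=100)
failing = check_symptoms(idle_ticks=700, total_ticks=1000,
                         wake_tick=100, now_tick=103)
print(healthy)  # prints []
print(failing)
```

In the second call the task has already missed its intended wake-up time by 3 ticks, so the delay it would request is negative.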
Priority
The hardest thing to get right
- NP-hard for the human brain
- A practical method starts with all priorities the same for the same type of
task, then adjusts.
- Symptoms of good priority assignment
- The higher the priority, the more likely the ready queue is to be
empty.
- The shorter the run time the higher the priority.
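As a starting point only, the "shorter run time, higher priority" rule can be sketched like this. The task names and run times are made up, and the convention that 0 is the highest priority is an assumption. Tasks with equal run times share a level, and the result still has to be adjusted by hand afterwards.

```python
def assign_priorities(run_times):
    """Map task name -> priority level, where 0 is the highest priority.

    Shorter run time gives higher priority; equal run times share a level.
    """
    levels = sorted(set(run_times.values()))
    return {name: levels.index(t) for name, t in run_times.items()}


# Hypothetical measured run times per activation, in microseconds.
measured_us = {
    "notifier": 5,
    "clock_server": 20,
    "serial_server": 20,
    "display": 400,
    "train_planner": 900,
}
prio = assign_priorities(measured_us)
for name in sorted(prio, key=prio.get):
    print(prio[name], name)
```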
Problems with priority
- Priority inversion
- One resource, many clients
- Tasks try to do too much
Congestion
- Too many tasks
- blocked tasks don't count,
- lowest priority tasks almost don't count
Layered abstractions are costly
e.g. Notifier -> SerialServer -> InputAccumulater -> Parser ->
TrackServer
Output
- Too much terminal output interferes with train controller
communication
- Don't redraw the entire screen
- Requests to poll the sensors get backed up in the serial server,
or whoever provides output buffering.
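One way to avoid redrawing the entire screen is to keep a copy of what is currently displayed and send only the cells that changed. The sketch below assumes a terminal that understands the ANSI cursor-position sequence (`ESC [ row ; col H`); the function name `diff_updates` and the screen contents are hypothetical.

```python
def diff_updates(old, new):
    """Return the escape sequences that repaint only the changed cells.

    old and new are lists of rows; row i of old must have the same
    length as row i of new.
    """
    out = []
    for r, (old_row, new_row) in enumerate(zip(old, new)):
        c = 0
        while c < len(new_row):
            if old_row[c] == new_row[c]:
                c += 1
                continue
            start = c
            # Extend the run while consecutive cells differ.
            while c < len(new_row) and old_row[c] != new_row[c]:
                c += 1
            # ANSI cursor positions are 1-based: ESC [ row ; col H
            out.append(f"\x1b[{r + 1};{start + 1}H{new_row[start:c]}")
    return "".join(out)


old = ["Train 24 speed 08", "Sensor: A03      "]
new = ["Train 24 speed 10", "Sensor: A03      "]
update = diff_updates(old, new)
print(repr(update))
print(len(update), "bytes instead of", sum(len(row) for row in new))
```

Only the two changed digits and one cursor move go to the terminal, which leaves more of the serial server's time for train controller traffic.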
Hardware
- Turn on optimization, but be careful
- There are places where you have done register allocation by
hand
- Turn on caches
Size & align calibration tables by size & alignment of cache
lines
I think that this is stretching it.