CS452 - Real-Time Programming - Winter 2013
Lecture 28 - Pathologies
Public Service Announcements
- When you will give the second demo
- No class on Friday, 29 March. Replaced by a class on Monday 8
April.
- How would you like to handle the demos?
- One day or two?
- If one, then 9 April.
- If two, then 8,9 April or 9,10 April
- Any questions about calibration or reservations?
Projects
- Lots of projects do following.
- It has only been done once before, and that project only got following
  itself implemented successfully.
- Work out your following algorithm, implement and test it on a single
path. The tests should include
- going through switches in all configurations
- using rough parts of the track
- initiating following
- Lots of projects have AIs.
- Keep them stupid, e.g. if the AI is being chased, randomize oncoming
  switch positions.
- An advanced AI would throw the switch back immediately after
passing through.
- Several projects have competing trains.
- You need well-defined rules about things like giving up owned
track.
- How much the trains know about one another is a critical decision.
Make it well.
- All projects are ambitious.
- That's good, but
- such projects require a carefully staged development plan,
- such projects benefit from a plan B for as many things as
possible
Pathologies
As we go down this list, both the difficulty of detecting the pathology and
the length of the edit-compile-test cycle grow without bound.
1. Deadlock
2. Livelock (Deadly Embrace)
Religious battles about the correct name.
Definition
Two or more tasks are READY. For each task, the state of the other tasks
prevents progress being made regardless of which task is ACTIVE.
A higher level of coordination is required.
Two types of livelock exist
- Ones that are the result of bad coding
- Ones that are inherent in the application definition
  - Detect livelock and work around it.
Looking for solutions we prefer ones that avoid a central planner. Why?
- In the twentieth century there was a collection of political systems
  relying on central planners
Livelock usually occurs in the context of resource contention
Livelock that's Really Deadlock
- client1 needs resource1 & resource2;
- obtains resource1 from proprietor1;
- asks proprietor2 for resource2
- client2 needs resource2 & resource1;
- obtains resource2 from proprietor2;
- asks proprietor1 for resource1
- possible code
  - Client 1
    Send( prop1, getres1, ... );
    Send( prop2, getres2, ... );
    // Use the resources and release them
  - Client 2
    Send( prop2, getres2, ... );
    Send( prop1, getres1, ... );
    // Use the resources and release them
- Proprietor
    FOREVER {
      Receive( &clientTid, req, ... );
      switch ( req-type ) {
      case REQUEST:
        if( available ) {
          Reply( clientTid, use-it, ... );      // grant the resource
          available = false;
        }
        else enqueue( clientTid );              // client stays Reply-blocked
        break;
      case RELEASE:
        available = true;
        Reply( clientTid, "thanks", ... );      // unblock the releasing client
        if( !empty( Q ) ) {
          available = false;
          Reply( dequeue( ), use-it, ... );     // grant to the next waiter
        }
        break;
      }
    }
- state:
- client1, client2: REPLY-BLOCKED - can't release resources
- proprietor1, proprietor2: SEND-BLOCKED - waiting for release
- this is a true deadlock -- neither of the clients will ever run
again -- even though there are no cycles in the call graph.
- The dependencies lie elsewhere. Where?
- (You can find on the internet religious arguments about terminology
for this case just as intense as anything you will ever see in vi vs
emacs or Apple vs Microsoft.)
Solutions
- Make a single compound resource, BUT
- all clients may not need both
- some resources simply cannot be compounded
- Impose a global order on resource requests that all clients must
  follow (see the sketch after this list).
- unsafe against malicious or incompetent programmers
- some resources don't admit strong enough ordering, e.g. pieces of
track in the train set
- Create a mega-server that handles all resource requests
- server keeps track of all requests, finds any collisions and
resolves them.
- client might not know that A is needed until processing with B is
well-advanced
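To see what the global-order solution looks like in the code above, here is a
minimal sketch; it assumes the same proprietor interface as before, and the
numbering of the resources is illustrative. Both clients now ask for resource1
before resource2, so the circular wait cannot form.
  - Client 1 and Client 2 (identical request order)
    Send( prop1, getres1, ... );   // always acquire resource1 first
    Send( prop2, getres2, ... );   // then resource2
    // Use the resources, then release them
    Send( prop2, relres2, ... );
    Send( prop1, relres1, ... );
  With this order nobody can hold resource2 without already holding
  resource1, so the client that gets resource1 also gets resource2, and the
  other client simply waits its turn on proprietor1; the cycle of mutual
  waiting cannot form.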
Real Livelock
Proprietor1 & proprietor2 refuse requests they cannot grant immediately
- Proprietor
    FOREVER {
      Receive( &clientTid, req, ... );
      switch ( req-type ) {
      case REQUEST:
        if( available ) {
          Reply( clientTid, use-it, ... );       // grant the resource
          available = false;
        }
        else Reply( clientTid, "sorry", ... );   // refuse instead of queueing
        break;
      case RELEASE:
        available = true;
        Reply( clientTid, "thanks", ... );
        break;
      }
    }
- Polling is the most likely result. Typical client code:
    while ( Send( prop1, getres1, ... ) != GotIt ) ;
    while ( Send( prop2, getres2, ... ) != GotIt ) ;
    // Use the resources
- In the original scenario, client1 holds resource1 and spins asking for
  resource2, while client2 holds resource2 and spins asking for resource1.
  Both are READY, neither makes progress: livelock.
- And the problem is that this code almost always works, as long as the
  tests are relatively short.
Livelock that's Really a Critical Race
We could try to make the clients a little more considerate:
  while ( no resources ) {
    // Get the first resource, retrying politely until it is granted
    Send( prop1, getres1, result );
    while ( result == "sorry" ) {
      Delay( ... );
      Send( prop1, getres1, result );
    }
    // Now try for the second resource
    Send( prop2, getres2, result );
    if ( result == "sorry" ) {
      // Couldn't get both: give back the first, wait, and start over
      Send( prop1, relres1, ... );
      Delay( ... );
    } else {
      break;                       // got both resources
    }
  }
  // Use the resources and release them
This is a critical race: whether the clients ever succeed depends on how
their delays interleave. If both clients back off for the same time they can
stay in lockstep forever, each releasing and re-requesting at the same
moments; making the delays different, or random, lets one of them eventually
acquire both resources.
Inherent Livelock
Remember the example where two trains come face to face, each waiting for
the other to move. They will wait facing each other until the demo is over,
probably polling.
What's hard about solving this problem?
- Neither driver knows what the other driver is trying to do.
In real life the drivers would communicate, but in your software that's
- neither easy
  - How many different `conversations' might need to be available?
- nor desirable
  - What is the effect on other trains of the two drivers' special
    arrangement?
The easiest thing for you to do is to programme each driver with
- detection, e.g.,
- Delay a random time
- Request again
- If turned down, work around
- work around, e.g.,
- Recommence working on goal as though track is blocked.
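A rough sketch of that driver logic, in the style of the pseudocode above;
the reservation server, the request names, random-time, and
replan-route-avoiding are illustrative assumptions rather than any required
interface:
    // detection: the reservation was refused
    Send( reservationServer, reserve-section, result );
    if ( result == "sorry" ) {
      Delay( random-time );                                  // delay a random time
      Send( reservationServer, reserve-section, result );    // request again
      if ( result == "sorry" )
        // work around: recompute the route as though this piece of
        // track were blocked, and keep working toward the goal
        replan-route-avoiding( section );
    }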
3. Critical Races
Example
- Two tasks, A & B, at the same priority
- A is doing a lot of debugging IO
- B always reserves a section of track before A, and all is fine.
- Debugging IO is removed
- A reserves the section before B can get it, and execution
collapses.
- Lower the priority of A to the same level as some other task C.
  - Now C executes sooner and gets a resource before D, and something
    else breaks.
- You shuffle priorities forever, eventually reverting to the original
  priorities and putting the debugging IO back in.
Definition
The order in which computation is done is an important factor in
determining whether or not it is successful.
Critical races, like livelock, can be
- internal to the program, like the one above, or
- external to the program but inherent in the application domain
Symptoms
- Small changes in priorities change execution unpredictably, and
drastically.
- Debugging output changes execution drastically.
- Changes in train speeds change execution drastically.
- Using different trains, track getting dirty, etc
- Example from several terms ago
`Drastically' usually means chaos in both senses of the term
- Sense one: a small change in the initial conditions produces an
exponentially growing divergence in the execution.
- Sense two: exercise for the reader.
Solutions
- Design software to require as little synchronization as possible.
- scenario-based design, which we all do, makes this difficult
- Explicit synchronization
- but you then have to know the orders in which things are permitted
to occur
- e.g. Try listing all the orders in which events can occur in your
system
- and then notice that just arriving in the same order is often
not enough
- because you need enough advance warning to get the switch
thrown
- The more synchronization a program has the more brittle it is.
- Gating is a technique of global synchronization
- which can be provided by a detective/coordinator
- coordinator is a task that does nothing but synchronize other
tasks
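A minimal sketch of such a coordinator, again in the pseudocode style used
earlier; N-TASKS and the reply value go are illustrative. Each task Sends to
the coordinator when it reaches the gate and stays Reply-blocked; when the
last one arrives the coordinator Replies to all of them, releasing them
together.
    // Coordinator: does nothing but synchronize the other tasks
    FOREVER {
      n = 0;
      while ( n < N-TASKS ) {
        Receive( &clientTid, req, ... );     // a task has reached the gate
        waiting[ n++ ] = clientTid;          // it stays Reply-blocked
      }
      while ( n > 0 )
        Reply( waiting[ --n ], go, ... );    // open the gate for everyone
    }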
4. Performance
Changes in performance of one task with respect to another often give rise
to critical races
The hardest problem to solve
- You just don't know what is possible
- Ask a question like:
- Is my kernel code at the limit of what is possible in terms of
performance?
- We can compare the performance of message passing, etc., because
  any two kernels are pretty much the same.
- Compare a lot of kernels and you should be able to find a lower
limit
- Can't do the same thing for train applications
In practice, how do you know you have performance problems?
- Gold standard is the amount of time the idle task is running (see the
  sketch after this list)
- UART errors
- dropped characters
- error bits set
Train errors are usually flow control; terminal errors are usually
performance.
- Timer errors
- less likely than UART errors
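One crude way to get the gold-standard number, sketched in the same
pseudocode style; perf-server and REPORT-EVERY are illustrative names, and a
more accurate measurement has the kernel accumulate the time it spends in the
idle task at each context switch. Here the idle task simply counts how fast
it spins; the busier the system, the lower the reported spin rate.
    // Idle task: lowest priority, runs only when nothing else is READY
    spins = 0;
    FOREVER {
      spins++;
      if ( spins % REPORT-EVERY == 0 )
        Send( perf-server, spins, ... );   // report so it can be displayed
    }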
Priority
The hardest thing to get right
- NP-hard for the human brain
- Practical method starts with all priorities the same, then adjusts
- symptoms of good priority assignment
- The higher the priority, the more likely the ready queue is to be
empty
- The shorter the run time in practice the higher the priority
Problems with priority
- Priority inversion
- One resource, many clients
- Tasks try to do too much
Congestion
- Too many tasks
- blocked tasks don't count,
- lowest priority tasks almost don't count
Layered abstractions are costly
e.g. Notifier -> SerialServer -> InputAccumulator -> Parser ->
TrackServer
Output
- Too much terminal output interferes with train controller communication
- Don't redraw the entire screen (see the sketch after this list)
- Requests to poll the sensors get backed up in the serial server, or
whoever provides output buffering.
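A sketch of the usual way to avoid redrawing the whole screen; printf here
stands for whatever formatted-output path you already have to the terminal,
and the row/column names are illustrative. Move the cursor with the
ANSI/VT100 escape and rewrite only the field that changed.
    // ESC [ row ; col H positions the cursor; then rewrite one field
    printf( "\033[%d;%dH", clockRow, clockCol );      // move the cursor
    printf( "%d:%02d.%d", minutes, seconds, tenths ); // redraw just the clock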
Hardware
- Turn on optimization, but be careful
- There are places where you have done register allocation by
hand
- Turn on caches
Size and align calibration tables to match the size and alignment of
cache lines
- linker command script
- I think that this is stretching it.
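If you prefer to do the alignment in C rather than in the linker command
script, a sketch follows; the 32-byte cache line is an assumption about the
ARM920T that you should check against the documentation, and the table layout
is illustrative.
    #define CACHE_LINE 32                    /* assumed ARM920T line size */
    struct calib_entry {
        int speed;
        int velocity;                        /* e.g. um per tick */
        char pad[ CACHE_LINE - 2 * sizeof( int ) ];   /* fill out one line */
    } __attribute__( ( aligned( CACHE_LINE ) ) ) calib_table[ 15 ];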