CS452 - Real-Time Programming - Winter 2017
Lecture 26 - Pathologies.
Public Service Announcements
- Train Control II demo on Thursday, 16 March.
- The exam will start at 12.30 on 6 April 2017 and finish at 15.00 on
  7 April 2017.
- Trains lab open house: 18 March, 10.30 to about 14.00.
Operating reserved track
A few questions to think about
- How much must be controlled to ensure that this constraint is
  respected?
- Try listing things that might go wrong.
- Are you willing to trust the train driver code, especially its
  real-time aspect? The problem occurs when the train is too close to
  the switch when the command to throw it is given. Some
  implementations like to have the switch do the checking rather than
  the train driver. Why? (A sketch of switch-side checking follows.)
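One argument for switch-side checking is that there is a single
checker instead of one per driver. A minimal sketch, assuming the
usual kernel primitives and a hypothetical trainDistanceTo() query
against whatever task tracks train positions; SAFE_MM is an assumed
safety margin:

    extern int Receive( int *tid, char *msg, int msglen );
    extern int Reply( int tid, const char *rpl, int rpllen );

    extern int trainDistanceTo( int switchId ); /* assumed: mm from nearest train */

    #define SAFE_MM 200                         /* assumed safety margin */

    void switchProprietor( int switchId ) {
        int tid;
        char dir, ok;
        for ( ;; ) {
            Receive( &tid, &dir, 1 );           /* 'S' or 'C' */
            if ( trainDistanceTo( switchId ) > SAFE_MM ) {
                /* safe: actually throw the switch here */
                ok = 1;
            } else {
                ok = 0;                         /* too close: refuse the throw */
            }
            Reply( tid, &ok, 1 );
        }
    }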
Pathologies
As we go down this list, both the difficulty of detecting the
pathology and the length of the edit-compile-test cycle grow without
bound.
1. Deadlock
One or more tasks will never run again. For example:
- Task sends to itself (local: the rest of the system keeps running;
  the task itself will never run).
- Every task does Receive( ) (global: nothing is running, except
  possibly the idle task; all tasks are SEND_BL).
- Cycle of tasks sending around the cycle; all tasks in the cycle are
  RCV_BL (local: other tasks keep running).
- One train is on a siding trying to get out; another train is on the
  switch at the end of the siding trying to get in (external: the
  application is okay but the world is in an unanticipated
  configuration; "unanticipated" means that no code was implemented to
  deal with this case).
The kernel can detect the first three; only the train application can
detect the fourth.
Potential deadlock can be detected at compile time
- cycle in the send graph of all sends that could happen (a
  cycle-detection sketch follows this list)
- doesn't necessarily occur at run-time
- that is, a cycle in the send graph is a necessary but not sufficient
  condition for deadlock
- It's worse:
  - It doesn't happen when tests are short.
  - It appears near the end, when tests run for longer.
  - Changes in a critical race can make a potential deadlock reveal
    itself.
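Because the send graph is static, the check is an ordinary depth-first
search for a back edge. A minimal sketch, assuming the graph has been
extracted by hand (or by a script) into an adjacency matrix; the task
count and edges below are made up:

    #include <stdio.h>

    #define NT 4            /* assumed: number of tasks */

    int sends[NT][NT];      /* sends[i][j] != 0 iff task i Sends to task j */
    int state[NT];          /* 0 = unvisited, 1 = on DFS stack, 2 = done */

    int hasCycle( int t ) {
        state[t] = 1;
        for ( int u = 0; u < NT; u++ )
            if ( sends[t][u] ) {
                if ( state[u] == 1 ) return 1;            /* back edge */
                if ( state[u] == 0 && hasCycle( u ) ) return 1;
            }
        state[t] = 2;
        return 0;
    }

    int main( ) {
        sends[0][1] = 1; sends[1][2] = 1; sends[2][0] = 1;  /* a 3-cycle */
        for ( int t = 0; t < NT; t++ )
            if ( state[t] == 0 && hasCycle( t ) ) {
                printf( "potential deadlock: cycle through task %d\n", t );
                return 1;
            }
        printf( "send graph is acyclic\n" );
        return 0;
    }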
Solutions
- Gating
  - The most common example of cycles is conflict between the
    communication patterns of initialization and of running.
  - Gate the end of initialization, so that all tasks complete
    initialization at the same logical time (a minimal gate sketch
    follows).
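A minimal gate sketch, assuming the usual Send/Receive/Reply
signatures and that the number of gated tasks is known at compile
time. Each task Sends to the gate when its initialization is done; the
gate holds every sender REPLY_BL until all have checked in, then
releases them together:

    extern int Send( int tid, const char *msg, int msglen, char *rpl, int rpllen );
    extern int Receive( int *tid, char *msg, int msglen );
    extern int Reply( int tid, const char *rpl, int rpllen );

    #define NTASKS 8   /* assumed: number of gated tasks */

    void gate( ) {
        int waiting[NTASKS];
        int tid, n;
        for ( n = 0; n < NTASKS; n++ ) {
            Receive( &tid, (char *)0, 0 );   /* check-in; payload unused */
            waiting[n] = tid;                /* sender is now REPLY_BL */
        }
        /* everyone has initialized: release them all at once */
        for ( n = 0; n < NTASKS; n++ )
            Reply( waiting[n], (char *)0, 0 );
    }

    /* In each gated task, after its own initialization:
           Send( gateTid, (char *)0, 0, (char *)0, 0 );  // blocks until all arrive
    */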
- Use an explicit coding model. Define four types of task:
  - Administrator (A), including servers of all kinds: only receives.
  - Worker (W), including notifiers: only sends.
  - Client (C): only sends.
  - Notifier (N): only sends, to a hard-coded target.
  - Two A tasks cannot communicate directly; two W/C tasks cannot
    communicate directly.
- For W/C/N tasks, Send appears in two flavours:
  - C tasks
        FOREVER {
            Send( A, request, result );   /* ask, then use the result */
            ...
        }
  - W tasks
        FOREVER {
            Send( A, result, request );   /* hand in a result, get more work */
            ...
        }
  - N tasks
        FOREVER {
            Send( A, result, request );
            result.data = AwaitEvent( request.event );
            ...
        }
- The Receives corresponding to W, C & N tasks are normally the same.
  - N is effectively a W task.
  - The important difference is that while W & C tasks are lower
    priority than the A task, N tasks are higher priority.
- C, W & N requests and results must have compatible data types.
  - The request is effectively a union: the payload is interpreted
    differently at run-time, using different cases of the switch on
    message type.
- A courier is W-type to one A-task and C-type to the other:
      FOREVER {
          Send( A1, request, result );
          Send( A2, result, request );
      }
- Occasionally, but not often, two A-tasks are synchronized in a way
  that makes it necessary to communicate in two directions through a
  single courier, but this is a fragile programming construct.
  - Using two couriers is more robust.
- Do whatever seems right. (A sketch of the administrator's side of
  this model follows.)
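For concreteness, a sketch of the administrator's loop under this
model, assuming the usual kernel primitives and a hypothetical tagged
message type; the queueing details are elided. The point is that only
the A task Receives, so no send cycle can form:

    extern int Receive( int *tid, char *msg, int msglen );
    extern int Reply( int tid, const char *rpl, int rpllen );

    typedef struct {
        enum { CLIENT_REQ, WORKER_DONE, NOTIFIER_EVT } type;
        int data;
    } Msg;

    void administrator( ) {
        int tid;
        Msg msg;
        for ( ;; ) {                     /* FOREVER */
            Receive( &tid, (char *)&msg, sizeof msg );
            switch ( msg.type ) {
            case CLIENT_REQ:
                /* queue the request; Reply to the client later, when a
                   worker delivers the matching result */
                break;
            case WORKER_DONE:            /* a worker hands in a result ... */
            case NOTIFIER_EVT:           /* ... or a notifier hands in data */
                /* match it to a queued client and Reply to both; if no
                   work is queued, hold the worker REPLY_BL */
                break;
            }
        }
    }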
2. Livelock (Deadly Embrace)
Definition
Two or more tasks are READY. For each task, the state of the other
tasks prevents progress being made, regardless of which task is
ACTIVE.
A higher level of coordination is required.
There are two types of livelock:
- ones that are the result of bad coding, and
- ones that are inherent in the application definition.
  - For these, detect the livelock and work around it.
Livelock usually occurs in the context of resource contention.
Livelock that's Really Deadlock
- client1 needs resource1 & resource2;
- obtains resource1 from proprietor1;
- asks proprietor2 for resource2
- client2 needs resource1 & resource2;
- obtains resource2 from proprietor2;
- asks proprietor1 for resource1
- possible code: server queues clients
- Client 1
Send( prop1, getres1, ... );
Send( prop2, getres2, ... );
// Use the resources and release them
- Client 2
Send( prop2, getres2, ... );
Send( prop1, getres1, ... );
// Use the resources and release them
- Proprietor
      FOREVER {
          Receive( &clientTid, req, ... );
          switch ( req->type ) {
          case REQUEST:
              if ( available ) {
                  Reply( clientTid, use-it, ... );
                  available = false;
              }
              else enqueue( clientTid );   /* client stays REPLY_BL */
              break;
          case RELEASE:
              available = true;
              Reply( clientTid, "thanks", ... );
              if ( !empty( Q ) ) {
                  available = false;
                  Reply( dequeue( ), use-it, ... );
              }
              break;
          }
      }
- state:
  - client1, client2: REPLY_BLOCKED -- can't release resources
  - proprietor1, proprietor2: SEND_BLOCKED -- waiting for a release
- this is a true deadlock -- none of the four tasks will ever run
  again -- even though there are no cycles in the send graph.
- The dependencies lie elsewhere. Where?
Solutions
- Make a single compound resource, BUT
  - all clients may not need both;
  - some resources simply cannot be compounded. The track is a good
    example.
- Impose a global order on resource requests that all clients must
  follow (a sketch follows this list), BUT
  - it's unsafe against malicious or incompetent programmers;
  - some resources don't admit a strong enough ordering, e.g. pieces
    of track in the train set.
- Create a mega-server that handles all resource requests
  - clients request all at once; the mega-server provides an optimal
    solution to resource use in the presence of hundreds of deadlines,
    BUT
  - a client might not know that A is needed until processing with B
    is well-advanced.
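A minimal sketch of the ordering discipline, assuming each resource
proprietor has a small integer id and a hypothetical proprietorTid()
lookup. If every client acquires in increasing id order, the wait-for
relation cannot contain a cycle, so the deadlock above is impossible:

    extern int Send( int tid, const char *msg, int msglen, char *rpl, int rpllen );

    extern int proprietorTid( int resourceId );   /* assumed lookup */

    void acquireInOrder( int *ids, int n ) {
        /* sort the wanted resource ids ascending; n is tiny, so
           insertion sort is fine */
        for ( int i = 1; i < n; i++ )
            for ( int j = i; j > 0 && ids[j] < ids[j-1]; j-- ) {
                int t = ids[j]; ids[j] = ids[j-1]; ids[j-1] = t;
            }
        /* request each proprietor in the global order */
        for ( int i = 0; i < n; i++ ) {
            char req = 'R';                       /* hypothetical request */
            Send( proprietorTid( ids[i] ), &req, 1, (char *)0, 0 );
        }
    }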
Real Livelock
Proprietor1 & proprietor2 fail the requests
- Proprietor
      FOREVER {
          Receive( &clientTid, req, ... );
          switch ( req->type ) {
          case REQUEST:
              if ( available ) {
                  Reply( clientTid, use-it, ... );
                  available = false;
              }
              else Reply( clientTid, "sorry", ... );
              break;
          case RELEASE:
              available = true;
              Reply( clientTid, "thanks", ... );
              break;
          }
      }
- Polling is the most likely result. Typical client code:
      while ( Send( prop1, getr1, ... ) != GotIt ) ;
      while ( Send( prop2, getr2, ... ) != GotIt ) ;
      // Use the resources
- The problem is that this code usually works as long as the code
  remains under development, but as runs lengthen after deployment the
  bug appears occasionally.
Livelock that's Really a Critical Race
We could try to make the clients a little more considerate
    while ( no resources ) {
        Send( prop1, getres1, result );
        while ( result == "sorry" ) {
            Delay( ... );
            Send( prop1, getres1, result );
        }
        Send( prop2, getres2, result );
        if ( result == "sorry" ) {
            Send( prop1, relres1, ... );   // give res1 back and retry
            Delay( ... );
        }
    }
or even more considerate
    while ( true ) {
        Send( prop1, getres1, result );
        while ( result == "sorry" ) {
            Delay( ... );
            Send( prop1, getres1, result );
        }
        Send( prop2, getres2, result );
        if ( result == "sorry" ) {
            Send( prop1, relres1, ... );   // give res1 back
        } else {
            break;                         // got both resources
        }
        Delay( ... );
    }
We call this a critical race because avoiding what is effectively an
infinite loop depends on the timing of execution.
How quickly code like this escapes from the critical race depends on
the argument you give to Delay( ... ).
- If it's the same constant, which is common because both clients are
  running the same code, the livelock can persist for a long time.
- If it's random, and long enough to re-order the execution of the two
  clients, the livelock will not persist for long (a backoff sketch
  follows).
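A minimal randomized-backoff sketch, assuming Delay() takes ticks and
seeding a small PRNG from MyTid(), so two clients running identical
code still draw different delays and de-interleave:

    extern int Delay( int ticks );
    extern int MyTid( );

    static unsigned seed;

    static unsigned nextRand( ) {          /* small LCG; any PRNG will do */
        seed = seed * 1103515245u + 12345u;
        return ( seed >> 16 ) & 0x7fff;
    }

    void backoff( int attempt ) {
        if ( seed == 0 ) seed = 17u * MyTid( ) + 1;     /* per-task seed */
        int cap = 10 << ( attempt < 5 ? attempt : 5 );  /* bounded growth */
        Delay( 1 + (int)( nextRand( ) % cap ) );
    }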
Inherent Livelock
Remember the example where two trains come face to face, each waiting for
the other to move. They will wait facing each other until the demo is over,
probably polling.
What's hard about solving this problem?
- Neither driver knows what the other driver is trying to do.
In real life the drivers would communicate, but in your software
that's
- neither easy: how many different `conversations' might need to be
  available?
- nor desirable: what is the effect on other trains of the two
  drivers' special arrangement?
What's easiest for you to do is to programme each driver with
- detection, e.g.,
  - delay a random time,
  - request again,
  - if turned down, work around;
- a work-around, e.g.,
  - recommence working on the goal as though the track were blocked.
(A sketch of such a driver loop follows.)
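A sketch of that detect-and-work-around loop, assuming two
hypothetical helpers: reserve() asks the track proprietor for the next
section, and reroute() asks the route planner to treat that section as
blocked. Repeated refusals are taken as evidence of a face-off:

    extern int Delay( int ticks );

    extern int reserve( int section );       /* assumed: 1 granted, 0 refused */
    extern void reroute( int avoidSection ); /* assumed: replan around it */

    void advance( int next ) {
        for ( int tries = 0; tries < 3; tries++ ) {
            if ( reserve( next ) ) return;   /* got it: carry on */
            Delay( 10 + 7 * tries );         /* better randomized, as above */
        }
        /* still refused: behave as though the track is blocked there */
        reroute( next );
    }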
3. Critical Races
Example
- Two tasks, A & B, at the same priority.
- A is doing a lot of debugging IO.
- B always reserves a section of track before A, and all is fine.
- The debugging IO is removed.
- A now reserves the section before B can get it, and execution
  collapses.
- You lower the priority of A to the same level as C.
- Now C executes faster and gets a resource before D.
- You shuffle priorities forever, eventually reverting and putting the
  debugging IO back in.
Definition
The order in which computation is done is an important factor in
determining whether or not it is successful.
Critical races, like livelock, can be
- internal to the program, like the one above, or
- external to the program but inherent in the application domain.
Symptoms
- Small changes in priorities change execution unpredictably, and
drastically.
- Debugging output changes execution drastically.
- Changes in train speeds change execution drastically.
- Example from several terms ago
`Drastically' usually means chaos in both senses of the term
- Sense one: a small change in the initial conditions produces an
exponentially growing divergence in the execution.
- Sense two: exercise for the reader.
Solutions
- Explicit synchronization
- but you then have to know the orders in which things are permitted
to occur
- e.g. try listing all the orders in which events can occur in your
  system,
  - and then notice that events merely arriving in the same order is
    often not enough.
- Gating is a technique of global synchronization,
  - which can be provided by a detective/coordinator.
4. Performance
Changes in the performance of one task with respect to another often
give rise to critical races.
The hardest problem to solve:
- You just don't know what is possible.
- Ask a question like: is my kernel code at the limit of what is
  possible in terms of performance?
  - We can compare the performance of message passing, etc., because
    any two kernels are pretty much the same.
  - Compare a lot of kernels and you should be able to find a lower
    limit.
  - You can't do the same thing for train applications.
In practice, how do you know you have performance problems? Problems I
have seen:
- Turn-outs switch too slowly.
- Sensor data is stale.
Priority
The hardest thing to get right
- NP-hard for the human brain
- A practical method starts with all priorities the same, then
  adjusts.
- Symptoms of good priority assignment:
  - The higher the priority, the more likely the ready queue is to be
    empty.
  - The shorter the run time in practice, the higher the priority.
Problems with priority
- Priority inversion
- One resource, many clients
- Tasks try to do too much
Congestion
- Too many tasks
- blocked tasks don't count,
- lowest priority tasks almost don't count
Layered abstractions are costly,
e.g. Notifier -> SerialServer -> InputAccumulator -> Parser ->
TrackServer
Output
- Too much terminal output interferes with train controller
  communication.
- Don't redraw the entire screen; update only what changed (a sketch
  follows).
- Requests to poll the sensors get backed up in the serial server, or
  whoever provides output buffering.
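A sketch of damage-only screen update, assuming a VT100-style terminal
and a hypothetical putstr() that hands the string to the terminal
output server. Only cells that changed since the last frame are
transmitted, so redraws don't starve sensor polling:

    #include <stdio.h>

    #define ROWS 24
    #define COLS 80

    extern void putstr( const char *s );   /* assumed terminal output call */

    static char shown[ROWS][COLS];         /* what is on the screen now */

    void draw( int row, int col, const char *s ) {
        char buf[16];
        for ( ; *s && col < COLS; s++, col++ ) {
            if ( shown[row][col] == *s ) continue;   /* unchanged: skip */
            shown[row][col] = *s;
            /* VT100 cursor position is 1-based: ESC [ row ; col H */
            snprintf( buf, sizeof buf, "\033[%d;%dH%c", row + 1, col + 1, *s );
            putstr( buf );
        }
    }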
Hardware
- Turn on optimization, but be careful:
  - there are places where you have done register allocation by hand.
- Turn on caches.
  - Size & align calibration tables by the size & alignment of cache
    lines (a sketch follows).
  - I think that this is stretching it.
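For what it's worth, a sketch of the table trick, assuming GCC and the
32-byte cache line of the ARM920T in the trains lab; each row then
occupies exactly one line, so a lookup touches a single line:

    #define CACHE_LINE 32

    typedef struct {
        int speed;              /* train speed setting */
        int velocity;           /* measured, in um/tick */
        int stopping_distance;  /* in um */
        char pad[CACHE_LINE - 3 * sizeof( int )];   /* fill out the line */
    } __attribute__(( aligned( CACHE_LINE ) )) CalRow;

    CalRow calibration[15];     /* hypothetical: one row per speed setting */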