CS789, Spring 2007
Lecture notes, Week 6
Synchronization is about time: when should something happen? Sometimes
real-time matters (double clicking); sometimes only temporal sequence matters
(typing a command to Unix). Synchronization is also about context: in what
context are the actions of the user interpreted by the computer (commands vs
input typed to vi), and vice versa (<P> in html source versus in ordinary
text).
Why are context and time related like this? Because human ability to
handle different contexts simultaneously is limited. Often, when an interface
offers several contexts simultaneously, the user operates within only one at
a time.
As an example, I can imitate your facial expressions. How?
- One mechanism watches you and analyses your expressions.
- Another mechanism formulates target expressions to assume, and releases
appropriate motor programs.
- Both mechanisms run simultaneously.
So, why can't two people having a conversation talk at once? It should be
possible:
- Each person hears two voices, but can use stream segregation to pay
attention to only one of them.
- Each person only has to control one mouth.
- A process interprets what is coming in and formulates things to say.
(This process runs concurrently with listening and interpreting when we
interchange conversational control.)
What do I have to do in the second case that I don't have to do in the first
one? (Hint. Most of us can read from a tele-prompter, but very few can read
and paraphrase at the same time. What does simultaneous translation tell us?)
A simple example: the prompt
Most interfaces exchange control between the user and computer. How is the
exchange initiated and completed? The key concepts are the trigger and the
prompt.
- Return/enter is the trigger, transferring control from user to
computer.
Before the trigger
- what has been typed continues to be owned by the user, and
- it can still be edited by the user.
- Note the existence of a second-level trigger within the
interaction. (Typing a character is revocable until a particular
instant when the switch contacts meet.)
After the trigger
- what has been typed is owned by the computer.
- At the prompt the computer relinquishes control, returning it to the
user.
- Where do prompts and triggers exist outside computer interfaces?
- What is type-ahead? What makes echoing type-ahead problematic?
What makes a good trigger?
- Something that can be used in many contexts. We want the trigger to be
the same everywhere.
- Something that is atomic.
- Something that gives feedback. (What actually provides the signal to
stop when you mistype?)
A few examples of triggers:
- "Enter", but key-down or key-up?
- Mouse button release
- #, in most telephone interfaces
- Pause, of a particular duration.
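The last of these, the pause trigger, can be sketched in a few lines of
Python. Everything here (the function name, the one-second pause, the
timestamped event tuples) is illustrative, not any real toolkit's API:

```python
def pause_trigger(events, pause=1.0):
    """Collect (char, timestamp) events; fire whenever the gap between
    consecutive events exceeds `pause` seconds. A sketch of the
    'pause of a particular duration' trigger."""
    buffered, fired, last_t = [], [], None
    for ch, t in events:
        if last_t is not None and t - last_t > pause and buffered:
            fired.append("".join(buffered))  # the pause acted as the trigger
            buffered = []
        buffered.append(ch)
        last_t = t
    if buffered:
        fired.append("".join(buffered))      # end of stream also triggers
    return fired
```

A telephone interface reading digit strings would feed this its timestamped
key events; note that the end of the stream must also act as a trigger, or
the final group of digits is lost.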
What makes a good prompt?
- Something that says "I'm a prompt."
- Information about what input is legal: the range of possible inputs.
- Information about relevant system state, to help the user predict the
result of subsequent actions.
A few examples of prompts:
- Menu items displayed when a pop-up menu is summoned
- Rising intonation.
- Change in the tracker icon, such as a watch changing to an arrow.
Sequential synchronization (synchronization protocols)
The simplest forms of synchronization exist to ensure that things get done in
the right order. Let's look at three forms of synchronization, taken from the
virtual input device models of the GKS graphics standard.
There are three conceptual modes of human/system synchronization:
- Request mode, where the system requests input from the user. E.g., Unix.
The process is explicitly sequential.
- Prompt(% ): "Start giving me a request."
- Trigger(<ENTER>): "Here's the request you asked for."
- Event mode, where the user provides the system with things to which it
must respond. E.g., typing to a word processor.
The user doesn't have to wait for the processing of a previous event to
finish before sending a new one. Maintaining and showing the context in
which the new event will be interpreted is not necessarily simple.
- The trigger (key press): "Here's an event to respond to. (Add this
character to the document.)"
- The prompt (insertion point) says, "Here's the context in which I
will evaluate the next event. (Here's the location where I will place
the next character.)"
- Usually it is possible for the prompt to change to something like:
"Slow down the events, please. (I can't format as fast as you can
- Sample mode, where the system tries to follow the user. E.g., free-hand
drawing with controls points appearing, as in Illustrator.
Time is normally an essential part of the system's interpretation of the
input.
- The trigger is generated by the system using its own rules.
- The prompt usually shows the mode in which the interface will
handle the input. The representation may be implicit or explicit.
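The three modes can be contrasted in a minimal Python sketch. This is not
the GKS API; the `Device` class and the function names are invented for
illustration:

```python
import queue

class Device:
    """A hypothetical input device: a continuously maintained value
    (for sample mode) plus a queue of discrete events."""
    def __init__(self):
        self.value = 0
        self.events = queue.Queue()

def request(device, prompt="% "):
    """Request mode: prompt, then block until the user's trigger."""
    print(prompt, end="")            # prompt: "start giving me a request"
    return device.events.get()       # blocks until an event arrives

def next_event(device):
    """Event mode: take events as they arrive; the user never waits
    for the previous event to finish being processed."""
    try:
        return device.events.get_nowait()
    except queue.Empty:
        return None                  # no event pending

def sample(device):
    """Sample mode: the system reads the current value on its own
    schedule; the user keeps the value valid at all times."""
    return device.value
```

Request and event mode block or queue around explicit user actions; sample
mode simply reads whatever value the user is maintaining at that moment.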
Note that while the trigger is a useful concept in the first two cases,
it's less useful in the third. We can usefully make the following
distinction between two input styles:
- triggered: The user specifies explicitly when the input is valid. In
event mode the user adjusts the value to be measured, with no guarantee
that it has useful intermediate values. The user guarantees that the
measure is useful when the trigger is presented.
- measured: The user makes sure that the input is valid all the time,
because he or she doesn't know when it will be taken. In sample mode the
user provides a measure as a value defined at all times; the system takes
it when convenient.
We may conceptualize synchronization in terms of handshaking, but it is
important to note that the prompt concept actually elides three separate
actions that users may wish to distinguish. Separating them out, an interface
transaction can follow the sequence:
- prompt -- the system can accept new input,
- trigger -- input is ready to be accepted,
- echo -- here is the input that was accepted,
- acknowledge -- input has been processed.
In Unix the four stages are easily distinguishable in the Cshell when the
command !! is issued.
% !! # prompt and trigger
/usr/sbin/shutdown -y -i5 # echo
<blank screen> # acknowledgement
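The four stages can be strung together in a toy transaction. The stage
markers and the `run` callback are illustrative (a stand-in for the shell);
a real shell interleaves echoing with typing:

```python
def transaction(line, run):
    """One transaction in the prompt/trigger/echo/acknowledge handshake.
    `run` maps a command string to its output."""
    out = []
    out.append("% ")              # prompt: the system can accept new input
    # ...the user types `line`; pressing Enter is the trigger...
    out.append(line + "\n")       # echo: here is the input that was accepted
    out.append(run(line) + "\n")  # acknowledge: the input has been processed
    out.append("% ")              # and a fresh prompt starts the next cycle
    return "".join(out)
```

In a sketch like this, reordering the stages is just a matter of moving
lines; in a real system it obliges the user to re-establish context when a
late acknowledgement finally arrives.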
One reason for differentiating the three components is that we might want
to reorder them. Why? (Hint. "&" in Unix, which gives a new prompt before
the acknowledgement.)
Now, what about context?
The second important aspect of synchronization has to do with the coexistence
of content and context, their interrelationship in creating meaning, and the
necessity of keeping them together.
When the system provides output to the user,
- The system provides content (local).
- The user assesses the output in context, which includes
- other available information (the rest of the screen),
- what happened recently (the user's memory),
- how things work in the world (the user's knowledge).
When the system takes input from the user,
- The user provides content (local).
- The system assesses the input in context, which is the state of the
system, including
- earlier elements of the stream of which the input is a part,
- the state of other computations with which the system is engaged.
Slippage between context and content (the ATM thinks I'm putting in the
amount of money I want; I think I'm putting in my PIN)
gives either user or system incorrect information. How is slippage best
avoided? Give some examples to show that the problem is real.
- the right content in the wrong context, e.g. Netscape has a File menu
containing "Close"; what does it do? Immediately above Netscape's File
menu is the window manager's window title menu, containing "Close"; what
does it do?
- the wrong content in the right context, giving vi commands to an emacs
editor.
Solutions depend on continuity, which takes advantage of the user's
memory. How is continuity broken? The user's continuity is broken by
interruptions (which are, by definition, unexpected),
which cause a change of context in the user. Note that in some cases the user
can change context as slowly as he or she likes, leaving ample time to
arrange context-retrieval; in other cases the response has a time limit,
limiting the opportunity for arranging context-retrieval. How can the user
regain the right context after an interruption?
- The telephone rings.
- A child calls for attention.
- Grep returns nothing.
The system's continuity is broken when changes occur in the system. Such
changes must be drawn to the attention of the user. There are two strategies
for doing so.
- Things lying around that remind the user of
system/application/interface state are important. These depend on
- Cues to change context sometimes should be processed voluntarily so
they can be bypassed when the user already has the context. E.g., "Pull
up, pull up."
- The lower the level at which the cues are processed, the less effortful
they are to the user. But, note the effect of practice on the "Do you
really want to remove this file?" prompt.
As an example, consider strategies for managing focus in window systems.
- Provide specific stimuli to indicate significant changes in state.
These are normally orienting cues, to which the user cannot avoid
responding. E.g., sirens.
- Require the user to initiate any significant change in state. Thus any
change takes place as a result of explicit attention from the user. (An
example is the ubiquitous confirmation box. But what about automatization
of user responses?)
Multi-stream synchronization (windowing systems)
The problem of multiple contexts becomes significant when the human-computer
interaction is spread over several tasks and interfaces, as is typical in
modern windowing systems. Error-free synchronization of system resources
(screen, mouse, keyboard, memory, CPU, and so on) is hard enough;
synchronizing the user is harder still.
- Window frames and other decoration are provided to allow the user
visually to parse the screen into conceptual objects.
- Content within a single object is taken as related; content in
different objects is taken as unrelated.
- Actions are requested from different conceptual objects, and responses
must be directed to the appropriate object.
- One object is attended by the user; the others are monitored.
- Monitored objects must be able to request user attention. There are
three types of cues:
- ones that compel attention (sirens),
- ones that attract attention from suitably primed users (signs),
- ones that retain freely given attention but do not explicitly
attract attention (sentences).
Real-time Interfaces (multi-media)
Real-time has always been with us: humans operate in real-time. Time-outs
must exist in any interface, but only as a violation of sequential semantics!
What is the difference between spatial and temporal modalities? Each
perceives objects consisting of parts. In a spatial modality the parts can be
visited in any order; in a temporal modality the order is fixed by our
one-way passage through time.
- Modalities are attributes of users:
- sight, hearing, touch, smell, taste, proprioception for input
- what for output?
- Temporal modalities are sequential, but remember the existence of
stream segregation.
- Spatial modalities have limited parallel capabilities. Assemble
percepts at different locations into a single object.
- Attention is unitary: every place in time, sooner or later; a single
place in space.
- Media are attributes of systems:
- CRTs, LCDs, speakers, touch sensors for output
- keyboards, mice, trackballs, eye-trackers for input
- Temporal media are sequential
- Spatial media have limited parallel capabilities, depending on the
ability to provide integral percepts to the user.
- Inter-medium synchronization is very hard to implement, and is almost
always ad hoc. (This is a matter of what is easy and hard to do.)
- Streams, channels through which information flows, are attributes of
human-system interaction:
- CRT/vision, speaker/audition for computer/human information
- hand/mouse, head position/3D sensor, speech/microphone for
human/computer information flow
- Streams have well-defined temporal properties
- Streams contain information that is related.
- Inter-stream correlation is powerful for human input, creating a sense
of wholeness.
- Inter-stream correlation is difficult to control in output directed at
humans, unless it's easy, of course.
Obviously modality and medium, while useful as general terms, are not very
precise. They are useful when attaching characteristics to capabilities: CRTs
have different output characteristics than LCDs; eyes have different input
characteristics than ears. On the other hand they are less useful when we try
to describe interface architecture. For this purpose stream is a much more
useful basic concept.
Using the above terminology it is obvious that windowing systems are
multi-stream systems. They usually have a single output medium, a CRT,
connected to a single modality, vision, and making available several output
streams (windows), interacting with a single attentive (human-)input stream
that is time-multiplexed between windows. They usually have two
(computer-)input streams, mouse and keyboard, both associated with the same
user modality, hand/finger. Control of the (human-)output modality is
time-multiplexed over the two input streams.
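The time-multiplexing of the keyboard stream can be sketched as routing by
focus. The class and method names here are invented, not any real window
system's interface:

```python
class WindowSystem:
    """Sketch of time-multiplexing a single input stream over windows:
    every keyboard event is routed to whichever window holds focus."""
    def __init__(self, names):
        self.buffers = {n: [] for n in names}  # one stream per window
        self.focus = names[0]

    def set_focus(self, name):
        self.focus = name            # a user-initiated context change

    def key_event(self, ch):
        # the one keyboard stream is demultiplexed by current focus
        self.buffers[self.focus].append(ch)
```

The design point is that the demultiplexing decision belongs to the user
(via focus), so the system must make the current focus visible, or keystrokes
land in the wrong context.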
Multi-media systems, as the name suggests, are systems that employ more
than one medium for communication with the user. From the user's point of
view it is most significant that different streams contain correlated input.
Users seek out correlation among the many input streams they process.
Correlation is based on simultaneity.
- defined by the user (<100 milliseconds)
- inter-stream concept
- defines the granularity of time
- humans appear not to have much flexibility
The most powerful new interface technique available in multi-media systems is
the ability to have more than one stream of information from a single
conceptual object. To provide this it is necessary for a system to provide
application support for simultaneity with a granularity close to 10
milliseconds, and to maintain this synchronization over long periods of time.
(Streams broken apart are not easily put back together.) We are still waiting
for general solutions to this problem.
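One way to see why the granularity matters: pairing events from two
timestamped streams only works when both are delivered with jitter smaller
than the tolerance. A sketch, with invented stream formats, assuming both
streams are sorted by timestamp:

```python
def align(stream_a, stream_b, tol=0.010):
    """Pair events from two streams of (timestamp, payload) tuples when
    their timestamps fall within `tol` seconds (about the 10 ms
    granularity discussed above)."""
    pairs, i, j = [], 0, 0
    while i < len(stream_a) and j < len(stream_b):
        ta, tb = stream_a[i][0], stream_b[j][0]
        if abs(ta - tb) <= tol:
            pairs.append((stream_a[i][1], stream_b[j][1]))
            i += 1
            j += 1
        elif ta < tb:
            i += 1               # this event has no partner in time
        else:
            j += 1
    return pairs
```

If either stream is delayed by more than the tolerance, events simply fail
to pair: streams broken apart are not easily put back together.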