Look Ma, No Mouse: Keyboard Navigation and Shortcuts in the Zimbra Collaboration Suite and The Kabuki Ajax Toolkit

By Ross Dargahi on September 12, 2006 in Zimbra Web Client

Keyboard shortcuts and navigation are indispensable time savers for an application’s frequent and power users. Who among us has not quickly learned the keyboard shortcuts for performing common tasks within an application that we use regularly?

Unfortunately, one of the more common complaints about web-based applications is that they often lack support for keyboard shortcuts and navigation. Frankly, the lack of keyboard support (both shortcuts and navigation) is something that has annoyed me about the Zimbra Collaboration Suite (ZCS), and I have been wanting to add it to both Kabuki (the Zimbra Ajax Toolkit) and the ZCS. However, I didn’t want to just stuff in some basic keyboard shortcuts and call it a day. Instead, I wanted to have a go at implementing the same level of keyboard support that most traditional UI toolkits and desktop applications provide. This turned out to involve quite a bit of work. For example, since only input elements can have focus in most browsers, we needed to simulate and track focus for toolkit components, which frequently have no native input element associated with them. At the end of the day, though, I think it was worth the effort.

So the good news is that Kabuki now provides a pretty rich keyboard model that in many ways approximates the support found in more traditional UI toolkits. Specifically, this includes a canonical focus model, customizable key bindings in UI widgets (not all of them just yet), and full support for tab group hierarchies for navigating native and toolkit visual components in an orderly fashion via the Tab and arrow keys. Conrad (one of Zimbra’s lead Ajax architects) took this work to the next level by implementing keyboard navigation and shortcuts throughout the ZCS and by improving the framework itself. As a result, ZCS 4.0 has significant keyboard support, about 80% coverage at the moment, and we are working to raise that figure. Future work also includes normalized key mappings across browsers (very high on our list), I18N support, and support for custom user-defined bindings.

The bottom line is that the ZCS now supports some pretty spiffy keyboard actions (over 100 of them) including:

Navigating among the various ZCS apps
Navigation and selection within a list of items
Composing and sending an email message
Creating calendar appointments
Switching among calendar views
Creating new tags, folders, calendars, etc.
Popping up the Zimbra Assistant
Changing views
Popping up, interacting with, and dismissing context menus

You can find a more complete list of supported keyboard events on the Zimbra Wiki.

Now on to some of the more technical details of Zimbra’s keyboard support.

The keyboard management infrastructure is implemented in the DHTML Widget Toolkit (DWT), the Kabuki Ajax toolkit’s UI framework. DWT consists of a component model, numerous widgets, an event model, drag and drop infrastructure, and now a keyboard shortcut and navigation model. DWT is loosely modeled after SWT (the Eclipse Standard Widget Toolkit). Several elements make up DWT’s keyboard management framework:

Key Maps

A key map is a set of key bindings. A key binding maps a key sequence to an action. For example, I may decide that “Ctrl+U” marks an email message as unread, or that a multi-key sequence such as “Ctrl+N” followed by the letter “M” creates a new mail message. By default, there is a 750 ms timeout between keys in a multi-key sequence, though this is configurable via the DwtKeyboardMgr class described later.

DwtKeyMap is the base class for key maps and provides bindings for DWT widgets. Below is a snippet from DwtKeyMap’s constructor showing the key bindings for some of the widgets. Note how key sequences bind to the symbolic constants representing keyboard actions:

function DwtKeyMap() {
[snip...]
this._map["DwtDialog"] = {
"Enter":  DwtKeyMap.ENTER,
"Esc":    DwtKeyMap.CANCEL
};
this._map["DwtButton"] = {
"Enter":        DwtKeyMap.SELECT_CURRENT,
"ArrowDown":    DwtKeyMap.SELECT_SUBMENU
};
this._map["DwtListView"] = {
"Space":             DwtKeyMap.SELECT_CURRENT,
"Ctrl+Space":        DwtKeyMap.ADD_SELECT_CURRENT,
"Ctrl+`":            DwtKeyMap.ADD_SELECT_CURRENT, // Mac FF
"ArrowDown":         DwtKeyMap.SELECT_NEXT,
"Shift+ArrowDown":   DwtKeyMap.ADD_SELECT_NEXT,
"Ctrl+ArrowDown":    DwtKeyMap.NEXT,
"ArrowUp":           DwtKeyMap.SELECT_PREV,
"Shift+ArrowUp":     DwtKeyMap.ADD_SELECT_PREV,
"Ctrl+ArrowUp":      DwtKeyMap.PREV,
"Ctrl+A":            DwtKeyMap.SELECT_ALL,
"Home":              DwtKeyMap.SELECT_FIRST,
"End":               DwtKeyMap.SELECT_LAST,
"Enter":             DwtKeyMap.DBLCLICK,
"Comma":             DwtKeyMap.ACTION,
"Shift+Comma":       DwtKeyMap.ACTION,
"Ctrl+Enter":        DwtKeyMap.ACTION,
"Ctrl+M":            DwtKeyMap.ACTION  // Mac FF
};
this._map["DwtMenu"] = {
"Esc":          DwtKeyMap.CANCEL,
"Enter":        DwtKeyMap.SELECT_CURRENT,
"ArrowDown":    DwtKeyMap.SELECT_NEXT,
"ArrowUp":      DwtKeyMap.SELECT_PREV,
"ArrowLeft":    DwtKeyMap.SELECT_PARENTMENU,
"ArrowRight":   DwtKeyMap.SELECT_SUBMENU
};
[snip...]
};

Applications may inherit from the above class and add their own maps or override existing ones (for example, when subclassing widgets). Component and application authors don’t need to worry about key sequences; they only need to implement the actions that their components support. The keyboard management framework takes care of mapping key sequences to actions. Once a key sequence has been mapped to an action, the action is passed to the handleKeyAction() method defined by the component (see the sections on DwtKeyboardMgr and DwtControl below for more details).
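
Here is a minimal JavaScript sketch of that division of labour. Components only implement actions, and the framework owns the mapping from keys to actions. This is not DWT's code. FakeListView, the keyMaps table, dispatchKeySequence(), and the action strings are hypothetical stand-ins. Only the handleKeyAction() name and its true/false return convention come from the description above.

```javascript
// Hypothetical action constants standing in for DwtKeyMap.SELECT_NEXT etc.
const Actions = {
  SELECT_NEXT: "SelectNext",
  SELECT_PREV: "SelectPrev",
  SELECT_ALL: "SelectAll",
};

// Key maps are keyed by map name; each maps a key sequence to an action.
const keyMaps = {
  FakeListView: {
    "ArrowDown": Actions.SELECT_NEXT,
    "ArrowUp": Actions.SELECT_PREV,
    "Ctrl+A": Actions.SELECT_ALL,
  },
};

// The component knows nothing about keys; it only implements actions.
class FakeListView {
  constructor(items) {
    this.items = items;
    this.current = 0;
    this.selected = new Set([0]);
  }

  handleKeyAction(action) {
    switch (action) {
      case Actions.SELECT_NEXT:
        this.current = Math.min(this.current + 1, this.items.length - 1);
        this.selected = new Set([this.current]);
        return true;
      case Actions.SELECT_PREV:
        this.current = Math.max(this.current - 1, 0);
        this.selected = new Set([this.current]);
        return true;
      case Actions.SELECT_ALL:
        this.selected = new Set(this.items.keys()); // every index
        return true;
      default:
        return false; // not an action this component supports
    }
  }
}

// Framework side: look the sequence up in the named key map, then dispatch.
function dispatchKeySequence(component, mapName, keySeq) {
  const map = keyMaps[mapName] || {};
  if (!Object.prototype.hasOwnProperty.call(map, keySeq)) return false; // unbound
  return component.handleKeyAction(map[keySeq]);
}
```

FakeListView never mentions a key. Rebinding SELECT_NEXT to another key therefore only changes the key map, not the component.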

Decoupling key bindings from actions makes it easy to change the key binding for a given action, or to bind several key sequences to the same action. Key maps may also inherit from other key maps, including from more than one (multiple inheritance). This allows for extensions as well as application-wide (or default) key maps.

The sample code below shows a portion of the ZCS key map. Notice how the ZmMailListController map inherits from the Global key map via its INHERIT entry.

function ZmKeyMap() {
this._map["Global"] = {
"`":       ZmKeyMap.ASSISTANT,
"Shift+`": ZmKeyMap.ASSISTANT,
[snip...]
}
this._map["ZmMailListController"] = {
"INHERIT":  "Global",
"R":       ZmKeyMap.REPLY,
"A":       ZmKeyMap.REPLY_ALL,
"R,S":     ZmKeyMap.REPLY,
"R,A":     ZmKeyMap.REPLY_ALL,
[snip...]
}
[snip...]
}
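
Below is a sketch of how a lookup could walk INHERIT links, using the ZmKeyMap fragment above as data. It rests on three assumptions. INHERIT holds one parent name or a comma-separated list of names (to model multiple inheritance). A map's own bindings take precedence over inherited ones. Parents are searched in the order listed. resolveAction() and the action strings are hypothetical, and the real ZmKeyMap lookup may differ.

```javascript
// Hypothetical action values standing in for ZmKeyMap.ASSISTANT etc.
const ASSISTANT = "Assistant", REPLY = "Reply", REPLY_ALL = "ReplyAll";

const maps = {
  Global: {
    "`": ASSISTANT,
    "Shift+`": ASSISTANT,
  },
  ZmMailListController: {
    "INHERIT": "Global",
    "R": REPLY,
    "A": REPLY_ALL,
    "R,S": REPLY,
    "R,A": REPLY_ALL,
  },
};

// Depth-first lookup: own bindings win; parents are tried in listed order.
function resolveAction(mapName, keySeq, seen = new Set()) {
  const map = maps[mapName];
  // Unknown map, or already searched (this also guards against cycles).
  if (!map || seen.has(mapName)) return undefined;
  seen.add(mapName);
  if (keySeq !== "INHERIT" && Object.prototype.hasOwnProperty.call(map, keySeq)) {
    return map[keySeq];
  }
  const parents = map.INHERIT ? map.INHERIT.split(",").map((s) => s.trim()) : [];
  for (const parent of parents) {
    const action = resolveAction(parent, keySeq, seen);
    if (action !== undefined) return action;
  }
  return undefined;
}
```

For example, resolveAction("ZmMailListController", "`") finds nothing in the controller's own map and falls through to Global, returning the assistant action.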

Key map entries may consist of:

Single keys e.g. “Enter” or “M”
Single keys plus one or more modifiers e.g. “Ctrl+M” or “Ctrl+Shift+M”
Multi-key sequences e.g. “M,U” or “M,U,A”
Multi-key sequences plus modifiers e.g. “Shift+M, U” or “Shift+M, Shift+U, A”
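
One way to handle the four entry forms listed above is to parse each entry into a list of steps, each with a key and a set of modifiers. Sorting the modifiers lets equivalent entries compare equal. This is a hypothetical sketch. parseKeySequence(), canonical(), and the fixed modifier order (Ctrl, Alt, Shift, Meta) are assumptions, not DWT's format. Splitting on commas works here because, as in the DwtKeyMap snippet, the comma key itself is spelled “Comma”.

```javascript
// Assumed modifier names and canonical ordering (not taken from DWT).
const MODIFIER_ORDER = ["Ctrl", "Alt", "Shift", "Meta"];

// "Shift+M, Shift+U, A" -> [{modifiers:["Shift"], key:"M"}, ...]
function parseKeySequence(entry) {
  return entry.split(",").map((step) => {
    const parts = step.trim().split("+");
    const key = parts.pop(); // the last part is the key itself
    if (!key) throw new Error(`Missing key in "${step}"`);
    for (const m of parts) {
      if (!MODIFIER_ORDER.includes(m)) throw new Error(`Unknown modifier "${m}"`);
    }
    // Fixed order, so "Shift+Ctrl+M" and "Ctrl+Shift+M" are the same step.
    const modifiers = MODIFIER_ORDER.filter((m) => parts.includes(m));
    return { modifiers, key };
  });
}

// Canonical string form, usable as a lookup key in a key map.
function canonical(entry) {
  return parseKeySequence(entry)
    .map(({ modifiers, key }) => [...modifiers, key].join("+"))
    .join(",");
}
```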

We are currently working on an interface for serializing key maps to, and deserializing them from, a textual representation. This will help with localization and with custom, user-defined key maps, and will do away with the need for the hash tables shown above.

Tab Groups

Tab groups permit the definition of a hierarchical keyboard navigation model. A tab group is a tree structure in which the intermediate nodes are other tab groups and the leaf nodes are focusable components, i.e. DWT widgets and/or focusable native HTML elements such as input fields. Tab groups represent the order in which the components the user sees on the screen are traversed via the keyboard. The tab group hierarchy (or tree) is traversed “in order” when the user presses the Tab key, and in reverse order when the user presses Shift+Tab.
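
The in-order traversal can be sketched with a small tree class. Tab moves to the next leaf in a depth-first, left-to-right walk, and Shift+Tab moves to the previous one. TabGroup here is a hypothetical simplification, not DwtTabGroup. Wrapping around at either end and recomputing the leaf order on every call are choices made for brevity.

```javascript
class TabGroup {
  constructor(name) {
    this.name = name;
    this.members = []; // child TabGroups and/or focusable leaves
  }

  // Insert at index, or append when index is omitted.
  addMember(member, index = this.members.length) {
    this.members.splice(index, 0, member);
    return this;
  }

  // Leaves in traversal order: depth-first, left to right.
  leaves() {
    return this.members.flatMap((m) => (m instanceof TabGroup ? m.leaves() : [m]));
  }

  // Tab: the leaf after `current`, wrapping to the first.
  next(current) {
    const order = this.leaves();
    const i = order.indexOf(current);
    return order[(i + 1) % order.length];
  }

  // Shift+Tab: the leaf before `current`, wrapping to the last.
  prev(current) {
    const order = this.leaves();
    const i = order.indexOf(current);
    return i <= 0 ? order[order.length - 1] : order[i - 1];
  }
}
```

A toolbar group nested inside a view's root group is simply walked in place. Its buttons come before whatever follows the toolbar in the root group.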

There is a special tab group called the root tab group. A root tab group has no parent and represents the keyboard navigation order for the components in a given application view. There can be multiple root tab groups within an application, e.g. an email message list view tab group, a calendar new appointment tab group, or a dialog tab group; however, only one root tab group may be active at any given time.

As described in the section on DwtKeyboardMgr, tab groups may be pushed onto and popped off the tab group stack. For example, when a dialog pops up, its tab group is pushed onto the tab group stack and becomes the active tab group while the dialog is open. When the dialog is popped down, its tab group is popped off the stack. This brings the underlying tab group (i.e. the one for the active view) back into play, so the component that had focus before the dialog appeared has focus once again.
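
The push/pop behaviour for dialogs can be sketched with a plain array used as a stack. Each group remembers its own focus member, so popping a dialog's group restores the previous focus automatically. SimpleTabGroup and TabGroupStack are hypothetical. Refusing to pop the last (root) group is a design choice of this sketch, not documented DWT behaviour.

```javascript
// Hypothetical stand-in: a group that remembers its own focus member.
class SimpleTabGroup {
  constructor(name, members) {
    this.name = name;
    this.members = members;
    this.focusMember = members[0];
  }
}

class TabGroupStack {
  constructor() {
    this.stack = [];
  }

  push(group) {
    this.stack.push(group); // the pushed group becomes the active one
  }

  // Sketch choice: never pop the last (root) group of the current view.
  pop() {
    return this.stack.length > 1 ? this.stack.pop() : undefined;
  }

  active() {
    return this.stack[this.stack.length - 1];
  }

  // The component that should currently have focus.
  focused() {
    return this.active().focusMember;
  }
}
```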

Tab groups are implemented by the DwtTabGroup class. That class provides the API for manipulating tab groups. Here are some of the member methods:

addMember(member, index) – Adds member to the tab group at position index, or at the end if index is omitted
addMemberBefore(member) – Adds a member to the tab group positioned before another member
addMemberAfter(member) – Adds a member to the tab group positioned after another member
blockDefaultHandling(block) – Blocks the default handler from being invoked for this tab group
contains(member) – Checks to see if an item is a member of this tab group
removeMember(member) – Removes a member
replaceMember(old, new) – Replaces a member with another one
newParent(newParent) – Sets a new parent for this tab group
getFocusMember() – Gets the current focus member
setFocusMember(member) – Sets the current focus member
getNextFocusMember() – Gets the next focus member
getPrevFocusMember() – Gets the previous focus member
resetFocusMember() – Resets the focus member to the first available member

DwtKeyboardMgr

DwtKeyboardMgr is the engine that drives the keyboard management framework. It is responsible for intercepting key events generated by the browser, mapping them to actions (via the registered key maps), and then dispatching each action to the correct component. In addition, DwtKeyboardMgr is responsible for enforcing the tab order specified by the currently active tab group. Finally, this class contains the machinery that handles multi-key key map entries, using a timeout mechanism to resolve such sequences.
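
The timeout machinery for multi-key entries can be sketched as follows. When the keys typed so far form a prefix of a longer binding (for example “R” versus “R,A” in the ZmMailListController map above), the resolver waits for the next key. If none arrives within the timeout, it fires the shorter binding if there is one. This is an illustrative sketch, not DwtKeyboardMgr's implementation. SequenceResolver is hypothetical, bindings use comma-joined key names, and the 750 ms default mirrors the figure quoted earlier. The scheduler is injectable so the logic can be exercised without real timers.

```javascript
class SequenceResolver {
  constructor(bindings, onAction, {
    timeout = 750,                               // max ms between keys
    schedule = (fn, ms) => setTimeout(fn, ms),   // injectable for testing
    cancel = (id) => clearTimeout(id),
  } = {}) {
    this.bindings = bindings; // e.g. { "R": "Reply", "R,A": "ReplyAll" }
    this.onAction = onAction; // called with each resolved action
    this.timeout = timeout;
    this.schedule = schedule;
    this.cancel = cancel;
    this.buffer = [];         // keys typed so far in the current sequence
    this.timer = null;
  }

  setKeyTimeout(ms) {
    this.timeout = ms;
  }

  _actionFor(seq) {
    return Object.prototype.hasOwnProperty.call(this.bindings, seq) ? this.bindings[seq] : undefined;
  }

  // True if some longer binding starts with seq followed by another key.
  _isPrefix(seq) {
    return Object.keys(this.bindings).some((b) => b.startsWith(seq + ","));
  }

  _clearTimer() {
    if (this.timer !== null) this.cancel(this.timer);
    this.timer = null;
  }

  // Called once per key press; returns true if the key was consumed.
  keyPressed(key) {
    this._clearTimer();
    this.buffer.push(key);
    const seq = this.buffer.join(",");
    const action = this._actionFor(seq);

    if (this._isPrefix(seq)) {
      // Incomplete or ambiguous: wait for another key. On timeout, fire
      // the exact binding (if any) and start a fresh sequence.
      this.timer = this.schedule(() => {
        this.timer = null;
        this.buffer = [];
        if (action !== undefined) this.onAction(action);
      }, this.timeout);
      return true;
    }

    this.buffer = [];
    if (action !== undefined) {
      this.onAction(action);
      return true;
    }
    return false; // nothing bound to what was typed
  }
}
```

Because “R” is both a complete binding and a prefix of “R,A”, a single R only triggers a reply once the timeout expires. That delay is the cost of supporting both forms in one map.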

When dispatching actions, DwtKeyboardMgr first tries to resolve a key sequence with the component that has focus (be it a DWT widget or a native component such as an HTML input element). The component may have no action binding for the key sequence (i.e. no entry in its key map for the current key sequence). In that case, DwtKeyboardMgr checks whether a default handler has been pushed onto its default handler stack (via the pushDefaultHandler() method). If so, DwtKeyboardMgr asks it for the name of the key map to use for resolving the key sequence to an action code. If that key map contains an action code for the key sequence, DwtKeyboardMgr dispatches it to the default handler.

A default handler is analogous to the root tab group: it provides a context for key bindings that are not tied to a specific widget. In a way, the default handler can be thought of as providing a global context. For example, if a button has focus when the user presses Esc, DwtKeyboardMgr first gives the button a chance to handle the event. Since buttons do not handle Esc, DwtKeyboardMgr then hands the event to the current default handler. So far, only two default handlers are used within ZCS: an application-level default handler and a dialog-level default handler.

A default handler must implement the following interface:

getKeymapNameToUse() – Called by DwtKeyboardMgr to get the name of the key map that should be used for resolving a key sequence. This is application-specific and may change as the user navigates the application. For example, in the case of the ZCS, a different key map may be in play for the calendar application than for the email application.
handleKeyAction(action) – Called with the action constant bound to the key sequence in the key map. This method is where the real work gets done: its implementation performs whatever is necessary to complete the action. It returns true if it handled the action, or false if it did not.
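
The dispatch order and the default handler interface can be sketched together. KeyboardDispatcher, the keyMapName property on the focused component, and the action strings are hypothetical. One assumption goes beyond the description above. The sketch also falls back to the default handler when the focused component has a binding but its handleKeyAction() returns false.

```javascript
class KeyboardDispatcher {
  constructor(keyMaps) {
    this.keyMaps = keyMaps; // { mapName: { keySeq: action } }
    this.defaultHandlers = [];
  }

  pushDefaultHandler(hdlr) {
    this.defaultHandlers.push(hdlr);
  }

  popDefaultHandler() {
    return this.defaultHandlers.pop();
  }

  _lookup(mapName, keySeq) {
    const map = this.keyMaps[mapName];
    return map && Object.prototype.hasOwnProperty.call(map, keySeq) ? map[keySeq] : undefined;
  }

  dispatch(focused, keySeq) {
    // 1. The focused component gets the first chance, via its own key map.
    if (focused) {
      const ownAction = this._lookup(focused.keyMapName, keySeq);
      if (ownAction !== undefined && focused.handleKeyAction(ownAction)) return true;
    }
    // 2. Otherwise the top default handler chooses the key map to use.
    const hdlr = this.defaultHandlers[this.defaultHandlers.length - 1];
    if (!hdlr) return false;
    const action = this._lookup(hdlr.getKeymapNameToUse(), keySeq);
    return action !== undefined && hdlr.handleKeyAction(action) === true;
  }
}

// An application-level default handler implementing the two-method
// interface; its key map depends on which (hypothetical) app is active.
const appHandler = {
  currentApp: "Mail",
  handled: [],
  getKeymapNameToUse() {
    return this.currentApp;
  },
  handleKeyAction(action) {
    this.handled.push(action);
    return true;
  },
};
```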

As previously mentioned, DwtKeyboardMgr is also responsible for enforcing the component tab order specified by the currently active root tab group, i.e. the one on top of the tab group stack. When DwtKeyboardMgr intercepts the Tab key, it calls the DwtTabGroup’s getNextFocusMember() method. That method returns the next focusable component in the tab group hierarchy (be it a DWT widget or a native HTML input field). DwtKeyboardMgr then calls the blur() method on the previously focused component, followed by the focus() method on the component returned by getNextFocusMember(). I intentionally paralleled the native HTML input element focus() and blur() methods within DWT, so there is a canonical way of managing the focus state of any UI component, native or DWT. Together, DwtKeyboardMgr and DwtTabGroup handle corner cases such as skipping over components that are not enabled (i.e. grayed out) and the currently focused component becoming disabled.
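
The Tab handling described above boils down to three steps: find the next enabled member, blur the current one, and focus the target. In this hypothetical sketch, a flat, already-ordered array stands in for the flattened tab group tree. Members are plain objects with an enabled flag and focus()/blur() methods. It is not DwtKeyboardMgr's code.

```javascript
// Hypothetical focusable member with the canonical focus()/blur() pair.
function makeMember(id, enabled = true) {
  return {
    id,
    enabled,
    focused: false,
    focus() { this.focused = true; },
    blur() { this.focused = false; },
  };
}

// Next enabled member after (or before) `current`, wrapping around.
function nextFocusable(members, current, backwards = false) {
  const n = members.length;
  const step = backwards ? -1 : 1;
  let start = members.indexOf(current);
  if (start < 0) start = backwards ? 0 : -1; // no current member: start at an end
  for (let k = 1; k <= n; k++) {
    const idx = (((start + step * k) % n) + n) % n; // wrap in both directions
    if (members[idx].enabled) return members[idx]; // skip grayed-out members
  }
  return null; // nothing is focusable
}

// What happens on Tab (or Shift+Tab when backwards is true).
function handleTab(members, current, backwards = false) {
  const target = nextFocusable(members, current, backwards);
  if (target === current) return current; // sole focusable member keeps focus
  if (current) current.blur();
  if (target) target.focus();
  return target; // null if every member is disabled
}
```

Because the loop also revisits the current member last, a focused component that has just become disabled is skipped like any other disabled member.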

The following lists some of the more commonly used methods provided by DwtKeyboardMgr:

enable(enable) – Enables or disables keyboard event handling
isEnabled() – Returns true if keyboard event handling is enabled, else returns false
grabFocus(element) – Gives focus to element, which may be a DWT widget or an HTML input element
pushDefaultHandler(hdlr) – Pushes a default handler onto the handler stack
popDefaultHandler() – Pops the default handler off the top of the handler stack
pushTabGroup(tabGroup) – Pushes a tab group onto the tab group stack and makes it the active tab group
popTabGroup() – Pops the tab group that is on top of the tab group stack
registerKeyMap(keyMap) – Registers a key map
setKeyTimeout(timeout) – Sets the maximum time to allow between key presses for a multi-key key sequence
setTabGroup(tabGroup) – Replaces the current tab group with the one provided

DwtControl

DwtControl is the base DWT class from which all widgets ultimately inherit behaviour. DwtControl has a number of responsibilities including hooking into the drag and drop system, basic mouse event handling, and integration with the keyboard shortcut and navigation model.

DwtControl exports public focus() and blur() methods that parallel those of native HTML input elements, so there is a canonical set of methods for giving and removing focus across native and DWT elements. DwtControl also declares the following methods that widget authors must implement in order to support keyboard management:

_focus() – Called when a control receives focus. Its implementation should provide visual feedback that the control has gained focus (e.g. by drawing a border around the component).
_blur() – Called when a control loses focus. Its implementation should provide visual feedback that the control has lost focus (e.g. by hiding the border around the control).
handleKeyAction(actionCode) – Implements the actions the control supports. The keyboard framework passes in the actionCode associated with a key sequence in the control’s key map. This method returns true if the control handled the actionCode, and false otherwise.
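
Here is a sketch of the DwtControl contract. Public focus() and blur() methods track focus state and delegate visual feedback to the _focus() and _blur() hooks. handleKeyAction() reports whether it handled an action. Control, ToggleButton, the border property, and the SELECT_CURRENT string are hypothetical. The border property stands in for real DOM styling.

```javascript
// Hypothetical base class mirroring the contract described above.
class Control {
  constructor() {
    this.hasFocus = false;
  }

  // Public methods parallel the native input element focus()/blur().
  focus() {
    if (this.hasFocus) return;
    this.hasFocus = true;
    this._focus();
  }

  blur() {
    if (!this.hasFocus) return;
    this.hasFocus = false;
    this._blur();
  }

  // Hooks for subclasses; no-ops by default in this sketch.
  _focus() {}
  _blur() {}

  handleKeyAction(actionCode) {
    return false; // the base control supports no actions
  }
}

const SELECT_CURRENT = "SelectCurrent"; // stand-in for DwtKeyMap.SELECT_CURRENT

class ToggleButton extends Control {
  constructor(label) {
    super();
    this.label = label;
    this.pressed = false;
    this.border = "none";
  }

  _focus() { this.border = "dotted"; } // feedback on gaining focus
  _blur() { this.border = "none"; }    // feedback on losing focus

  handleKeyAction(actionCode) {
    if (actionCode === SELECT_CURRENT) {
      this.pressed = !this.pressed;
      return true;
    }
    return false; // lets the framework try the default handler
  }
}
```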

Using The Keyboard Framework in an Application

An application programmer who wants to use the keyboard management framework typically needs to perform the following steps:

Create a key map class for the application that inherits from DwtKeyMap. Define any application and/or custom widget key maps in this class.
Implement one or more default handlers for the application, should they be required. Recall that the default handler is called when the focused component does not have an action code binding for a key sequence. Depending on the complexity of the application, multiple handlers may be pushed and popped as the user interacts with the application.
Create any tab group(s) that may be required. Note that tab groups may be created and manipulated during the application lifecycle.
Instantiate DwtKeyboardMgr.
Register the application’s key map with DwtKeyboardMgr via the registerKeyMap() method.
Push the currently applicable default handler via DwtKeyboardMgr’s pushDefaultHandler() method.
Push or set the current tab group via the pushTabGroup() or setTabGroup() method.
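
Here is a compressed sketch of the steps above. The DwtKeyMap function is a stand-in reduced to its _map table. FakeKeyboardMgr is a hypothetical recorder that exposes only the method names listed earlier and does not intercept real key events. MyAppKeyMap, the handler, and the tab group placeholder are invented for illustration. Only the overall shape follows the article: subclass the key map, then register it, push a default handler, and push a tab group.

```javascript
// Stand-in for DWT's DwtKeyMap, reduced to its _map table (assumption).
function DwtKeyMap() {
  this._map = {};
  this._map["DwtButton"] = { "Enter": "SelectCurrent" };
}

// Step 1: an application key map that inherits from DwtKeyMap, adds its
// own maps, and extends an existing one. Names and actions are hypothetical.
function MyAppKeyMap() {
  DwtKeyMap.call(this);
  this._map["Global"] = { "C": "Compose" };
  this._map["MyAppList"] = { "INHERIT": "Global", "Delete": "DeleteItem" };
  this._map["DwtButton"]["Space"] = "SelectCurrent"; // extra binding for buttons
}
MyAppKeyMap.prototype = Object.create(DwtKeyMap.prototype);
MyAppKeyMap.prototype.constructor = MyAppKeyMap;

// Hypothetical recorder using the DwtKeyboardMgr method names from the steps.
function FakeKeyboardMgr() {
  this.keyMaps = [];
  this.defaultHandlers = [];
  this.tabGroups = [];
}
FakeKeyboardMgr.prototype.registerKeyMap = function (keyMap) {
  this.keyMaps.push(keyMap);
};
FakeKeyboardMgr.prototype.pushDefaultHandler = function (hdlr) {
  this.defaultHandlers.push(hdlr);
};
FakeKeyboardMgr.prototype.pushTabGroup = function (tabGroup) {
  this.tabGroups.push(tabGroup);
};
FakeKeyboardMgr.prototype.setTabGroup = function (tabGroup) {
  // Replace the active (top) group, or install one if the stack is empty.
  this.tabGroups[Math.max(this.tabGroups.length - 1, 0)] = tabGroup;
};

// Step 2: a default handler. Step 3: a placeholder root tab group.
const appHandler = {
  getKeymapNameToUse: function () { return "MyAppList"; },
  handleKeyAction: function (action) { return action === "Compose" || action === "DeleteItem"; },
};
const rootTabGroup = { name: "mainView", members: ["searchField", "itemList"] };

// Steps 4 to 7.
const kbMgr = new FakeKeyboardMgr();
kbMgr.registerKeyMap(new MyAppKeyMap());
kbMgr.pushDefaultHandler(appHandler);
kbMgr.pushTabGroup(rootTabGroup);
```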