Drag-and-Drop Protocol for the X Window System

--------

Introduction

Today, Drag-and-Drop (DND) is considered a requirement for commercial-quality applications. On most operating systems, support for DND is built-in, so everybody uses it and all programs can communicate with each other.

On X, however, there is no standard, so various groups have developed their own protocols, with the result that programs written for one protocol cannot talk to programs written for a different protocol. Clearly this does not satisfy the fundamental requirement that DND allow the user to drag data from any program to any other program.

What is required is a single protocol that everybody can use so all programs can exchange data via DND. (The X Selection mechanism ensures that everybody can exchange data via the clipboard.)

The basic requirements for such a protocol are that it provide visual feedback to the user during the drag and that it allow the target to choose whatever data format it prefers from among all the formats that the source can provide. In addition, it must be efficient so that the visual feedback does not lag behind the user's actions, and it must be safe from deadlock, race conditions, and other hazards inherent in asynchronous systems.

Current version: 4
Last updated on July 30, 2010

Comparison to other DND protocols

Supporters of the XDND protocol

Other related protocols

Addenda to the main XDND protocol:


Definitions

The source is the window that will supply the data.

The target is the window that the cursor is on top of and which will receive the drop if the mouse is released.

You should be familiar with the X Selection mechanism described in the Xlib manuals: Volume 0, Appendix L and Volume 1, Chapter 10.

All data types and actions are referred to by their corresponding X Atoms. The atom names of the data types are the corresponding MIME types, in all lower case. (RFC's for MIME: 2045, 2046, 2047, 2048, 2049)

All constants mentioned in this document are the string names of X atoms, capitalized as shown. This avoids the need for hard-coded values, which would require a global registry.


Example walk-through

Note: Parenthesized numbers in bold-face are the number of packets sent to or from the server.

Step 0:

Windows announce that they support the XDND protocol by creating a window property XdndAware that contains the highest version supported.

Step 1:

When a drag begins, the source takes ownership of XdndSelection.

When the mouse enters a window that supports XDND (search for and retrieve window property: avg 8), the source sends a ClientMessage of type XdndEnter (2) which contains the protocol version to use (minimum of source and target versions) and the data types supported by the source.

An extension has been developed to allow dropping on other windows.

Step 2:

The target receives XdndEnter.

The ClientMessage only has space for three data types, so if the source supports more than this, the target must retrieve the property XdndTypeList from the source window in order to get the list of available types. (2)

Step 3:

The source sends a ClientMessage of type XdndPosition. (2) This tells the target the position of the mouse and the action that the user requested. It may also include a request to scroll the target.

Step 4:

The target receives XdndPosition.

The target window must determine which widget the mouse is in and ask it whether or not it will accept the drop. For efficiency, the target window should keep track of whether or not the widget will accept the drop and only ask again if the action changes or the mouse enters a different part of the widget. Once the widget has said that it will accept the drop and as long as the action remains the same and the mouse remains in the same part, the widget gets all the XdndPosition messages so that it can re-draw itself to show the user where the data will be inserted, if appropriate.

To determine whether or not it can accept the drop, the target widget consults the list of types from the XdndEnter message and the requested action from the XdndPosition message.

If the target cannot perform the requested action, it can return either XdndActionCopy or XdndActionPrivate. If neither of these are possible, then it should refuse the drop.

If the target needs to look at the data itself, it calls XConvertSelection() for XdndSelection, the data type that it is interested in, and the given time stamp. (7) It can do this more than once, if necessary.

If the target can accept the drop, it should hilight its border or otherwise notify the user visually. If it retrieved the data, it should cache it so it does not need to be retrieved again when the actual drop occurs.

Regardless of whether or not the target can accept the drop, it should process the request to scroll, if any, because scrolling may expose a drop zone.

Step 5:

The target sends a ClientMessage of type XdndStatus. (2) This tells the source whether or not it will accept the drop, and, if so, what action will be taken. It also includes a rectangle that means "don't send another XdndPosition message until the mouse moves out of here".

Step 6:

The source receives XdndStatus. It can use the action to change the cursor to indicate whether or not the user's requested action will be performed.

When the mouse moves out of the given rectangle, go to Step 3.

XdndPosition messages are normally triggered by MotionNotify events. However, if the mouse moves while the source is waiting for an XdndStatus message, the source has to cache the new mouse position and generate another XdndPosition message as soon as it receives the XdndStatus message. (This will be necessary when the server-target connection is much slower than the server-source connection.)

It is important to note that if the target does not return a "no more messages" rectangle, then sending periodic XdndPosition messages when the cursor is not moving is necessary to allow the target to scroll smoothly, e.g., when the cursor is in a scroll zone close to the edge of the target. If this is not done, the user will be forced to wiggle the cursor back and forth in order to tickle the target so it can scroll.

Step 7:

If the mouse leaves the window, the source sends a ClientMessage of type XdndLeave. (2)

If the mouse button is released in the window, the source waits for the last XdndStatus message (if necessary) and then sends a ClientMessage of type XdndLeave or XdndDrop, depending on the "accept" flag in the last XdndStatus. (2)

If the source never received any XdndStatus messages at all, it should send XdndLeave without waiting.

If the source doesn't receive the expected XdndStatus within a reasonable amount of time, it should send XdndLeave. While waiting for XdndStatus, the source can block, but it must at least process SelectionRequest events so the target can examine the data.

Step 8:

If the target receives XdndLeave, it frees any cached data and forgets the whole incident.

If the target receives XdndDrop and will accept it, it first uses XConvertSelection() to retrieve the data using the given time stamp (if it doesn't already have it cached). (7) It then uses the data in conjunction with the last action and mouse position that was acknowledged via XdndStatus. (Over a slow network, this makes the drop location consistent with the visual feedback given to the user.) Once it is finished, it sends XdndFinished.

If the target receives XdndDrop and will not accept it, it sends XdndFinished and then treats it as XdndLeave.


Theory

Every part of this protocol serves a purpose:

XdndAware

In order for the user to be able to transfer data from any application to any other application via DND, every application that supports XDND version N must also support all previous versions (3 to N-1). The XdndAware property provides the highest version number supported by the target (Nt). If the source supports versions up to Ns, then the version that will actually be used is min(Ns,Nt). This is the version sent in the XdndEnter message. It is important to note that XdndAware allows this to be calculated before any messages are actually sent.

It is also critical for coexisting with other DND protocols (since one can try something else if XdndAware is not present) and is useful for debugging since it lets one check the target's XDND version, after which one can expect to receive an XdndStatus message.

X Selection

By using XConvertSelection(), one can use the same data conversion code for both the Clipboard and Drag-and-Drop. This is an especially large saving if the target requests the type "MULTIPLE" or if the source is forced to send the data incrementally (type "INCR"). It also makes checking the data independent of the main sequence of messages, so XdndStatus correctly reports "yes" or "no" the first time.

By using XdndSelection, the dropped data doesn't interfere with the clipboard data stored in XA_PRIMARY.

Using XConvertSelection() does have the problem that the user can begin dragging something else before the data transfer is complete. However, the X clipboard has the same problem, so this doesn't impose any additional constraints on the user, and it can be avoided as explained below in the discussion of the XdndFinished message.

Actions

Specifying the action separately from the data type allows one to avoid defining N×M atoms for N data types and M actions. Since the user must have full control of what will happen, exactly one action is specified by the source. This is passed in the XdndPosition message to allow it to change during the drag. (e.g. if the user presses or releases modifier keys) The action accepted by the target is passed back in the XdndStatus message to allow the source to provide feedback in the cursor.

The special action XdndActionAsk tells the target that it should ask the user what to do after the drop occurs. This allows one to implement the effect obtained by right-dragging in Windows95®, where the file manager asks the user whether to move, copy, create a link, or cancel. The list of actions is retrieved from the XdndActionList property, and the description of each action that is shown to the user is retrieved from the XdndActionDescription property, both on the source window. Note that the user should be asked before retrieving the data and thus also before sending XdndFinished.

The special action XdndActionPrivate tells the source that the target will do something that the source doesn't understand and that won't require anything from the source other than a copy of the data.

Messages

The XdndEnter message initiates the session and gives the target a chance to set up local variables such as the transformation from root window coordinates to target window coordinates. It also provides a list of supported data types so the target doesn't have to call XConvertSelection() for XdndSelection, TARGETS.

The XdndPosition message provides mouse locations so that the target does not have to query the X server in order to redraw itself properly. There is no other reliable way for the target to get the mouse location because X will force the cursor to be grabbed by the source window, so only the source window will be receiving events. The target needs the mouse location because it has to update itself to show where the data will be inserted. This is especially important in text editors, spreadsheets, and file managers.

The time stamp in the XdndPosition message must be passed to XConvertSelection() to ensure that the correct data is received.

The scroll request in the XdndPosition message provides one way to scroll the target. Ideally, the raw state of the X event should be sent to the target to allow it to behave exactly the same way as if the user were not performing DND. However, since the source must unilaterally decide which X events should trigger a scroll request, the source and target must agree on the same trigger events. The de facto standard triggers are the ButtonPress events from the mousewheel: buttons 4,5,6,7. This is why the XdndPosition message includes both the index of the button and the complete state of all the modifier keys.

All targets should also provide scroll zones along the edges to allow users without a mousewheel to scroll the target.

The XdndStatus message provides feedback to the source (e.g. it might want to change the cursor) and ensures that XdndPosition messages do not pile up when the network connection is slow.

The XdndLeave message cancels the session.

The XdndDrop message tells the target to proceed with the drop. The time stamp must be passed to XConvertSelection() to ensure that the correct data is received.

The XdndFinished message tells the source that the target is done and no longer needs the data. This allows the source to implement any one of three different behaviors:

Protecting against malfunctioning programs

If the version number in the XdndEnter message is higher than what the target can support, the target should ignore the source.

While the source and target are receiving XDND messages from each other, they should ignore all XDND messages from other windows.

If either application crashes while DND is active, the other application must avoid crashing on a BadWindow error. The only safe way to do this is to actually catch the error by installing an error handler with XSetErrorHandler(). In addition, the target must also listen for DestroyNotify events so that it doesn't wait forever for another XdndPosition if the source crashes between receiving XdndStatus and sending XdndPosition.

As discussed above, the source must be careful to avoid locking up if the target does not send XdndFinished.


Notes

All targets should provide scroll zones along the edges to allow users without a mousewheel to scroll the target.

We are collecting examples to show DND might work in various cases. Dragging text is straightforward since there are several well-known formats, including text/plain and text/enriched. Dragging files is at least equally important.

We have also developed extensions that allow dropping on the root window and on windows that do not support XDND.

XdndActionLink only makes sense within a single program or between cooperating programs because the target must obtain not only the data, but also the location of the data. In all other cases, the target should respond with XdndActionCopy or XdndActionPrivate in the XdndStatus message.

On the other hand, XdndActionAsk makes equally good sense between unrelated programs and cooperating programs. However, when the source and target are unrelated, the target may choose to provide a list of actions that it can perform on its own after retrieving the data instead of asking the source for a list of actions.

Remember also that you must debounce the mouse drag. If the user moves the mouse by only a couple of pixels while clicking to select something, then it is far more likely that the user is a bit clumsy than that the user intends to start a drag. A threshold of 3 pixels is typical.

Getting up on my soap box...

In my opinion, programs should not change the cursor during the drag because this provides the user with the most consistent picture. The user is always dragging the same data, regardless of whether or not the current target will accept it. It is the target that should change to show whether or not it will accept the drop.

However, if you want to be Microsoft compliant, you have to change the cursor. As usual, Microsoft got it backwards...

As a side note, on page 253 of his book, About Face, Alan Cooper agrees wholeheartedly.

The single exception that I endorse is adding a small symbol to the cursor to show that the requested action will be performed, instead of XdndActionPrivate. For an example, refer to the page on dragging files.


Optimizations

When the source and target windows are part of the same application, sending X Client Messages is a waste of time and bandwidth, especially if the program and server are on different machines. Implementations should therefore detect the special cases of "source = target" and "source and target in same application" and handle them via function calls.

To avoid calling XConvertSelection() in the above cases:

Targets do not have to retrieve XdndTypeList from the source window if they find what they are looking for in the three types listed in the XdndEnter message.

To avoid unnecessary messages from the source to the server, one should only change the cursor when the target or status (acceptance and action) changes.

Unfortunately, one cannot avoid calling XTranslateCoordinates() continuously, because of overlapping windows.


Technical details

Current version: 4

All constants mentioned below are the string names of X atoms, capitalized as shown. This avoids the need for hard-coded values, which would require a global registry.


Atoms and Properties

XdndAware

This window property must be of type XA_ATOM and must contain the highest version number of the protocol supported by the target. (Version numbers start at zero. The maximum version number is 0xFF because there is only one byte allocated for it in the XdndEnter message. At one new version every three months, which is very rapid turnover, this will last 64 years.)

The property must be set on each top-level X window that contains widgets that can accept drops. (new in version 3) The property should not be set on subwindows. The target must dispatch the messages to the appropriate widget. Since window managers often insert extra layers of windows, this requires searching down the subwindow tree using XTranslateCoordinates().

XdndSelection

This is the name of the X Selection that is used when the target wants to examine the data during the drag phase and when it wants to retrieve the data after a drop.

Data types

All data types are referred to by their corresponding X Atoms. The atom names are the corresponding MIME types, in all lower case. (RFC's for MIME: 2045, 2046, 2047, 2048, 2049)

For text, the charset attribute can be appended to the MIME type. (e.g. Japanese => text/plain;charset=ISO-2022-JP) If the charset attribute is not specified, it is assumed to be ISO-8859-1. (new in version 4)

Note that any data type may be transferred via the INCR protocol.

XdndTypeList

If the source supports more than 3 data types, this window property must be set on the source window, must be of type XA_ATOM, and must contain a list of all the supported data types.

Actions (new in version 2)

All actions are referred to by their corresponding X Atoms. The predefined actions are

The XdndAction prefix is reserved for future expansion, but any other name can be used for other actions as long as both the source and the target recognize it and agree on what it means. The predefined atom None is not allowed as an action.

The default is XdndActionCopy, and this is assumed to be the action when using version 0 or 1.

In general, XdndActionMove is implemented by first requesting the data and then the special target DELETE defined in the X Selection protocol. (File managers will obviously just use mv or its equivalent.) DELETE should be sent before XdndFinished.

Refer to the Theory and Notes sections for more information.

XdndActionList (new in version 2)

If the source sends XdndActionAsk, this window property must be set on the source window, must be of type XA_ATOM, and must contain a list of all the supported actions. The first one should be the default so the user doesn't have to change the selection in the radio group too often.

XdndActionDescription (new in version 2)

If the source sends XdndActionAsk, this window property must be set on the source window, must be of type XA_STRING, and must contain a list of ASCII strings separated by NULL's that the target should display when describing the choices to the user. These strings must be in the same order as the atoms in the XdndActionList property.

The option to cancel the operation must always be provided in the dialog displayed to the user, via a Cancel button, but should not be included in XdndActionList.

XdndProxy (new in version 4)

If this window property exists, it must be of type XA_WINDOW and must contain the ID of the proxy window that should be checked for XdndAware and that should receive all the client messages, etc. In order for the proxy window to behave correctly, the appropriate field of the client messages, window or data.l[0], must contain the ID of the window in which the mouse is located, not the proxy window that is receiving the messages. The only place where the proxy window should be used is when checking XdndAware and in the calls to XSendEvent().

The proxy window must have the XdndProxy property set to point to itself. If it doesn't or if the proxy window doesn't exist at all, one should ignore XdndProxy on the assumption that it is left over after a crash.


Client Messages

Note: All unused flags must be set to zero in every message. This allows one to define new flags without incrementing the version number.

XdndEnter

Sent from source to target when the mouse enters a window that supports XDND.

XdndPosition

Sent from source to target to provide mouse location.

XdndStatus

Sent from target to source to provide feedback on whether or not the drop will be accepted, and, if so, what action will be taken.

XdndLeave

Sent from source to target to cancel the drop.

XdndDrop

Sent from source to target to complete the drop.

XdndFinished (new in version 2)

Sent from target to source to indicate that the source can toss the data because the target no longer needs access to it.


Sample implementation

If you are interested in supporting this protocol, but daunted by having to start from scratch, you can obtain sample C++ code via anonymous ftp. This code may be used under any license.

Paul Sheer has implemented XDND v2 in C as a generic library. The files are xdnd.c and xdnd.h

Edward Rosten also has a sample implementation and debugger.

If you use a different language, please consider donating your code for others to look at.

We also have a binary that you can use to test interoperability between a correct implementation and your implementation.

While implementing this protocol, you may find it very useful to use the programs xlsatoms to list all the atoms that the server knows about, xprop to list all the properties on a particular window, and xscope to study the timing of events.

Even if you implement XDND from scratch, we would appreciate it if your distribution included some sort of documentation that states clearly that you are supporting this protocol and provides a reference to this web page. This will help get the snowball rolling. The more programs that support the same protocol, the more useful Drag-and-Drop will be for the users. If you tell us that you support the protocol, we will also add you to the list of supporters.


Changes from previous versions

July 30, 2010

Removed note about using XdndAware as an initial filter. Since it must be set only on top level windows, it is too complicated to keep track of all the data types that all the child windows accept.

June 17, 2009

Added scrolling to XdndPosition. Changed recommendation from a hysteretic scroll zone outside a widget to simple scroll zones inside a widget.

May 21, 2009

Added note on the importance of sending periodic XdndPosition messages when the cursor is not moving. Without this, the target cannot scroll smoothly.

December 25, 2002:

Version 5 adds additional information to the XdndFinished message in order to allow the Java DND API to be implemented on top of XDND. (Thanks to Danila A. Sinopalnikov for working out the details.)

August 31, 2002:

Added requirement to File Dragging protocol. The user name must be provided via the new text/x-xdnd-username data type.

November 21, 2000:

Thanks to Daniel Biddle for pointing out that the MIME types should be specified in the format ISO-8859-1, not iso8859-1.

October 26, 2000:

Minor modifications to actions required by File Dragging protocol.

February 22, 2000:

Added additional notes about why the host name must always be included when dragging files.

June 7, 1999:

Version 4 adds XdndProxy window property to support the new protocol for dropping on the root window.

Rewrote the protocol for dropping on the root window to conform to the latest implementations.

Added note in Technical Details section about how to specify the character set for text.

Created Direct Save (XDS) protocol layered on top of XDND.

December 1, 1998:

Rewrote the protocol for dropping on the root window to conform to the latest implementations.

September 19, 1998:

The File Dragging protocol now uses the well established text/uri-list instead of url/url.

Added note to Optimization section about caching the window stack.

September 7, 1998:

Version 3 changes the way XdndAware is handled. To reduce the number of XTranslateCoordinates() calls to the X server and to make life easier for programs that use multiple X windows per widget, XdndAware must now be placed on the top-level X window, not on subwindows. This change is unfortunately incompatible with previous versions. However, since there are still relatively few programs that have been released with XDND support, the specification has simply been adjusted so XDND compliance only requires supporting version 3 and up. (This will never happen again!)

In addition, Jeroen van der Zijp has invented an extension that allows dropping onto windows that do not support XDND, and Arnt Gulbrandsen has developed a way to drop on the root window.

August 17, 1998:

Version 2 adds the following features: With these new features, the File Dragging protocol becomes much simpler.

Added page of supporters.

February 25, 1998:

Added examples of how XDND fits into the larger picture of cooperating applications.

February 24, 1998:

Added trashcan discussion to the File Dragging protocol.

February 2, 1998:

Version 0 has a hole. If the user begins a second drag from the same source before the data has been transferred to the first target (over a really slow network, obviously), then the first target may get the wrong data. Thanks to Donal K. Fellows for pointing this out.

Version 1 fixes this by adding a time stamp to XdndPosition and XdndDrop which must be used when requesting the data. This way, if the user quickly begins a second drag, the first target will at least get no data instead of the wrong data.

Please refer to the Theory section for a more complete discussion and the reasons why this was not fixed by adding a "drop finished" message.

January 28, 1998:

Added comparison to Xde.


Acknowledgements

This protocol was developed by John Lindal at New Planet Software, with help from Glenn Bach and lots of feedback from Arnt Gulbrandsen at Troll Tech and Owen Taylor of GTK+ and input from Elliot Lee at Red Hat Software.