Mozilla position: Requirements for a framework for mixed-namespace documents

This draft lists a subset of the requirements for mixed-namespace documents to achieve interoperability on the Web. The Web involves multiple user-agents and many authors and authoring tools. This means that if a new technology is to be deployed without numerous cycles of bug reporting and fixing on released products, the relevant specifications must clearly define the behavior of all documents and the test suites must test nearly all of the behavior needed for high levels of interoperability.

The task of meeting these requirements may be split between a compound document framework specification and the specifications of the document formats being combined. In some cases meeting them may require that the framework specification define ways to write document format specifications that allow clearer description of compound document behavior. However, for document formats that already exist, it should be the job of the framework specification to define behavior, at least for existing versions of those specifications that are not open to substantial revision.

The framework should be designed for the long term. Many Web documents should still be around five, ten, fifty, or a hundred years from now. Web user agents of the future should not have to deal with vastly different requirements and architectures for document formats from different time periods, especially when documents from those periods can be mixed and cannot be distinguished programmatically. Rather, formats should evolve, building on what already exists to add new capabilities. Designing for the long term implies at least the following:

Format-independence

This framework specification should be separate from any specification requiring a specific document format language or any specific new element, attribute, or interface. This is important so that it can remain the foundation for multi-namespace presentable documents beyond the lifetime of any particular profile, or even if the proposed profiles do not succeed in the market.

Backwards compatibility

Designing for long-term persistent content requires maintaining backwards compatibility, including for content that already exists on the Web. Attempting to change or replace the current content standards used on the Web will reduce the ability of current content to be read in the future. It will also set a precedent for allowing similar change in the future, which will reduce the chances that content written to these new standards will be readable further in the future.

Designing for backwards compatibility also makes it much easier for authors to start using compound documents, since they can add use of additional document format vocabularies to existing documents.

Interoperability

Much of the work of specifying the behavior of compound documents consists of describing the presentation and behavior of the presentation tree, i.e., the tree that is the result of processing XML, XSLT, XInclude, XBL, and other specifications. The problem of interoperable presentation and behavior given a set of inputs can be separated into three parts: (1) interoperable construction of a presentation tree given an input document and associated resources (XBL bindings, XSLT transformations, DTDs, XML Schema, etc.); (2) interoperable behavior given a presentation tree and additional resources (images, CSS style sheets, etc.); and (3) interoperable behavior independent of a presentation tree (e.g., mechanisms for using DOM interfaces).

Note that interoperable presentation and behavior does not mean identical presentation and behavior in response to a given set of inputs. Behavior may still vary depending on device characteristics, user preferences, or user-agent defaults for user preferences.

Presentation tree

The framework must ensure that the presentation tree is the same for any given input document (and associated style sheets, schemata, bindings, etc.) for any given set of supported standards that affect the presentation tree (i.e., for any profile of the framework). To do this, it may need to refer to a specification produced by the XML Processing Model Working Group that describes how the presentation tree is created.

DOM objects

DOM document objects

The framework must define, given user-agent support for a set (potentially defined by a profile of the framework) of document format specifications and their DOM APIs (such as HTML and SVG), and a document of a given MIME type that uses some set of these standards, which interfaces are implemented by the DOM document object representing the document.

Note that the simplest solution to this problem is to say that all document interfaces supported by the user-agent are implemented by the document object.
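As a minimal sketch of that simplest solution, the following assumes a hypothetical registry of document-level interfaces per supported format. The interface names follow DOM Level 2 HTML and SVG 1.1, but the registry and the function name are illustrative assumptions, not something any specification defines.

```python
# Hypothetical registry: which document-level DOM interfaces each supported
# format contributes. The mapping is an illustration only.
DOCUMENT_INTERFACES = {
    "HTML": {"Document", "HTMLDocument"},
    "SVG": {"Document", "SVGDocument"},
}


def document_interfaces(supported_formats):
    """Return every document interface for the formats the user agent supports.

    The result does not depend on the MIME type or the content of the
    document, which is what makes this the simplest possible rule.
    """
    interfaces = set()
    for fmt in supported_formats:
        interfaces |= DOCUMENT_INTERFACES[fmt]
    return interfaces


print(sorted(document_interfaces(["HTML", "SVG"])))
# prints ['Document', 'HTMLDocument', 'SVGDocument']
```

Under this rule, scripts never have to test which formats a particular document happens to use before reaching for a document-level interface.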

DOM element objects

The framework should perhaps define what happens to DOM interfaces when nodes are imported into another namespace. For example, do the XForms elements imported into the XHTML2 namespace implement the HTMLElement interface (or equivalent for XHTML2)? An answer of no would likely be much easier for many types of implementations. (Also consider SMIL animation elements imported into SVG.)

Mixing of elements

The basic idea of compound documents is mixing elements from different namespaces within the presentation tree. Thus, many of the requirements relate to mixing elements in different namespaces. (However, to achieve interoperability, specifications also need to meet these requirements in their description of behavior within namespaces.)

Restricted handling of subtrees

Many document format specifications define cases in which entire subtrees should be treated differently than they otherwise would. For example, SVG says that elements not in the SVG namespace should be ignored unless they are inside svg:foreignObject, HTML says that elements inside an html:object element that loaded successfully or an html:noframes element in a user agent that supports frames should be ignored, and the 'none' value of the CSS 'display' property can cause a subtree not to appear in the presentation.

This document will call these different types of behavior (and the default behavior of interpreting and presenting the content) modes. The framework must define a common vocabulary of modes that is sufficient to describe the behaviors of existing document format specifications. Document format specifications (or, for existing document formats, perhaps the framework specification) must define how their elements behave in each mode.

Behaviors to consider include:

normal presentation
HTML object
HTML noframes
HTML noscript
HTML noembed
SVG conditional processing
SVG ignoring of elements of unknown namespaces
behavior caused by SMIL

When separating modes, consideration needs to be given to the effect of the mode on both the presentation and the semantics of content in a subtree with that mode. For example:

Is the content displayed?
Can the content affect CSS list numbering and/or counters?
Is script in the content executed?
Are style sheets in the content applied?
Can the content be reached by svg:use?
Can the content be reached by URI values in <paint>-valued SVG-CSS properties? URI values of the 'filter' SVG-CSS property? URI values of the 'cursor' SVG-CSS property? Are SVG fonts defined in the content usable by the 'font-family' property?
What effect does the mode have on the semantics of HTML heading structure for Hn elements in that mode?
What effect does the mode have on link relationships expressed in elements in the mode?
What effect does the mode have on RDF or other metadata in elements in the mode?

Furthermore, the framework must provide a mechanism for combining these modes, so that restrictions changing the mode from normal processing can be provided by both the parent element and the child element and these restrictions can be combined.

Furthermore, for each mode, the framework must define which restrictions of the mode, if any, can be removed in a subtree and which must remain if the subtree is referenced from a less restricted element (such as svg:use).
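One way to picture combining modes and lifting restrictions is to treat a mode as a set of restriction bits. The sketch below is only an illustration under that assumption: the restriction names, the combine and lift functions, and the choice of which restriction may be lifted are all hypothetical, and a real framework might need a richer model than a union of flags.

```python
from enum import Flag, auto


class Restriction(Flag):
    """Hypothetical restriction bits; the names are illustrative only."""
    NONE = 0                    # normal presentation
    NOT_DISPLAYED = auto()
    NO_SCRIPT = auto()
    NO_STYLE_SHEETS = auto()
    NOT_REFERENCEABLE = auto()


def combine(parent_mode, child_mode):
    """A subtree is subject to the restrictions imposed from either side."""
    return parent_mode | child_mode


def lift(mode, liftable):
    """Remove only the restrictions that a less restricted referencing
    element is allowed to remove; everything else remains in force."""
    return mode & ~liftable


# Assumed example: the parent suppresses style sheets, the child's own
# definition suppresses display and script execution.
from_parent = Restriction.NO_STYLE_SHEETS
from_child = Restriction.NOT_DISPLAYED | Restriction.NO_SCRIPT
in_place = combine(from_parent, from_child)

# Assumption: a reference (such as svg:use) may lift NOT_DISPLAYED only.
when_referenced = lift(in_place, Restriction.NOT_DISPLAYED)

print(bool(when_referenced & Restriction.NOT_DISPLAYED))  # prints False
print(bool(when_referenced & Restriction.NO_SCRIPT))      # prints True
```

The point of the sketch is that both operations need a shared vocabulary: unless the parent's and the child's specifications name restrictions in the same terms, neither the union nor the lifting step is well defined.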

For each element defined by a document format specification, the specification must define, for every possible child element, under what circumstances that child element is in a restricted mode.

For each element defined by a document format specification, the specification must define, for every possible parent of the element (including no parent, i.e., being the root element of the document), under what circumstances the element is in a restricted mode.

Layout parameters

Specifications need to provide rules sufficient to determine the size and position of all elements, including those at namespace boundaries. Since different specifications use different models for layout computations, the information defined needs to be sufficient for every possible child of an element in another namespace to provide any necessary information for the parent's layout model, and likewise for any possible parent of an element in another namespace.

The framework specification needs to provide terms that specifications can use to describe these parameters. These definitions need to be sufficient to allow all languages to participate in CSS block/inline layout, to provide intrinsic widths for CSS table, float, and absolute positioning layout, and to participate in flexible box layout (yet to be specified by W3C, but to some degree in the charters for both the CSS and Web Applications working groups).

Examples of parameters that parent elements need to provide to child elements are width and/or height. Examples of parameters that child elements need to provide to parent elements are desired height given a width input (or vice versa, for a horizontal block progression), and preferred and minimum intrinsic sizes (in one dimension or two, possibly with additional information on the relationship between the dimensions when given in two dimensions). In all of these cases, it needs to be clear whether the information passed between parent and child already includes information from things like the CSS 'width', 'min-width', and 'max-width' properties or whether that information needs to be considered by the recipient of the information.
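The exchange of parameters described above can be sketched as a small interface between a parent and a child from another namespace. Everything named here (IntrinsicSizes, FixedAspectChild, place_child, and the flag recording whether CSS width constraints were applied) is a hypothetical illustration, not terminology from any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntrinsicSizes:
    """What a child reports to its parent (one dimension only, for brevity)."""
    minimum_width: float
    preferred_width: float
    # States explicitly whether CSS 'width', 'min-width' and 'max-width'
    # are already reflected in the two values above.
    includes_css_width_constraints: bool


class FixedAspectChild:
    """Hypothetical child (for example a graphic) whose height follows
    from its width through a fixed aspect ratio."""

    def __init__(self, natural_width, natural_height):
        self.natural_width = natural_width
        self.natural_height = natural_height

    def intrinsic_sizes(self):
        return IntrinsicSizes(0.0, self.natural_width,
                              includes_css_width_constraints=False)

    def height_for_width(self, width):
        # Desired height given a width input.
        return width * self.natural_height / self.natural_width


def place_child(child, available_width):
    """Parent side: choose a width from the child's intrinsic sizes and the
    available width, then ask the child for the matching height."""
    sizes = child.intrinsic_sizes()
    width = max(sizes.minimum_width, min(sizes.preferred_width, available_width))
    return width, child.height_for_width(width)


print(place_child(FixedAspectChild(400.0, 300.0), 200.0))  # prints (200.0, 150.0)
```

Carrying the includes_css_width_constraints flag with the sizes is one possible way to avoid applying 'min-width' or 'max-width' twice, or not at all, at a namespace boundary.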

Mixing of attributes

The framework may need to define the handling of multiple linking attributes, for example, an element that is both an XLink and an HTML link.

The framework may need to define the handling of multiple ID attributes, for example, an element that has more than one of the following: an ID attribute declared in a DTD, an xml:id attribute, and an ID attribute designated through DOM Level 3 Core's setIdAttribute and setIdAttributeNS methods on the Element interface.
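To make the multiple-ID case concrete, the sketch below finds elements that carry both an xml:id attribute and an un-namespaced id attribute with different values. It assumes, for illustration only, that the un-namespaced id attribute is of type ID (as a DTD might declare); Python's ElementTree does not read attribute types from a DTD, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def conflicting_ids(xml_text):
    """List (tag, xml:id value, id value) for elements where the two differ.

    Assumption: the un-namespaced 'id' attribute is an ID.
    """
    conflicts = []
    for element in ET.fromstring(xml_text).iter():
        xml_id = element.get(XML_ID)
        plain_id = element.get("id")
        if xml_id is not None and plain_id is not None and xml_id != plain_id:
            conflicts.append((element.tag, xml_id, plain_id))
    return conflicts


sample = '<doc><p xml:id="a" id="b"/><p xml:id="c" id="c"/><p id="d"/></doc>'
print(conflicting_ids(sample))  # prints [('p', 'a', 'b')]
```

For the first p element, a lookup by "a" and a lookup by "b" could each plausibly return the element, or not, depending on which ID source a user agent honors; that is the kind of question the framework may need to settle.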

Disagreement between specifications

The framework should resolve as many known disagreements between specifications as possible. However, doing this without the consensus of the relevant working groups would be meaningless. The CDF working group should attempt to work with other relevant working groups to solve these problems. If needed, the CDF framework should specify the solutions, but in most cases they belong in other specifications. These disagreements include:

CSS Parsing: In what cases are the modified CSS parsing rules defined by SVG applied to a text/css style sheet?
Viewports: When mixing markup based on CSS with markup not based on CSS (e.g., SVG), where are the CSS viewports?
Containing blocks: When crossing into the CSS formatting model from something outside of it, where are the CSS containing blocks for in-flow and absolutely positioned content? (This is probably a bigger issue for MathML than for SVG. It may not be actual disagreement, though.)

Identification

Various MIME type registrations exist for document formats that allow multi-namespace documents. It is often unclear what these registrations imply about multi-namespace documents in terms of both document conformance and user-agent conformance. The framework should provide guidelines on how such registrations should be written to avoid these problems and work with other working groups to revise MIME type registrations. The following questions need to be clearly answered:

Does the application/foo+xml registration require that all elements in a document with that type be FooML? If it does, how should applications handle the case where some are not?
Does the application/foo+xml registration require that the root element in a document with that type be FooML? If it does, how should applications handle the case where it is not?
What MIME type should be used for documents that mix namespaces from multiple document format specifications?
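The first two questions can be phrased as checks that a user agent or validator might run. In the sketch below, FooML, its namespace URI, and the mapping from MIME type to namespace are placeholders invented for illustration; what should happen when a check fails is exactly what the registrations would need to say.

```python
import xml.etree.ElementTree as ET

# Hypothetical: FooML, its MIME type and its namespace are placeholders.
NAMESPACE_FOR_TYPE = {"application/foo+xml": "http://example.org/ns/fooml"}


def namespace_of(element):
    """Extract the namespace URI from an ElementTree '{uri}local' tag."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def check_document(mime_type, xml_text):
    """Report whether the root element, and whether every element,
    is in the namespace associated with the MIME type."""
    expected = NAMESPACE_FOR_TYPE[mime_type]
    root = ET.fromstring(xml_text)
    return {
        "root_matches": namespace_of(root) == expected,
        "all_match": all(namespace_of(e) == expected for e in root.iter()),
    }


mixed = ('<foo xmlns="http://example.org/ns/fooml">'
         '<bar xmlns="http://example.org/ns/other"/></foo>')
print(check_document("application/foo+xml", mixed))
# prints {'root_matches': True, 'all_match': False}
```

A document like the sample, with a FooML root and a foreign child, is the common compound-document case, so a registration that requires all elements to be FooML would exclude it.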

Content negotiation

Since MIME types are used for both identification and content negotiation, the framework's statements on content negotiation also need to be coordinated with other working groups and their MIME type registrations. The following questions need to be answered:

Should it be possible for content negotiation regarding which document formats may be combined in a single document to occur independently of profiles? (We believe it should be.) If so, how is this done?
What does indicating acceptance of a document format's MIME type mean in terms of accepting it within compound documents? It clearly implies acceptance as the only document format. What about as the root format within a multi-format document? What about as a non-root format within a multi-format document?

See also the previous, more detailed position statement on these issues and the discussion about it.

Acknowledgments

Thanks to Ian Hickson, Robert O'Callahan, Tim Rowley, Jonas Sicking, and Boris Zbarsky for comments on this document.