Notes on Mozilla DOCTYPE Sniffing and Parser and Layout Modes

This page is out of date. It was only correct during part of the summer of 2000. Mozilla no longer uses separate parser DTDs for loading web pages, although the code for the Strict DTD (which scares me) is still around and is used by the editor.

Substantive comments on this document should probably be sent to mozilla-layout, in the thread called "parser and layout modes" (July 2000).

Introduction

Because much existing content on the Web is not standards-compliant, or would appear in unintended ways in a standards-compliant browser, Mozilla handles some content in a backwards-compatible way and some content according to the standards. Two separate modes are set: the parser DTD mode and the layout quirks mode.

The Parser DTD

The parser has modules called DTDs that are designed to represent something similar to an SGML DTD. The DTD determines when to open and close containers (in other words, the shape of the content model). The three DTDs in the parser are the Nav DTD, the Strict DTD, and the Transitional DTD. The Nav DTD tries to build the content model to be backwards-compatible with Navigator 4.x. The Strict DTD, which was enabled for the first time in M17, throws away any tags, attributes, or content not allowed according to the HTML 4.0 Strict DTD. Such a DTD will encourage authors to write documents conforming to the HTML 4 specification. The Transitional DTD is similar to the Strict DTD, except that it follows the rules of the HTML 4.0 Transitional DTD.

The Layout Quirks Mode

There are also two modes used by layout (including the style system): quirks mode and strict mode. In quirks mode, layout emulates the nonstandard behavior of Navigator 4 and IE that is required to avoid breaking existing content on the Web. In strict mode, the behavior is (hopefully) the behavior described by the HTML and CSS specifications.

How we decide which to use

[Note: This discussion concerns only content sent as "text/html". Content sent as "text/xml" is handled by the XML parser and the strict mode in Layout.]

Right now we decide which modes to use based on the presence and content of the DOCTYPE declaration. The Strict DTD always corresponds to the strict layout mode, and the Nav DTD always corresponds to the quirks layout mode. The rules used to make this decision are changing weekly and are based mainly on searching for strings within the DOCTYPE. However, a few rules are generally followed:

Quirks mode / Nav DTD: A document without a DOCTYPE declaration
Strict mode / Transitional DTD: A document with an HTML 4.0 transitional DTD that includes a system ID
Strict mode / Strict DTD: A document with an XML declaration, XHTML doctype, or HTML 4.0 strict doctype
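
The three rules above can be sketched as a small string-matching function. This is only an illustration under stated assumptions, not Mozilla's actual sniffing code: the function name sniff_modes, the regular expressions, and the exact substrings searched for are all hypothetical, and cases the rules above do not cover return None rather than a guess.

```python
import re
from typing import Optional, Tuple

# Hypothetical helpers. DOCTYPE_RE does not handle an internal subset
# that contains '>'; it is enough for the simple declarations shown here.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+[^>]*>", re.IGNORECASE)
QUOTED_RE = re.compile(r'"[^"]*"|\'[^\']*\'')


def sniff_modes(head: str) -> Optional[Tuple[str, str]]:
    """Return (layout mode, parser DTD), or None where the notes give no rule."""
    # Rule 3: a document with an XML declaration.
    if head.lstrip().startswith("<?xml"):
        return ("strict", "Strict DTD")

    match = DOCTYPE_RE.search(head)
    # Rule 1: no DOCTYPE declaration at all.
    if match is None:
        return ("quirks", "Nav DTD")

    doctype = match.group(0)
    # Rule 3: an XHTML doctype.
    if "//DTD XHTML" in doctype:
        return ("strict", "Strict DTD")
    # Rule 2: HTML 4.0 Transitional, but only when a system ID follows
    # the public ID (that is, a second quoted literal is present).
    if "//DTD HTML 4.0 Transitional//" in doctype:
        if len(QUOTED_RE.findall(doctype)) >= 2:
            return ("strict", "Transitional DTD")
        return None  # not covered by the rules above
    # Rule 3: the HTML 4.0 Strict doctype.
    if "//DTD HTML 4.0//" in doctype:
        return ("strict", "Strict DTD")
    return None  # unknown doctype: see the open issues
```

For example, sniff_modes("<html>") falls under the first rule, while an HTML 4.0 Transitional doctype without a system ID yields None because the rules listed here say nothing about it.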

Open Issues

How should Layout quirks be connected to the Parser DTD? Are there cases where we want to use the Nav DTD but have layout in Strict mode?

The parser DTDs are really only appropriate for certain pages, since they shouldn't be applied to pages using other DTDs. However, the idea of the layout modes should be to allow as many pages as possible (without breaking backwards compatibility) to trigger strict mode. Authors aware of Mozilla should also be able to choose modes without affecting the validity of their documents.

So, should there be ways of triggering strict layout mode without changing the DTD? For example, one could use an HTTP header, the presence of a system identifier in the doctype, or the presence of an internal subset in the doctype.
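
To make two of these candidate triggers concrete, here is a rough sketch that checks a DOCTYPE declaration for a system identifier and for an internal subset. It is an assumption-laden illustration, not a proposal for the real implementation: the function name strict_layout_triggers and both regular expressions are hypothetical, the HTTP header option is left out because no header name has been proposed, and a root element literally named PUBLIC or SYSTEM would confuse it.

```python
import re

# Hypothetical helpers, not part of any Mozilla API.
QUOTED_RE = re.compile(r'"[^"]*"|\'[^\']*\'')
# Everything before the first '[' that is not inside a quoted literal.
HEADER_RE = re.compile(r"(?:" + QUOTED_RE.pattern + r"|[^\[\"'])*")


def strict_layout_triggers(doctype: str) -> set:
    """Return which of the candidate triggers appear in the declaration."""
    match = HEADER_RE.match(doctype)
    header = match.group(0)
    # An internal subset begins with '[' right after the header part.
    has_subset = doctype[match.end():match.end() + 1] == "["

    literals = QUOTED_RE.findall(header)
    keywords = QUOTED_RE.sub(" ", header).upper().split()

    triggers = set()
    # PUBLIC "public id" "system id"  or  SYSTEM "system id"
    if ("PUBLIC" in keywords and len(literals) >= 2) or (
        "SYSTEM" in keywords and len(literals) >= 1
    ):
        triggers.add("system identifier")
    if has_subset:
        triggers.add("internal subset")
    return triggers
```

Quoted literals inside the internal subset (for example in entity declarations) are deliberately ignored when counting identifiers, so a public ID followed directly by a subset is not mistaken for a public ID plus system ID.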

[I think we should be able to trigger strict layout mode using any of the things mentioned in the previous paragraph. I'm not so sure about parser mode.]

How should we handle unknown FPIs?

How should unknown FPIs be treated when determining parser mode and layout mode? Is it appropriate to use the Strict DTD for unknown FPIs? Should we use quirks layout mode for unknown FPIs?

[I think we should maintain a list of quirky DOCTYPEs and put all new ones into strict layout mode. In other words, any content we can recognize as not requiring quirks for backward compatibility should trigger strict layout mode. I'm not sure about parser mode, since there are serious potential forward-compatibility problems.]
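
The approach suggested in the bracketed note amounts to a deny-list lookup with strict mode as the default. A minimal sketch follows, assuming the public identifier has already been extracted from the DOCTYPE; the function name layout_mode_for and the single list entry are hypothetical placeholders, since these notes do not enumerate which identifiers would count as quirky.

```python
from typing import Optional

# Hypothetical placeholder entry: the notes do not say which public
# identifiers belong on the quirky list.
QUIRKY_PUBLIC_IDS = frozenset({
    "-//EXAMPLE//DTD Legacy HTML//EN",
})


def layout_mode_for(public_id: Optional[str]) -> str:
    """Pick a layout mode: known-quirky identifiers get quirks, the rest strict."""
    # No DOCTYPE at all stays in quirks mode, as in the rules listed earlier.
    if public_id is None:
        return "quirks"
    # Exact comparison, in keeping with the preference for strict parsing.
    if public_id in QUIRKY_PUBLIC_IDS:
        return "quirks"
    # Unknown (including future) identifiers default to strict layout mode.
    return "strict"
```

The design choice here is that the list only ever has to grow for legacy content; anything new is treated as standards-aware without a code change. This covers layout mode only and says nothing about which parser DTD to use.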

How strictly should we parse DOCTYPE and recognize declarations?

Does the parsing of the DOCTYPE declarations need to recognize only correct doctypes, or should it be looser and accept attempts to trigger Mozilla's stricter modes?

[I think we should parse strictly, since we don't want to create yet another set of invalid documents that people expect to be recognized.]