Invoking the SGML declaration


Introduction

The SGML declaration is the mandatory first part of every SGML document. In the special case of XML, the SGML declaration is fixed and must not be specified, but it exists nonetheless: a conforming SGML parser must read and evaluate an SGML declaration before every validation.

The SGML declaration determines the lexical rules for the SGML document; that is, it specifies which characters are permissible, which characters are control characters, which optional features of SGML are used and which are forbidden, and so on.

The SGML declaration itself can only be written in one particular syntax (the Reference Concrete Syntax). The Document Type Definition (DTD) that follows may be written either in the Reference Concrete Syntax (if the declaration specifies SCOPE INSTANCE) or in the syntax declared in the SGML declaration (if it specifies SCOPE DOCUMENT). The parser must therefore know the lexical rules, i.e. the SGML declaration, before it attempts to parse the document.

The SGML declaration is hardly ever of real concern, and it is usually included implicitly. This document describes how this is done in detail.

Manual declaration inside the document

Omitting the declaration

If no SGML declaration is provided at all, the parser must, according to the standard, assume the default SGML declaration (which is defined in the standard itself). This is usually impractical, because the default declaration imposes severe restrictions.

Calling the parser

Apart from falling back on the default, the first proper way to supply an SGML declaration is to put nothing into the actual document and to call the parser with two separate filenames: first the file containing the declaration, then the document. The parser simply reads the two files one after the other, as if they formed a single document.
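The sequential reading described above can be modeled as plain concatenation. The following Python sketch is illustrative only: the helper names `assemble_texts` and `assemble_files` are hypothetical, the files are assumed to be UTF-8 text, and the declaration file is assumed to contain the complete `<!SGML ... >` markup. A real parser works on entities and the declared character set, not on Python strings.

```python
from pathlib import Path


def assemble_texts(decl_text: str, doc_text: str) -> str:
    """Join declaration and document text in the order the parser reads them."""
    combined = decl_text + doc_text
    # The declaration file must bring its own <!SGML ... > wrapper, so the
    # combined input has to open with it (leading white space aside).
    if not combined.lstrip().upper().startswith("<!SGML"):
        raise ValueError("input does not start with an <!SGML ...> declaration")
    return combined


def assemble_files(decl_path: str, doc_path: str) -> str:
    """Hypothetical helper: read the declaration file first, then the document."""
    return assemble_texts(
        Path(decl_path).read_text(encoding="utf-8"),  # assumed encoding
        Path(doc_path).read_text(encoding="utf-8"),
    )
```

The order of the arguments matters: swapping them would put the document before its declaration, which is exactly what the parser cannot handle.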

Explicit reference in the document

The second way is to link a specific SGML declaration permanently to the SGML document (as opposed to specifying it at each parser call). This, too, can be done in several ways.

To start with, one can include the entire SGML declaration:

<!SGML "ISO 8879:1986 (WWW)" -- SGML declaration here... -- >
<!DOCTYPE doctype [ -- DTD here... -- ]>
<doctype> ... document instance ... </doctype>

Alternatively, the actual declaration (starting with "ISO 8879:1986 (WWW)") may reside in a separate file. In that case, the SGML declaration is given a name (like any other meta-declaration, although the name hardly matters here) and the filename is passed as a system identifier:

<!SGML name SYSTEM "filename" >
<!DOCTYPE doctype [ -- DTD here... -- ]>
<doctype> ... document instance ... </doctype>

Finally, a publicly identified SGML declaration can be referred to by its public identifier as usual, in which case a catalog entry must point to the actual file. Again as usual, an explicit filename may optionally be given as a system identifier:

<!SGML name PUBLIC "public_identifier" "system_identifier" >
<!DOCTYPE doctype [ -- DTD here... -- ]>
<doctype> ... document instance ... </doctype>
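To make the three forms easier to tell apart, here is a Python sketch that inspects the start of a document and reports which form it uses. It is a simplification under stated assumptions: only double-quoted literals are recognized, comments inside the `<!SGML ... >` markup of the SYSTEM and PUBLIC forms are not handled, and the names `SgmlDeclForm` and `classify_sgml_decl` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SgmlDeclForm:
    kind: str                      # "inline", "system" or "public"
    name: Optional[str] = None     # declaration name (SYSTEM/PUBLIC forms only)
    public_id: Optional[str] = None
    system_id: Optional[str] = None


# Simplified patterns: double-quoted literals only, no comments in the markup.
_PUBLIC = re.compile(
    r'<!SGML\s+(\S+)\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>', re.IGNORECASE)
_SYSTEM = re.compile(r'<!SGML\s+(\S+)\s+SYSTEM\s+"([^"]*)"\s*>', re.IGNORECASE)
_INLINE = re.compile(r'<!SGML\s+"', re.IGNORECASE)


def classify_sgml_decl(text: str) -> SgmlDeclForm:
    """Tell which of the three forms opens the document."""
    head = text.lstrip()
    m = _PUBLIC.match(head)
    if m:
        # The system identifier is optional, so group 3 may be None.
        return SgmlDeclForm("public", name=m.group(1),
                            public_id=m.group(2), system_id=m.group(3))
    m = _SYSTEM.match(head)
    if m:
        return SgmlDeclForm("system", name=m.group(1), system_id=m.group(2))
    if _INLINE.match(head):
        # The full declaration follows, starting with the minimum literal.
        return SgmlDeclForm("inline")
    raise ValueError("no <!SGML ...> declaration at the start of the text")
```

For the inline form the declaration begins with a quoted literal directly after `<!SGML`, whereas the other two forms begin with a name, which is what the patterns rely on.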

Using catalogs

When working with publicly standardized document types, most meta-declarations refer only to a public identifier. These identifiers are then resolved into system identifiers (file names) by an SGML catalog file. Catalog files can also be used to specify the SGML declaration. There are two ways of doing this:

First, to specify an SGML declaration that is used whenever no other declaration is present, add the following line to the catalog:

SGMLDECL "filename"

This effectively replaces the default SGML declaration from the standard for all documents that provide no other declaration.

Second, one can link a specific SGML declaration to a particular document type public identifier with the following catalog entry:

DTDDECL "DTD_public_identifier" "filename"

This way, every document that provides no SGML declaration of its own and starts with <!DOCTYPE doctype PUBLIC "DTD_public_identifier" [...] > will have the SGML declaration from filename prepended.
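The selection logic of the two catalog entries can be sketched in Python as follows. This is a hypothetical model, not how any particular parser implements catalogs: it assumes one catalog entry per line, ignores catalog comments, keeps the first SGMLDECL entry and the first DTDDECL entry per public identifier, and gives a matching DTDDECL precedence over SGMLDECL, following the description above. The function names are illustrative.

```python
import re
import shlex
from typing import Dict, Optional, Tuple

# Simplified pattern for <!DOCTYPE name PUBLIC "pubid" ...>; comments ignored.
_DOCTYPE_PUBLIC = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)


def parse_catalog(catalog_text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Collect SGMLDECL and DTDDECL entries, assuming one entry per line."""
    sgmldecl: Optional[str] = None
    dtddecl: Dict[str, str] = {}
    for line in catalog_text.splitlines():
        tokens = shlex.split(line)  # strips the double quotes around literals
        if not tokens:
            continue
        keyword = tokens[0].upper()
        if keyword == "SGMLDECL" and len(tokens) >= 2 and sgmldecl is None:
            sgmldecl = tokens[1]  # keep the first SGMLDECL entry
        elif keyword == "DTDDECL" and len(tokens) >= 3:
            dtddecl.setdefault(tokens[1], tokens[2])  # first entry per public id
        # Other entries (PUBLIC, SYSTEM, DOCTYPE, ...) are irrelevant here.
    return sgmldecl, dtddecl


def declaration_for(doc_text: str, sgmldecl: Optional[str],
                    dtddecl: Dict[str, str]) -> Optional[str]:
    """Return the declaration file to prepend, or None if nothing is prepended."""
    if doc_text.lstrip().upper().startswith("<!SGML"):
        return None  # the document brings its own declaration
    m = _DOCTYPE_PUBLIC.search(doc_text)
    if m and m.group(1) in dtddecl:
        return dtddecl[m.group(1)]  # DTDDECL matched the DTD's public identifier
    return sgmldecl  # None here means: fall back on the standard's default
```

A document that carries its own `<!SGML ... >` declaration is left alone, which mirrors the phrase "provides no SGML declaration of its own".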

Attention: Files that contain SGML declarations for inclusion via catalogs, as described in this section, must contain the declaration wrapped in the <!SGML ... > markup! (The same holds for declaration files passed to the parser alongside the document, as described above.) In other words, the complete SGML document, once assembled, must contain precisely one <!SGML ... > declaration.
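The "precisely one" rule can be checked on the assembled text with a rough Python sketch like the one below. It is deliberately naive and labeled as such: it assumes that any occurrence of `<!SGML` followed by white space opens a declaration and does not skip comments, marked sections, or character data. The function names are hypothetical.

```python
import re

# "<!SGML" followed by white space opens an SGML declaration (case-insensitive).
_SGML_DECL_OPEN = re.compile(r"<!SGML\s", re.IGNORECASE)


def count_sgml_declarations(document_text: str) -> int:
    """Naive count: does not skip comments, marked sections or CDATA content."""
    return len(_SGML_DECL_OPEN.findall(document_text))


def check_single_declaration(document_text: str) -> None:
    """Raise if the assembled document does not hold exactly one <!SGML ...>."""
    found = count_sgml_declarations(document_text)
    if found != 1:
        raise ValueError(
            f"expected exactly one <!SGML ...> declaration, found {found}")
```

For instance, a declaration file that lacks the `<!SGML ... >` wrapper leads to a count of 0 after assembly, while passing a declaration file to the parser for a document that already contains its own declaration leads to a count of 2.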

Some standards

The first characters of the SGML declaration proper (after the <!SGML markup) must be the minimum literal, whose minimum data must be one of the following:

"ISO 8879:1986"
"ISO 8879:1986 (ENR)"
"ISO 8879:1986 (WWW)"

The first line specifies that the document follows the original SGML standard, the second line refers to Annex J of the standard (Extended Naming Rules), and the last line refers to Annex K of the standard, the Web SGML Adaptations. Annex K supersedes the other two variants and should be used exclusively. It allows finer control over the syntax, as required, for instance, for XML’s compact form of empty elements (<element/>).
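As a small illustration of this list, the following Python sketch reads the minimum literal at the start of an inline declaration and maps its minimum data to the variant named above. Assumptions, stated plainly: the mapping table and the function name `sgml_variant` are hypothetical, both quote styles are accepted, and white space inside the literal is collapsed to single spaces as a simplified stand-in for the standard's normalization of minimum data.

```python
import re
from typing import Optional

# Minimum data values from the list above, mapped to the part of ISO 8879 they select.
KNOWN_MINIMUM_LITERALS = {
    "ISO 8879:1986": "original ISO 8879 standard",
    "ISO 8879:1986 (ENR)": "Annex J (Extended Naming Rules)",
    "ISO 8879:1986 (WWW)": "Annex K (Web SGML Adaptations)",
}

# The minimum literal directly follows "<!SGML"; double or single quotes.
_MIN_LITERAL = re.compile(r"""<!SGML\s+(?:"([^"]*)"|'([^']*)')""", re.IGNORECASE)


def sgml_variant(decl_text: str) -> Optional[str]:
    """Return the variant selected by the minimum literal, or None."""
    m = _MIN_LITERAL.match(decl_text.lstrip())
    if m is None:
        return None  # not an inline declaration (e.g. a SYSTEM/PUBLIC reference)
    data = m.group(1) if m.group(1) is not None else m.group(2)
    # Simplified normalization: collapse runs of white space to single spaces.
    data = " ".join(data.split())
    return KNOWN_MINIMUM_LITERALS.get(data)  # None for unknown minimum data
```

Returning None for unknown minimum data, rather than guessing, keeps the sketch from silently accepting a declaration whose variant it cannot identify.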

Example public identifiers

<!SGML HTML3.2 PUBLIC "+//IDN W3C.ORG//SD HTML Version 3.2//EN">