There are two types of entities: general entities and parameter entities. Each is declared in an entity declaration and instantiated by means of an entity reference. There are also character references, which take fixed values (viz. the character to which they refer) and thus need no declaration.
General entities can be referenced anywhere in the document, whereas parameter entities may only be referenced inside markup declarations, i.e. essentially inside the DTD.
Entities are normally replaced at each occurrence of their reference, and the replacement text is then parsed again as part of the SGML document. By using specific entity data types, this behavior can be altered. Entities can be stored externally and referred to via an external (public and/or system) identifier.
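As a minimal sketch of the two kinds of entity: a parameter entity is declared with a leading `%` and referenced as `%name;`, while a general entity is declared with a plain name and referenced as `&name;`. All names used here (`inline`, `notice`, `para`, `em`, `code`) are made up for illustration and do not come from any particular DTD.

```sgml
<!-- Parameter entity: declared with "%", referenced inside the DTD -->
<!ENTITY % inline "em | code">
<!ELEMENT para - - (#PCDATA | %inline;)*>
<!ELEMENT (em | code) - - (#PCDATA)>

<!-- General entity: referenced in the document content -->
<!ENTITY notice "All rights reserved.">

...
<para>&notice; The reference &#38; is a character reference (to the ampersand).</para>
```

The names are kept to eight characters or fewer because the reference concrete syntax limits name length (NAMELEN) to 8; many real DTDs raise this limit in their SGML declaration.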
General entity:
<!ENTITY name content >
<!ENTITY name PUBLIC "public-id" ["system-id"] >
<!ENTITY name data_type content >
<!ENTITY name brack_type content >
<!ENTITY name PUBLIC "public-id"
["system-id"] extent_type notation >
<!ENTITY name PUBLIC "public-id"
["system-id"] SUBDOC >
If no type is specified (first and second lines), the entity is fully parsed as SGML, i.e. it may contain elements and further entity references. Internal entities may be given a data type (data_type); bracketed text entities (brack_type) wrap their content in the delimiters of a tag, marked section or markup declaration; and external entities may be data entities of a specific type (extent_type), which requires a notation, or subdocuments.
Internal data types (data_type) | Description |
---|---|
CDATA | Character Data: the replacement text is inserted as plain character data and is not parsed for markup |
SDATA | Specific Data: system-specific character data, typically used for special characters, as in <!ENTITY eacute SDATA "[eacute]"> |
PI | Processing Instruction: in the reference concrete syntax, equivalent to <?content> |

Bracketed text (brack_type) | Description |
---|---|
STARTTAG | Pastes the content as an SGML start tag |
ENDTAG | Pastes the content as an SGML end tag |
MS | Marked Section: pastes the content as an SGML marked section. Example: <!ENTITY draft MS "TEMP [ temporary "> |
MD | Markup Declaration: in the reference concrete syntax, equivalent to <!content> |

External entity types (extent_type) | Description |
---|---|
NDATA | Non-SGML Data: needs a notation reference |
CDATA | Character Data: not parsed for markup; needs a notation reference |
SDATA | Specific Data: system-specific character data; needs a notation reference |
SUBDOC | SGML subdocument (only if the SUBDOC feature is enabled) |
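The entity types listed above might be declared as in the following sketch. The entity names, the replacement texts and the system identifiers (`"viewer"`, `"logo.gif"`) are hypothetical and only serve to show the shape of each declaration.

```sgml
<!-- CDATA: the replacement text is character data, so <b> is not a tag here -->
<!ENTITY sample CDATA "<b>not markup</b>">

<!-- SDATA: system-specific text, commonly used for special characters -->
<!ENTITY eacute SDATA "[eacute]">

<!-- PI: a reference to this entity acts like <?newpage> -->
<!ENTITY pbreak PI "newpage">

<!-- Bracketed text: these two act like <note> and </note> -->
<!ENTITY open  STARTTAG "note">
<!ENTITY close ENDTAG   "note">

<!-- External data entity: requires a previously declared notation -->
<!NOTATION gif SYSTEM "viewer">
<!ENTITY logo SYSTEM "logo.gif" NDATA gif>
```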
SGML may contain marked sections:
<![ keyword [
...
]]>
keyword may be one of the following: (Note: all examples assume the reference concrete syntax.)
CDATA | Character Data. The parser recognizes no markup until the marked section close (]]>) is found; that is to say, content in a CDATA section remains unparsed. |
RCDATA | Replaceable Character Data. Same as CDATA, except that general entity references (&...;) and character references (&#...;) are recognized and replaced. |
IGNORE | The section is ignored completely. |
INCLUDE | The section is included normally. |
TEMP | Temporary. The section is processed like an INCLUDE section; the keyword merely flags it as temporary so that it can be found and removed later. |
While CDATA and RCDATA are useful for showing markup ‘source’ literally, the last three keywords allow conditional markup by means of parameter entities: a section written as <![ %status; [ ... ]]> is included or ignored depending on a declaration such as <!ENTITY % status 'IGNORE'>.
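A minimal sketch of both uses follows. It assumes a hypothetical parameter entity `draft` and an element type `note` declared elsewhere in the DTD. Changing the entity's replacement text from IGNORE to INCLUDE switches the conditional section on.

```sgml
<!-- In the DTD: the switch for all draft material -->
<!ENTITY % draft "IGNORE">

...
<!-- In the document: this section is skipped while draft is IGNORE -->
<![ %draft; [
<note>Check this figure before release.</note>
]]>

<!-- A CDATA section: the tags inside appear as literal text -->
<![ CDATA [ To emphasize text, write <em>like this</em>. ]]>
```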
Element types are declared by specifying the element type name and the element’s declared content in the following way:
<!ELEMENT type_name - - declared_content>
Declared content can be one of the following five:
EMPTY | Element must not contain anything |
ANY | Element can contain any declared elements or PCDATA |
CDATA | Character Data: see the CDATA marked section above |
RCDATA | Replaceable Character Data: see the RCDATA marked section above |
content-model | A parenthesis-delimited list of elements or nested content models; may include the #PCDATA primitive content token |
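The following sketch shows each kind of declared content in use; the element type names are invented for illustration. The two characters after the name are the tag omission parameters, the first for the start tag and the second for the end tag: `-` means the tag is required and `O` means it may be omitted.

```sgml
<!-- Content models: sequence (,), alternatives (|), repetition (+ and *) -->
<!ELEMENT article - - (title, (para | list)+)>
<!ELEMENT title   - - (#PCDATA)>
<!ELEMENT para    - O (#PCDATA | em | code | br)*>
<!ELEMENT em      - - (#PCDATA)>
<!ELEMENT list    - - (item+)>
<!ELEMENT item    - O (para+)>

<!-- EMPTY: no content and no end tag -->
<!ELEMENT br      - O EMPTY>

<!-- CDATA: the content is not parsed for markup -->
<!ELEMENT code    - - CDATA>
```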
Each element can have any number of attributes, and the permissible attributes for each element type are declared in an ATTLIST declaration:
<!ATTLIST element_type_name
attrib_1_name value_type status_keyword
...
attrib_n_name value_type status_keyword >
(Note that SGML, unlike XML, allows element_type_name to be a list of element type names, as in (etn_1|etn_2|...|etn_x).) The value type specifies which characters are permissible in the attribute’s value, and it must be one of the following:
list | A bracketed list of possible values (name tokens, written without quotes), as in (val1|val2|...) |
CDATA | Character Data. Any valid SGML characters; however, general entity references and character references are expanded. |
ID | A unique identifier for the element. |
NOTATION | This value type name must be followed by a bracketed list of (elsewhere) declared notations, as in NOTATION (not1|not2), and the permissible attribute values are precisely the notation identifiers from that list. |
IDREF | A reference to an ID, i.e. a name that is the unique identifier of some other element. |
IDREFS | Space-separated list of IDREFs. |
ENTITY | A currently declared (data or subdocument) entity name. |
ENTITIES | Space-separated list of ENTITY-s. |
NAME | A valid SGML name. (Must start with an alphabetic character but may contain digits.) |
NAMES | Space-separated list of NAMEs. |
NMTOKEN | A Name Token, which may only contain valid SGML name characters but may start with either a letter or a digit. |
NMTOKENS | Space-separated list of NMTOKENs. |
NUMBER | A number, i.e. a string of digits. |
NUMBERS | Space-separated list of NUMBERs. |
NUTOKEN | A Number Token, which may only contain valid SGML name characters but must start with a digit. Useful for dimensional quantities like 5px. |
NUTOKENS | Space-separated list of NUTOKENs. |
The characters that are permissible for an SGML name are declared in the SGML declaration. Note that NAME and NUMBER, and also NUTOKEN, are rather specific, whereas NMTOKEN is a more general data type.
The status keyword that follows the attribute value type in the attribute declaration indicates whether the attribute is optional, required or has a default value. It can be one of the following:
value | The default value for that attribute if no other value is specified. This must be a permissible string of characters. It cannot be specified for ID and NOTATION type attributes. It can be empty ("") only for CDATA type attributes. |
#FIXED | This reserved name is followed by a value, and the attribute always has this value. If the attribute is specified in the document, it must take this value. |
#REQUIRED | The attribute value must be specified in the document. |
#IMPLIED | The attribute is optional and can be omitted. |
#CURRENT | The attribute must be specified on the first occurrence of the element; on every subsequent occurrence it defaults to the most recently specified value. |
#CONREF | Content Reference. If the attribute is specified (e.g. as an IDREF to some other element), the element must be empty and its content is supplied by the application from the reference; if the attribute is omitted, the element carries its own content, such as a written-out cross reference. As an example, consider the declaration <!ATTLIST figref refid IDREF #CONREF >, for which the following two instances of a figref element are both permissible: <figref refid="fig1"> or <figref>(see Figure 1)</figref>. |
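Putting several value types and status keywords together, an attribute list might look like the sketch below. The `figure` and `figref` element types follow the example above, while the attributes other than `id` and `refid` are assumptions added for illustration.

```sgml
<!ATTLIST figure
  id      ID                   #REQUIRED
  align   (left|center|right)  center
  width   NUTOKEN              #IMPLIED
  version CDATA                #FIXED "1.0"
  lang    NAME                 #CURRENT >

<!ATTLIST figref
  refid   IDREF                #CONREF >

...
<figure id="fig1" align="left" width="5cm" lang="en">...</figure>
<figref refid="fig1">
<figref>(see Figure 1)</figref>
```

Here `align` falls back to `center` when omitted, `width` may be left out entirely, and a second `figure` without `lang` would inherit `en` from the first.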
When using external data, it is necessary to provide information about the type of data. This is done by means of a notation, which is declared in a notation declaration and used either in attributes or in data entities. The general usage of notations has been described above; here are some details.
The notation declaration looks like this:
<!NOTATION name SYSTEM "system-id" >
<!NOTATION name PUBLIC "public-id" ["system-id"] >
Once declared, a notation may also be given attributes, which can be set when the notation is used in a data entity. For obvious reasons, these attributes may not be of type ENTITY, ENTITIES, ID, IDREF, IDREFS or NOTATION, and they may not use the status keywords #CURRENT or #CONREF. The attributes for a notation are declared like this:
<!ATTLIST #NOTATION notation_name
attrib_name value_type status_keyword >
Again, notation_name may actually be a bracketed list of notation names, and of course there may be multiple lines of attributes. Here is a usage example:
<!NOTATION oggfile SYSTEM "audioplayer.exe" >
<!NOTATION mp3file SYSTEM "audioplayer.exe" >
<!ATTLIST #NOTATION (oggfile|mp3file)
bitrate NUTOKEN #REQUIRED
title CDATA "Audio File" >
<!ENTITY opcred SYSTEM "opening_credits.ogg" NDATA oggfile [bitrate="192kbps" title="Opening Credits"] >
<!ENTITY clcred SYSTEM "closing_credits.mp3" NDATA mp3file [bitrate="192kbps" title="Closing Credits"] >
...
&opcred; ... &clcred;
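Notations can also be used in attributes rather than in data entities. The sketch below shows a NOTATION-typed attribute that names the notation of an element's content, and an ENTITY-typed attribute that points at a data entity such as `opcred` from the example above. The `formula` and `audio` element types, the `tex` and `eqn` notations and their system identifiers are hypothetical.

```sgml
<!NOTATION tex SYSTEM "tex-processor">
<!NOTATION eqn SYSTEM "eqn-processor">

<!-- The format attribute states in which notation the content is written -->
<!ELEMENT formula - - CDATA>
<!ATTLIST formula
  format  NOTATION (tex|eqn)  #REQUIRED >

<!-- The src attribute must name a declared data or subdocument entity -->
<!ELEMENT audio - O EMPTY>
<!ATTLIST audio
  src     ENTITY              #REQUIRED >

...
<formula format="tex">E = mc^2</formula>
<audio src="opcred">
```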