The standard requires, as I outlined in my article on XML and the web, that XHTML be served as application/xhtml+xml, or at least generically as application/xml.
Current user agents (browsers) usually handle documents that are written as XHTML reasonably well. However, current browsers also contain XML parsers, and their behaviour when confronted with an XML or XHTML document depends drastically on the MIME type with which the document is served:
If the MIME type is text/html, the browsers will simply try to interpret the document as ordinary HTML ‘tag soup’, which may appear to work rather well on well-formed XHTML as long as it contains only HTML ‘tags’. However, no XML parsing is done in this case, so no entities that may have been declared in internal or external DTD subsets will be available. Moreover, marked sections (most likely the CDATA sections needed in the script and style elements) will not be parsed, and the literal string <![CDATA[ etc. will be considered part of the element content. This will most likely break scripts and stylesheets.
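To make the CDATA problem concrete, here is a minimal Python sketch. It assumes that the standard library's html.parser is a fair stand-in for a tag-soup parser and xml.etree.ElementTree for an XML parser; neither is a browser engine, so this only illustrates the principle. The helper names (ScriptText, tag_soup_script, xml_script) are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# A script element written the XHTML way, wrapped in a CDATA section
# because its content contains a "<".
SOURCE = ('<script xmlns="http://www.w3.org/1999/xhtml">'
          '<![CDATA[if (a < b) go();]]></script>')


class ScriptText(HTMLParser):
    """Collects the raw character data of <script> elements (tag-soup view)."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.in_script = tag == "script"

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def tag_soup_script(source):
    """Script content as an HTML parser sees it: CDATA markers included."""
    parser = ScriptText()
    parser.feed(source)
    parser.close()
    return "".join(parser.chunks)


def xml_script(source):
    """Script content as an XML parser sees it: the CDATA section is resolved."""
    return ET.fromstring(source).text


print(repr(tag_soup_script(SOURCE)))  # still wrapped in <![CDATA[ ... ]]>
print(repr(xml_script(SOURCE)))       # 'if (a < b) go();'
```

A script engine handed the first string would see <![CDATA[ as part of the program text, which is exactly the breakage described above.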
The Document Object Model (DOM) of XML is somewhat more sophisticated and comprehensive than the (usually proprietary) quasi-DOMs with which browsers allow HTML tag soup to be manipulated. Serving the document with the wrong MIME type may therefore also affect client-side scripting.
Silly reminder:
When handling the DOM of proper XHTML served as XML, remember to use namespace-aware methods like createElementNS. Just remember namespaces, period. Browsers are only required to (and should only) display XML as HTML if the root element is {http://www.w3.org/1999/xhtml}html and all descendant elements belong to the namespace http://www.w3.org/1999/xhtml.
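The {namespace}localname notation used above is also how Python's ElementTree spells qualified names, which makes the rule easy to check. The following is a minimal sketch under the assumption that Python's xml.dom.minidom is an acceptable stand-in for a browser DOM: its createElementNS follows the same DOM Level 2 signature. The helpers build_paragraph and is_xhtml_tree are hypothetical, and is_xhtml_tree simply encodes the rule stated above.

```python
import xml.etree.ElementTree as ET
from xml.dom import XMLNS_NAMESPACE
from xml.dom.minidom import getDOMImplementation

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_paragraph():
    """Namespace-aware construction: every element is created in XHTML_NS."""
    doc = getDOMImplementation().createDocument(XHTML_NS, "html", None)
    root = doc.documentElement
    # minidom does not emit xmlns declarations on its own when serializing,
    # so declare the default namespace explicitly.
    root.setAttributeNS(XMLNS_NAMESPACE, "xmlns", XHTML_NS)
    body = doc.createElementNS(XHTML_NS, "body")
    para = doc.createElementNS(XHTML_NS, "p")
    para.appendChild(doc.createTextNode("Hello"))
    body.appendChild(para)
    root.appendChild(body)
    return doc


def is_xhtml_tree(source):
    """True if the root is {XHTML_NS}html and every element is in XHTML_NS."""
    root = ET.fromstring(source)
    if root.tag != "{%s}html" % XHTML_NS:
        return False
    # root.iter() visits the root element and all of its descendants.
    return all(el.tag.startswith("{%s}" % XHTML_NS) for el in root.iter())


doc = build_paragraph()
print(doc.toxml())
print(is_xhtml_tree(doc.toxml()))
```

The same document built with a plain createElement("p") would carry no namespace, and an XML user agent would be entitled to treat it as unknown XML rather than XHTML.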
A possible source of confusion due to the issues mentioned above is that the MIME type a browser assumes depends on the circumstances under which the document is loaded: if the document comes from the local machine, the browser might simply go by the file name. In that case, file extensions like .html or .htm will usually be interpreted as text/html, whereas .xml may be either text/xml or application/xml, and .xhtml may, in fortunate cases, be read as application/xhtml+xml.
This situation is probably not ideal for any author’s sanity. It would therefore seem best to edit, or at least frequently check, the documents as served through a web server, and that web server should be configured unambiguously.
In the case of the celebrated Apache web server, it would be sensible, for example, to place an .htaccess rule such as this in the document directory:
AddType 'application/xhtml+xml; charset=iso-8859-1' xhtml
Then one should name all XHTML pages with the extension .xhtml (which seems rather sensible anyway) and can be sure that modern browsers will do the Right Thing.
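Whether such a rule is actually in effect is easy to check from the outside by looking at the Content-Type header the server sends. Here is a minimal Python sketch that issues a HEAD request and splits the header into media type and charset. The helper names are hypothetical, and the URL in the usage line is a placeholder for wherever the test page lives.

```python
import urllib.request
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header into (media type, charset or None)."""
    if not header_value:
        # Without this guard, email.message would report its default text/plain.
        return None, None
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lower-cases the type; get_param() strips any quotes.
    return msg.get_content_type(), msg.get_param("charset")


def served_type(url):
    """Ask the server how it labels a page, without downloading the body."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return parse_content_type(response.headers.get("Content-Type", ""))


print(parse_content_type("application/xhtml+xml; charset=iso-8859-1"))
```

Usage against a live server would look like served_type("http://localhost/test.xhtml"), which should report application/xhtml+xml once the AddType rule applies.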
A perpetual source of sorrow, despair, hatred and violence, Microsoft’s browser does not actually cope with the application/xhtml+xml MIME type. This is in spite of the fact that it has a quite powerful XML parser, and it is a shame that such silly little file type recognition issues stand in the way of using an otherwise rather powerful tool.
Fix it:
However, by adding MIME information about application/xhtml+xml to the Windows registry, the Internet Explorer can be made to accept these files! See below on how to do it.
The Internet Explorer supports some kind of XSLT, though not in a fully standards-compliant way. (You look surprised?) If I recall correctly, the Internet Explorer also gets confused by CSS stylesheets for general XML documents that are invoked with the <?xml-stylesheet?> processing instruction.
Most annoyingly, IE will actually fail to parse correct XHTML 1.1. This is because, on the one hand, it does parse the entire DTD (which is a very strong feature), but on the other hand its implementation is buggy (you still look surprised?) with respect to the treatment of relative pathnames, so it complains about not finding a module file.
As a last surprise, the Internet Explorer retains its ability to produce intentional misbehaviour (‘Quirks Mode’) even when processing XML/XHTML (just in case an author wanted to switch to the new technology but needed to rely on a buggy browser). However, unlike in the HTML case, where the buggy browser behaviour is triggered only by buggy documents (so writing proper HTML results in proper processing), in XHTML quirks mode is triggered by proper XHTML, namely by including the XML declaration.
(See the Mozilla FAQ.)
Mozilla-based browsers are rather powerful when they invoke their XML parser. However, there is a weird peculiarity when it comes to parsing the DTD: Mozilla does read the internal subset, and entities declared therein may be used (and will be expanded) in the document. The standard ISO entities from XHTML may also be used. However, an external subset (referred to via a system ID) is not parsed by Mozilla. (I am not sure whether this can be helped by placing the external DTD file in a specific subdirectory of the Mozilla installation. Input welcome.)
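Python's expat-based ElementTree happens to draw the same line: it honours entities from the internal subset but never fetches an external one. The sketch below uses it purely as an analogue (an assumption for illustration; it says nothing about Mozilla's internals, and unlike Mozilla it has no built-in knowledge of the standard XHTML entities). The entity &sig; and the file my-entities.dtd are hypothetical.

```python
import xml.etree.ElementTree as ET

# Entity declared in the internal subset: the parser sees the declaration.
INTERNAL = '<!DOCTYPE p [<!ENTITY me "the author">]><p>Written by &me;.</p>'

# Entity assumed to be declared only in a hypothetical external file
# "my-entities.dtd", which is referenced by system ID but never fetched.
EXTERNAL = ('<!DOCTYPE html SYSTEM "my-entities.dtd">'
            '<html xmlns="http://www.w3.org/1999/xhtml">Signed: &sig;</html>')


def expand(source):
    """Return the root element's text, or None if an entity is unresolvable."""
    try:
        return ET.fromstring(source).text
    except ET.ParseError:
        # ElementTree reports "undefined entity" for &sig; here.
        return None


print(expand(INTERNAL))  # Written by the author.
print(expand(EXTERNAL))  # None
```

Moving the declaration of &sig; from the external file into the internal subset is the portable workaround in both cases.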
Mozilla supports and automatically applies XSLT transformations that are specified with the <?xml-stylesheet?> processing instruction. It does not support XSL-FO at the moment. It does support CSS as a general styling language for XML.
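A browser locates the stylesheet by reading the pseudo-attributes of the <?xml-stylesheet?> processing instruction in the document prolog. The following minimal Python sketch performs only that lookup; it does not apply XSLT or CSS. The helper name stylesheets and the file names in the sample document are hypothetical.

```python
import re
from xml.dom.minidom import parseString

# Pseudo-attributes look like attributes but are just text inside the PI.
PSEUDO_ATTR = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*(["\'])(.*?)\2')


def stylesheets(source):
    """List the pseudo-attributes of every <?xml-stylesheet?> in the prolog."""
    doc = parseString(source)
    found = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            pairs = PSEUDO_ATTR.findall(node.data)
            found.append({name: value for name, _quote, value in pairs})
    return found


DOC = ('<?xml version="1.0"?>\n'
       '<?xml-stylesheet type="text/xsl" href="article.xsl"?>\n'
       '<?xml-stylesheet type="text/css" href="article.css"?>\n'
       '<article/>')

for sheet in stylesheets(DOC):
    print(sheet["type"], sheet["href"])
```

The type pseudo-attribute is what tells the browser whether to run an XSLT transformation or to apply CSS directly.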
In XML parsing mode, Opera seems to cope well with both internal and external DTDs (note that neither browser validates, so I am referring only to the usage of entities), and it parses CDATA sections correctly. Opera does not apply XSLT transformations, nor does it support XSL-FO; it does, however, support CSS for general styling of XML.
Fix it:
Unfortunately, Opera’s HTTP_ACCEPT does not allow server content negotiation in favour of XHTML (see below). Fortunately, this can be fixed easily on the client side.
When Apache content negotiation is enabled and Apache is set to serve .xhtml files as application/xhtml+xml as described above, it is possible to place two files, e.g. website.html and website.xhtml, into the same directory and refer to them merely as website; Apache will then send the right file depending on which MIME types the browser claims to support in its Accept HTTP header field (exposed to scripts as HTTP_ACCEPT). The two filenames could, for instance, be links to the same basic file, which could then be authored to conform to both standards.
(To make an HTML document work in both tag-soup and XML parsers, it should be encoded in UTF-8, have no XML declaration, and have a meta element setting the content type to “text/html; charset=UTF-8”, and its empty elements should have a space just before the “/>”. Moreover, marked sections have to be avoided, which is easily done by putting scripts and stylesheets in separate files.)
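The rules in the parenthesis above are mechanical enough to check automatically. Here is a crude Python sketch of such a check. The helper polyglot_problems is hypothetical, it is regex-based rather than a real parser, and its meta check assumes the http-equiv attribute comes before content.

```python
import re


def polyglot_problems(raw):
    """Return reasons why the bytes `raw` may fail in tag-soup or XML parsers.

    Crude regex checks for the rules listed above; not a validator.
    """
    try:
        source = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["not encoded in UTF-8"]
    problems = []
    if source.lstrip().startswith("<?xml"):
        problems.append("XML declaration present")
    if re.search(r"[^\s]/>", source):
        problems.append('empty element without a space before "/>"')
    if "<![CDATA[" in source:
        problems.append("CDATA section present")
    # Assumes attribute order http-equiv, then content.
    meta = re.search(r'<meta\s+http-equiv="Content-Type"\s+content="([^"]*)"',
                     source, re.IGNORECASE)
    if not meta or meta.group(1).replace(" ", "").lower() != "text/html;charset=utf-8":
        problems.append("no meta element declaring text/html; charset=UTF-8")
    return problems


GOOD = ('<html xmlns="http://www.w3.org/1999/xhtml"><head>'
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
        '<title>t</title></head><body><p>a<br />b</p></body></html>').encode("utf-8")
print(polyglot_problems(GOOD))  # []
```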
The browser situation is as follows:
The Internet Explorer does not mention application/xhtml+xml in the HTTP_ACCEPT header field at all; it will receive the .html file.
Mozilla sends “text/xml,application/xml,application/xhtml+xml,text/html;q=0.9…” and will receive the .xhtml file.
Opera sends “text/html, application/xml;q=0.9, application/xhtml+xml, …”, which is annoying because it does accept the right type, but Apache will still send the .html file, because text/html is of the same priority (1.0) as application/xhtml+xml. Fortunately, Opera can be reconfigured to send a more appropriate HTTP_ACCEPT.
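The effect of the q-values can be reproduced with a few lines of Python. This is a simplified model, not Apache's negotiation algorithm: it ignores wildcards such as */* and source qualities, and it breaks ties in favour of text/html to match the outcome described for Opera. The helper names are hypothetical, and the sample headers are only the portions quoted in this article.

```python
def parse_accept(header):
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media = parts[0].lower()
        if not media:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media] = q
    return prefs


def prefers_xhtml(header):
    """True only if application/xhtml+xml strictly outranks text/html."""
    prefs = parse_accept(header)
    return prefs.get("application/xhtml+xml", 0.0) > prefs.get("text/html", 0.0)


MOZILLA = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9"
OPERA_DEFAULT = "text/html, application/xml;q=0.9, application/xhtml+xml"

print(prefers_xhtml(MOZILLA))        # True: text/html is only 0.9
print(prefers_xhtml(OPERA_DEFAULT))  # False: both types are at 1.0
```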
When a server-side scripting language is available, it is conceivable to test whether the HTTP_ACCEPT field mentions application/xhtml+xml at all (or even application/xml as a fallback) and then tailor the HTTP header as suitable. For example, in PHP one could start a document website.php like this:
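<?php
// HTTP_ACCEPT may be missing entirely, so default to an empty string.
$accept = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";
// The response depends on the Accept header; tell caches so.
header("Vary: Accept");
if (stristr($accept, "application/xhtml+xml")) {
    header("Content-Type: application/xhtml+xml; charset=iso-8859-1");
    echo "<?xml version=\"1.0\" encoding=\"iso-8859-1\" standalone=\"no\"?>\n";
} else {
    header("Content-Type: text/html; charset=iso-8859-1");
}
?>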
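The same decision, extended with the application/xml fallback mentioned above, can be sketched in Python. Like the stristr test, it only checks whether a type is mentioned and ignores q-values, so an explicit ";q=0" would still count as acceptance. The helper negotiate is hypothetical and returns the headers to send plus the text to emit before the document.

```python
XML_PROLOG = '<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>\n'


def negotiate(accept):
    """Return (headers, prolog) for a page, preferring XHTML, then generic XML."""
    mentioned = {item.split(";")[0].strip().lower() for item in accept.split(",")}
    headers = [("Vary", "Accept")]  # the choice depends on the Accept header
    for media in ("application/xhtml+xml", "application/xml"):
        if media in mentioned:
            headers.append(("Content-Type", media + "; charset=iso-8859-1"))
            return headers, XML_PROLOG
    # Tag-soup fallback: no XML declaration, which would confuse HTML parsers.
    headers.append(("Content-Type", "text/html; charset=iso-8859-1"))
    return headers, ""


headers, prolog = negotiate("text/html, application/xml;q=0.9, application/xhtml+xml")
print(headers[-1])  # ('Content-Type', 'application/xhtml+xml; charset=iso-8859-1')
```

Unlike the Apache negotiation above, a mention-based test hands XHTML to Opera's default header as well.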
The HTML head could then contain the element
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
to make the document valid HTML. This element could also be suppressed depending on whether the file will be served as XHTML, but it need not be, since XHTML requires that content-type meta elements be ignored.
To register the MIME type application/xhtml+xml under Windows (so that the Internet Explorer will open files of that type), the following information needs to be added to the registry:
Windows Registry Editor Version 5.00
[HKEY_CLASSES_ROOT\MIME\Database\Content Type\application/xhtml+xml]
"CLSID"="{25336920-03F9-11cf-8FD0-00AA00686F13}"
"Encoding"=hex:08,00,00,00
"Extension"=".xhtml"
The above code is in .reg file format, and it is also available as a registry patch file.
Opera accepts text/html and application/xhtml+xml at the same (maximum) priority, which may give unpredictable results in conjunction with server-side content negotiation. To change Opera’s HTTP_ACCEPT string, one of its .ini files must be edited. The location of the relevant file can be found by going to opera:about; it is the file listed after “Preferences” in the section called “Paths”.
The HTTP_ACCEPT string is set in the section [Adv User Prefs] as the value of HTTP Accept, and it should be set to something like this:
[Adv User Prefs]
HTTP Accept=application/xhtml+xml, application/xml;q=0.9, text/html;q=0.85, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
...