At 04:20 PM 5/16/01, you wrote:
>I'm trying to write an XML DTD against which to validate my XML documents.
>Can anyone recommend a source of information for differences in DTD syntax
>between XML and SGML?
Not as such, I'm afraid, but a good guide to what actually is allowed in
XML DTD syntax should be some help. I like Neil Bradley's *XML Companion*,
partly because he's experienced with SGML, and that helps inform his coverage.
>Both my XML documents and the DTD existed as valid
>SGML, and I'm trying to view the docs in IE5.5, validating with the MSXML
>parser. I've installed MSXML 3.0 in "replace" mode.
So far so good.
>I can view the docs OK in IE as long as I don't reference the DTD (so they're
>just well-formed), but so far I can't get them to validate. One thing in
>particular that has me stumped is the error message I'm getting:
> A name was started with an invalid
> character. Line 1, Position 5
>First, this isn't line 1 in the DTD. The full line is
><!ENTITY % para "ptxt | ttxt | wtxt | ntxt | eq">
>(so it's not position 5, either)
>The parser seems to be objecting to the OR separator, but earlier in the
>DTD is a similar content model:
><!ENTITY % phrase "term | quote | option | variable | emph | pubref |
>keystr | sub | sup">
>Why doesn't this one throw up an error? What am I missing?
Off the top of my head, I'd list three major differences between XML and
SGML DTD syntax:
1. XML supports only a subset of SGML attribute types (e.g. no NUMBER or NAME).
2. XML may not have omissibility indicators in ELEMENT declarations. E.g. the
SGML declaration
<!ELEMENT ptxt - - (#PCDATA) >
must become, in XML,
<!ELEMENT ptxt (#PCDATA) >
The SGML "- -" indicates that both start and end tags are required: this is
redundant in XML, where they are always required. You may also see "- O"
(meaning the end tag may be omitted) or even another permutation in SGML DTDs:
likewise not allowed in XML.
3. This might be your error. In SGML, ELEMENT declarations for several
elements with identical content models may be combined into one declaration
with a name group. So the SGML
<!ELEMENT (ptxt|ttxt|wtxt|ntxt|eq) - - (#PCDATA) >
has to be split up, in XML, into
<!ELEMENT ptxt (#PCDATA) >
<!ELEMENT ttxt (#PCDATA) >
<!ELEMENT wtxt (#PCDATA) >
<!ELEMENT ntxt (#PCDATA) >
<!ELEMENT eq (#PCDATA) >
This could be what accounts for your error: no "|" is allowed in the
element name, and somewhere you might have a declaration like:
<!ELEMENT (%para;) - - (#PCDATA) >
(Why the parser is otherwise misreporting the error I can't say.) An SGML
parser would know better (the pipe would delimit two element names); in XML
it's simply not allowed. (This restriction is presumably so that parser
code can be really lightweight.)
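
To make points 1 and 3 a little more concrete, here is a rough sketch of how converted declarations might look. It is only a guess at your DTD: the attribute names "level" and "kind" and the element "section" are invented for illustration, and I'm assuming the declarations live in an external DTD file (XML does not allow parameter entity references inside declarations in the internal subset).

```dtd
<!-- One ELEMENT declaration per element, no omissibility indicators. -->
<!ELEMENT ptxt (#PCDATA) >

<!-- SGML NUMBER and NAME are not XML attribute types. NMTOKEN accepts
     the values either of them would (CDATA also works, but checks nothing).
     "level" and "kind" are hypothetical attribute names. -->
<!ATTLIST ptxt
          level NMTOKEN #IMPLIED
          kind  NMTOKEN #IMPLIED >

<!-- The %para; entity itself is fine in XML. What matters is where it is
     referenced: inside a content model group it is legal, in place of the
     element name it is not. "section" is a hypothetical element. -->
<!ENTITY % para "ptxt | ttxt | wtxt | ntxt | eq">
<!ELEMENT section (%para;)+ >
```

So if your DTD uses %para; only inside content models like the last line, the entity declaration is not the real problem, and the place to look is wherever the entity is referenced.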
Oh, and there are a couple of other differences. XML has no inclusions or
exclusions to content models; and XML has tight restrictions on content
models containing #PCDATA (to disallow Pernicious Mixed Content, an old
SGML bugaboo). If you run into something that looks like one of these,
consult a good SGML book next to your XML book (Bradley's SGML Companion is
also good); or ask again.
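
For what it's worth, those two restrictions might look like this in practice. This is a sketch, not your actual DTD: I've reused your %phrase; entity, but the content model for ptxt is a guess, and "sect", "title" and "fnote" are invented names. It also assumes an external DTD file, since XML only allows parameter entity references inside declarations there.

```dtd
<!-- Mixed content in XML: #PCDATA comes first, the only separator is "|",
     and the whole group takes "*". -->
<!ENTITY % phrase "term | quote | option | variable | emph | pubref |
                   keystr | sub | sup">
<!ELEMENT ptxt (#PCDATA | %phrase;)* >

<!-- Not allowed in XML: #PCDATA in a sequence, or with another occurrence
     indicator, e.g. (#PCDATA, emph) or (#PCDATA | emph)+ -->

<!-- An SGML inclusion such as +(fnote) after the content model has no XML
     equivalent. The included element has to be written into each content
     model where it may occur. -->
<!ELEMENT sect (title, (ptxt | fnote)+) >
```

Exclusions have no direct replacement either: the usual workaround is to leave the excluded element out of the relevant content models by hand.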
Do any of the SGML experts on this list see anything I've missed?
>Alternatively, does anyone know of another *standalone* XML parser I could
>use to check my DTD/document? I'm not a developer, so please don't point me
>to a collection of Java class libraries or such. I need an executable.
I like RXP: an executable that runs on Windows. See