Madeleine Wright wrote:
> Thanks. In the October 2000 version of the XML Spec, the DeclSep appears as
> production 28a and is used in the doctypedecl production as an alternative to
> markupdecl. It is defined as:
> PEReference | S
> In the Errata document, as of 13/06/2001, however, the doctypedecl production
> has been amended to specify only intSubset in that 'slot' and the new intSubset
> production now reads:
> (markupdecl | DeclSep)*
Got it. Note that the net effect of the latter errata change is simply
readability. It appears that DeclSep was originally added to clarify the
use of PE references in the internal subset, something that was only
handled in text (not EBNF) before. Note that PE references in external
subsets are still not covered in the EBNF -- given that they can
appear anywhere (? -- I can't remember the details), covering them would
make the EBNF almost unreadable.
> What puzzles me is that, while the external subset is referenced by the
> External ID, the External ID is just that, a reference, and not a way, for
> instance, for a parser to move logically to handle the external subset. I can
> understand that, because the external subset is not actually part of the
> Document, it would make sense not to lead to it as, for example, an alternative
> to the internal subset. But how is one meant to handle in parsing the magical
> leap from the External ID to the external subset?
What I did in my DTD parser (and I think other people did in theirs, as
I think I copied the idea from Aelfred) is as follows:
1) Constantly scan for PE references. Note that you have to do this
anyway for internal PE references.
2) When you hit an external PE reference, push the current
buffer/Reader/etc. onto a stack and continue parsing with the new
buffer/Reader/etc., which represents the external subset.
3) When you hit the end of the external subset, pop the old
buffer/Reader/etc. off the stack.
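
A minimal Java sketch of these three steps follows. It is not the code
from SubsetToDTD, Aelfred, or Xerces. The class name EntityInput, the
nextChar() method, and the map from entity names to replacement text
(standing in for real resolution of system identifiers) are assumptions
made for illustration. It treats any '%' followed by a name-start
character as a reference and does not add the spaces the spec puts
around included replacement text, both of which a real parser would
need to refine.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Hypothetical sketch: a stack of Readers hidden behind nextChar().
class EntityInput {
    private final Deque<PushbackReader> stack = new ArrayDeque<>();
    // Stand-in for entity resolution: PE name -> replacement text.
    private final Map<String, String> parameterEntities;

    EntityInput(Reader top, Map<String, String> parameterEntities) {
        this.parameterEntities = parameterEntities;
        stack.push(new PushbackReader(top));
    }

    private static boolean isNameStart(int c) {
        return Character.isLetter(c) || c == '_' || c == ':';
    }

    // Returns the next character of the logical input, or -1 once
    // every Reader on the stack is exhausted.
    int nextChar() throws IOException {
        while (!stack.isEmpty()) {
            PushbackReader in = stack.peek();
            int c = in.read();
            if (c == -1) {
                // Step 3: end of this entity, resume the previous Reader.
                stack.pop().close();
                continue;
            }
            if (c != '%') {
                return c;
            }
            // Step 1: a '%' may start a PE reference.
            int next = in.read();
            if (next == -1 || !isNameStart(next)) {
                // Not a reference (e.g. the '%' in "<!ENTITY % name ...>").
                if (next != -1) {
                    in.unread(next);
                }
                return '%';
            }
            StringBuilder name = new StringBuilder();
            for (int n = next; n != ';'; n = in.read()) {
                if (n == -1) {
                    throw new IOException("Unterminated PE reference");
                }
                name.append((char) n);
            }
            String text = parameterEntities.get(name.toString());
            if (text == null) {
                throw new IOException("Undeclared parameter entity: " + name);
            }
            // Step 2: continue parsing from the entity's content.
            stack.push(new PushbackReader(new StringReader(text)));
        }
        return -1;
    }
}
```

The parsing code only ever calls nextChar(), so it never needs to know
which Reader the character came from.
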
The general way to do this is to separate the parsing code from the
buffer reading code by means of a function to get the next character.
For an example of how to do this, see the class SubsetToDTD in . Note
that there might be bugs in this, as I've gotten a few untrackable bug
reports about it. You could also look at the code in Xerces, which is
As a final consolation, entity processing is by far the hardest part of
parsing an XML document.