At 02:03 PM 07/05/1999 +0100, chris cole wrote:
>I have a valid XML document - if I add an undefined element (say "<sub>")
>within a #PCDATA field, then the parser objects, and the only way i can
>"<sub>2</sub>" is by inserting a CDATA section,
>eg: "<!CDATA[<sub>2</sub>]]>", or by substituting "<" with "&lt;",
>eg:"&lt;sub>2&lt;/sub>" - OK so far (I think).
Or not OK, depending on what you're trying to achieve. :)
When you put the "element" into a CDATA section, or use entity references
like &lt; and &gt; in place of the < and >, you're explicitly telling the
parser that you do not want it to read the enclosed text as an element name
(or anything else, for that matter). So although the parser will accept
that coding, your "<sub>...</sub>" element will not be an element at all --
just a string of text that happens to contain angle brackets. Think of the
term #PCDATA as used in a DTD; it means "Parsed Character Data." Drop the P
(as in a CDATA marked section) and you're left with plain-old character
data which is not parsed for markup.
The point of a CDATA section in element content is simply to instruct the
parser to ignore anything that looks like markup and pass it downstream
unaltered. This is useful for special applications like math/logical
statements (which might include many > and < characters), or HTML/XML
tutorials (which need to display code fragments, but don't want the parser
to read and interpret them).
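To make the difference concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The wrapper element name "formula" is my own invention for the example, not something from your document. It parses the same content three ways: as real markup, inside a CDATA section, and with the brackets escaped.

```python
import xml.etree.ElementTree as ET

# "formula" is a hypothetical wrapper element used only for this illustration.
as_markup = ET.fromstring("<formula>H<sub>2</sub>O</formula>")
as_cdata = ET.fromstring("<formula>H<![CDATA[<sub>2</sub>]]>O</formula>")
as_escaped = ET.fromstring("<formula>H&lt;sub&gt;2&lt;/sub&gt;O</formula>")

# Real markup: the parser builds a child element named "sub".
print([child.tag for child in as_markup])

# CDATA section or escaped brackets: no child elements at all,
# just character data that happens to contain angle brackets.
print(len(as_cdata), repr(as_cdata.text))
print(len(as_escaped), repr(as_escaped.text))
```

Only the first form produces a "sub" element in the tree; the other two give the application the same plain string.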
>If the data appears in a CDATA field, eg:
> "<!ELEMENT AttrURL EMPTY>
> <!ATTLIST AttrURL Value CDATA #REQUIRED>"
> <AttrURL Value="zzzzzzz<sub>2</sub>">
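Again, whether or not your parser accepts this, you're not really "adding
an undefined element." Putting markup into an attribute's value doesn't
make that element appear in a document tree; it just makes the whole thing
-- the markup and what looks like element content -- the value of the
attribute. (Strictly speaking, a literal < isn't allowed inside an
attribute value, so a conforming parser should reject this one; you'd have
to write &lt; there.)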
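A small sketch of the attribute case, again with Python's standard xml.etree.ElementTree (my choice of tool, not necessarily the parser you are using). It tries the attribute value with a literal < first, then with the bracket escaped as &lt;, and looks at what ends up in the tree.

```python
import xml.etree.ElementTree as ET

# A literal "<" inside an attribute value is not well-formed XML,
# so the parser is expected to refuse this document.
try:
    ET.fromstring('<AttrURL Value="zzzzzzz<sub>2</sub>"/>')
    literal_accepted = True
except ET.ParseError:
    literal_accepted = False
print("literal < accepted:", literal_accepted)

# With the "<" escaped, the document parses, and the attribute value
# is one plain string. No "sub" element appears in the tree.
elem = ET.fromstring('<AttrURL Value="zzzzzzz&lt;sub>2&lt;/sub>"/>')
print(repr(elem.get("Value")))
print("child elements:", len(elem))
```

Either way, what looks like a "sub" element is only ever part of the attribute's string value.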
If your goal is indeed to add a new element, one that's not accounted for
in the DTD, you have two options: (a) add the element where you want
it, and stop using a validating parser, or (b) add the new element's
declaration to the DTD, and include it in at least one of the existing
elements' content models.
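Here is a rough sketch of both options. The element name "formula" and the two DTDs are hypothetical, made up for the example. Python's standard xml.etree.ElementTree does not validate, so it stands in for the non-validating parser of option (a); the second DTD shows the kind of declarations option (b) would need.

```python
import xml.etree.ElementTree as ET

# Option (a): the DTD does not declare "sub". A validating parser would
# reject this document, but a non-validating one only checks
# well-formedness and builds the element anyway.
undeclared = """<!DOCTYPE formula [
<!ELEMENT formula (#PCDATA)>
]>
<formula>H<sub>2</sub>O</formula>"""
root_a = ET.fromstring(undeclared)
print([child.tag for child in root_a])

# Option (b): declare "sub" and allow it in the content model of
# "formula" (mixed content), so the document is valid as well.
declared = """<!DOCTYPE formula [
<!ELEMENT formula (#PCDATA | sub)*>
<!ELEMENT sub (#PCDATA)>
]>
<formula>H<sub>2</sub>O</formula>"""
root_b = ET.fromstring(declared)
print([child.tag for child in root_b])
```

Both documents give the same tree here; the difference would only show up with a validating parser, which should accept the second and complain about the first.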
>Could somebody please clarify what the correct handling should be (and the
>reasoning therefore) - i thought I understood all this Character Data & CDATA
>Section handling, but now I'm confused!!
Understandably. The similarity between the terms PCDATA and CDATA, and the
fact that there are *two* CDATAs (one in the DTD, in ATTLIST declarations;
one in documents, in marked sections), can reasonably confuse anybody!
Let us know if this helps, or not.
John E. Simpson | The secret of eternal youth
                  | is arrested development.
http://www.flixml.org | -- Alice Roosevelt Longworth