XML-L Archives


Subject: Re: storing XML documents on relational database, 2
From: Peter Flynn
Reply-To: General discussion of Extensible Markup Language
Date: Fri, 18 Dec 1998 14:22:15 +0000
Content-Type: text/plain


Mark Birkbeck writes:

   There are a number of simple ways of treating 'mixed content'. If we
   have:

           <TEXT>
                   <QUOTE>
                           Living in the
                           <COUNTRY ISO="US">States</COUNTRY>
                           must be great y'all
                   </QUOTE>
                   , said

This is going to cause problems if you ever want to get the data
back out again and use it for typesetting, because you'll get

   " Living in the States must be great y'all " , said ...
    ^                                         ^ ^
Those intrusive spaces are what most browsers will make of the record
ends.

           <TEXT>
                   <QUOTE>
                           <PCDATA>Living in the</PCDATA>
                           <COUNTRY ISO="US">States</COUNTRY>
                           <PCDATA>must be great y'all</PCDATA>
                   </QUOTE>
                   <PCDATA>, said</PCDATA>

That's much better, except that you probably don't even want the comma
now, because it can be inferred from the rules of English grammar,
which say that reported speech in quotes followed by the verb
expressing who uttered it is delimited with one (hunt the referent :-).

           <TEXT>
                   <QUOTE>
                           <COUNTRY ISO="US" PRE="Living in the"
                                    POST="must be great y'all">States</COUNTRY>
                   </QUOTE>

Gag, Mark. But it resolves to the same stream.

   The first solution feels more 'correct';

...apart from the record end problem, which is inherent to mixed
content: for correct rendering it must be encoded as

            <TEXT><QUOTE>Living in the <COUNTRY
            ISO="US">States</COUNTRY> must be great
            y'all</QUOTE>, said

avoiding the intrusive line-ends.
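
To make the difference concrete, here is a minimal sketch in Python
(standard library only) that flattens both encodings the way a
browser-like renderer might, collapsing every run of white-space,
record ends included, to a single space. The closing </TEXT> tag, the
names flatten and browser_like, and the rule of wrapping QUOTE content
in quote marks are assumptions made for illustration; they are not
part of either message.

```python
import re
import xml.etree.ElementTree as ET

# Mark's pretty-printed encoding. The closing </TEXT> is added here only so
# that the truncated fragment parses; it is not in the quoted message.
LOOSE = """<TEXT>
        <QUOTE>
                Living in the
                <COUNTRY ISO="US">States</COUNTRY>
                must be great y'all
        </QUOTE>
        , said
</TEXT>"""

# The tight encoding, again with an assumed closing </TEXT>.
TIGHT = """<TEXT><QUOTE>Living in the <COUNTRY
ISO="US">States</COUNTRY> must be great
y'all</QUOTE>, said</TEXT>"""


def flatten(elem):
    """Concatenate character data in document order, quoting QUOTE content."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(flatten(child))
        parts.append(child.tail or "")
    body = "".join(parts)
    return '"' + body + '"' if elem.tag == "QUOTE" else body


def browser_like(xml_source):
    """Collapse each white-space run (including record ends) to one space."""
    return re.sub(r"\s+", " ", flatten(ET.fromstring(xml_source))).strip()


print(browser_like(LOOSE))  # " Living in the States must be great y'all " , said
print(browser_like(TIGHT))  # "Living in the States must be great y'all", said
```

The record ends just inside <QUOTE> and before the comma survive as
spaces in the first case; in the second there is nothing there to
collapse.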

   I haven't delved far enough
   into the XML definition but it may even be 'implied' by the definition,
   since untagged data is PCDATA.

I don't know where that idea comes from. All data must be enclosed in
some element: there is no such thing as "untagged" data.
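
A quick way to check this with a parser, as a sketch only (Python
standard library; the helper name is_well_formed is made up for the
example): character data is accepted inside an element, and rejected
as soon as it falls outside the document element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_source):
    """Return True if the parser accepts the string as an XML document."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<QUOTE>, said</QUOTE>"))  # True: the text is inside an element
print(is_well_formed("<QUOTE/>, said"))         # False: text after the document element
print(is_well_formed(", said<QUOTE/>"))         # False: text before the document element
```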

   The second, however, is slightly easier
   to implement in a user interface, and given that's where most of the
   problems lie, that's what we've done for now!

This is the approach the EuroMath DTD takes: there is no mixed
content. It's superficially attractive but more cumbersome to process,
and makes for a more complex DTD, as the element which holds the
undistinguished text usually has to occur at many levels.
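
For anyone wanting to test whether a document really is free of mixed
content, a rough sketch follows (Python standard library). It does not
reproduce the EuroMath DTD; it only reports elements that contain both
child elements and non-blank text. The closing </TEXT> tags and the
function name mixed_elements are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Mark's first encoding (closing </TEXT> assumed so the fragment parses).
MIXED = """<TEXT>
  <QUOTE>
    Living in the
    <COUNTRY ISO="US">States</COUNTRY>
    must be great y'all
  </QUOTE>
  , said
</TEXT>"""

# Mark's second encoding, with every run of text wrapped in <PCDATA>.
WRAPPED = """<TEXT>
  <QUOTE>
    <PCDATA>Living in the</PCDATA>
    <COUNTRY ISO="US">States</COUNTRY>
    <PCDATA>must be great y'all</PCDATA>
  </QUOTE>
  <PCDATA>, said</PCDATA>
</TEXT>"""


def mixed_elements(xml_source):
    """List tags of elements holding both child elements and non-blank text."""
    found = []
    for elem in ET.fromstring(xml_source).iter():
        children = list(elem)
        if not children:
            continue  # text-only or empty elements are not mixed content
        chunks = [elem.text or ""] + [child.tail or "" for child in children]
        if any(chunk.strip() for chunk in chunks):
            found.append(elem.tag)
    return found


print(mixed_elements(MIXED))    # ['TEXT', 'QUOTE']
print(mixed_elements(WRAPPED))  # []
```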

[Sam Hunting]
   > How do you do a join on XML data that looks like this:

   > <element>This is #PCDATA<mixed>with mixed content</mixed>and an
   > element mixed.</element>

The word "join" is out of context: it belongs in the vocabulary of
database engineering, and XML is about text markup. In any event, the
markup above is wrong and dangerously misleading, since it parses to

   This is #PCDATAwith mixed contentand an element mixed.

which I am sure is not what the author intended. It ought to read

   <element>This is #PCDATA <mixed>with mixed content</mixed> and an
   element mixed.</element>

(see the difference?).
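
The same comparison can be run through a parser. This is a small
sketch using Python's standard library; the function name
character_data and the two constant names are invented for the
example, and the record end inside each string is collapsed to a
single space before printing.

```python
import xml.etree.ElementTree as ET

AS_POSTED = (
    "<element>This is #PCDATA<mixed>with mixed content</mixed>and an\n"
    "element mixed.</element>"
)
CORRECTED = (
    "<element>This is #PCDATA <mixed>with mixed content</mixed> and an\n"
    "element mixed.</element>"
)


def character_data(xml_source):
    """Join all character data in document order and collapse white-space."""
    root = ET.fromstring(xml_source)
    return " ".join("".join(root.itertext()).split())


print(character_data(AS_POSTED))  # This is #PCDATAwith mixed contentand an element mixed.
print(character_data(CORRECTED))  # This is #PCDATA with mixed content and an element mixed.
```

The tags contribute nothing to the character stream, so any space
wanted between words has to be present in the data on one side of the
tag or the other.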

///Peter
