Mark Birkbeck writes:

   There are a number of simple ways of treating 'mixed content'. If we

                           Living in the
                           <COUNTRY ISO="US">States</COUNTRY>
                           must be great y'all
                   , said

This is going to cause problems if you ever want to get the data
back out again and use it for typesetting, because you'll get

   " Living in the States must be great y'all " , said ..."
    ^                                        ^ ^
Those intrusive spaces are what most browsers will do with the record ends.
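To see where those spaces come from, here is a minimal Python sketch. It wraps Mark's fragment in a hypothetical QUOTE root so that it parses, joins all the character data with xml.etree.ElementTree, and approximates a browser by collapsing each run of whitespace, record ends included, to one space. That collapsing rule is a simplification, not a model of any particular browser.

```python
import re
import xml.etree.ElementTree as ET

# Mark's fragment with its indentation and line ends kept, wrapped in a
# hypothetical <QUOTE> root element so that it is well-formed.
FRAGMENT = """<QUOTE>
                           Living in the
                           <COUNTRY ISO="US">States</COUNTRY>
                           must be great y'all
                   , said</QUOTE>"""

def rendered_text(xml_source):
    """Join all character data in document order, then collapse every
    whitespace run (record ends included) to a single space."""
    root = ET.fromstring(xml_source)
    raw = "".join(root.itertext())
    return re.sub(r"\s+", " ", raw)

# The leading space and the space before the comma are the intrusive ones.
print(repr(rendered_text(FRAGMENT)))
# prints " Living in the States must be great y'all , said"
```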

                           <PCDATA>Living in the</PCDATA>
                           <COUNTRY ISO="US">States</COUNTRY>
                           <PCDATA>must be great y'all</PCDATA>
                   <PCDATA>, said</PCDATA>

That's much better, except that you probably don't even want the comma
now, because it can be inferred from the rules of English grammar,
which say that reported speech in quotes followed by the verb
expressing who uttered it is delimited with one (hunt the referent :-).

                           <COUNTRY ISO="US" PRE="Living in the"
                            POST="must be great y'all">States</COUNTRY>

Gag, Mark. But it resolves to the same stream.
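A sketch of how the attribute form resolves, assuming that PRE, the element content and POST are joined with single spaces. The markup itself doesn't say where the spaces go, so that rule is an assumption built into the hypothetical resolve function:

```python
import xml.etree.ElementTree as ET

# The attribute-based encoding, written on one line.
SOURCE = ('<COUNTRY ISO="US" PRE="Living in the" '
          'POST="must be great y\'all">States</COUNTRY>')

def resolve(elem):
    """Rebuild the text stream from PRE, the element content and POST.

    Assumption: the non-empty parts are separated by single spaces.
    """
    parts = [elem.get("PRE", ""), elem.text or "", elem.get("POST", "")]
    return " ".join(part for part in parts if part)

print(resolve(ET.fromstring(SOURCE)))  # Living in the States must be great y'all
```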

   The first solution feels more 'correct';

...apart from the record end problem, which is inherent to mixed
content: for correct rendering it must be encoded as

            <TEXT><QUOTE>Living in the <COUNTRY
            ISO="US">States</COUNTRY> must be great
            y'all</QUOTE>, said

avoiding the intrusive line-ends.
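Running the same kind of whitespace-collapsing extraction over this encoding shows the difference. This is a Python sketch; the closing </TEXT> is added only so the fragment parses on its own, and collapsing whitespace is again a rough stand-in for a browser:

```python
import re
import xml.etree.ElementTree as ET

# The recommended encoding; </TEXT> is added (an assumption) to close it.
FRAGMENT = """<TEXT><QUOTE>Living in the <COUNTRY
            ISO="US">States</COUNTRY> must be great
            y'all</QUOTE>, said</TEXT>"""

def rendered_text(xml_source):
    """Join all character data, collapsing whitespace runs to one space."""
    root = ET.fromstring(xml_source)
    return re.sub(r"\s+", " ", "".join(root.itertext()))

# The only line end left in the character data falls between "great" and
# "y'all", where a space belongs anyway; nothing precedes the comma.
print(rendered_text(FRAGMENT))  # Living in the States must be great y'all, said
```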

   I haven't delved far enough
   into the XML definition but it may even be 'implied' by the definition,
   since untagged data is PCDATA.

I don't know where that idea comes from. All data must be enclosed in
some element: there is no such thing as "untagged" data.
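XML's well-formedness rules say the same: outside the single document element only whitespace, comments and processing instructions may appear. A quick check with Python's standard library parser (is_well_formed is just an illustrative helper):

```python
import xml.etree.ElementTree as ET

def is_well_formed(source):
    """Return True if ElementTree accepts source as an XML document."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<p>Living in the States</p>"))        # True
print(is_well_formed("Living in the <p>States</p>"))        # False: text before the root
print(is_well_formed("<p>States</p> must be great y'all"))  # False: text after the root
```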

   The second, however, is slightly easier
   to implement in a user interface, and given that's where most of the
   problems lie, that's what we've done for now!

This is the approach the EuroMath DTD takes: there is no mixed
content. It's superficially attractive but more cumbersome to process,
and makes for a more complex DTD, as the element which holds the
undistinguished text usually has to occur at many levels.
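For comparison, a Python sketch of converting mixed content to the no-mixed-content style, using the PCDATA element name from Mark's example. The unmix function is hypothetical, and dropping whitespace-only runs is a choice made for this sketch, not something any particular DTD prescribes. The recursion is the point: the text-holding element has to be inserted at every level where text sits beside elements.

```python
import xml.etree.ElementTree as ET

def unmix(elem):
    """Return a copy of elem with no mixed content: wherever text runs sit
    alongside child elements, each non-blank run is wrapped in <PCDATA>."""
    copy = ET.Element(elem.tag, elem.attrib)
    if len(elem) == 0:
        # Text-only element (such as COUNTRY): keep its content as is.
        copy.text = elem.text
        return copy
    if elem.text and elem.text.strip():
        ET.SubElement(copy, "PCDATA").text = elem.text
    for child in elem:
        copy.append(unmix(child))  # recurse: PCDATA may be needed at any level
        if child.tail and child.tail.strip():
            ET.SubElement(copy, "PCDATA").text = child.tail
    return copy

mixed = ET.fromstring('<QUOTE>Living in the <COUNTRY ISO="US">States'
                      "</COUNTRY> must be great y'all</QUOTE>")
print(ET.tostring(unmix(mixed), encoding="unicode"))
```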

[Sam Hunting]
   > How do you do a join on XML data that looks like this:

   > <element>This is #PCDATA<mixed>with mixed content</mixed>and an
   > element mixed.</element>

The word "join" is out of context: it belongs in the vocabulary of
database engineering, and XML is about text markup. In any event, the
markup above is wrong and dangerously misleading, since it parses to

   This is #PCDATAwith mixed contentand an element mixed.

which I am sure is not what the author intended. It ought to read

   <element>This is #PCDATA <mixed>with mixed content</mixed> and an
   element mixed.</element>

(see the difference?).
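A Python sketch that extracts the character data from both versions, collapsing the line end to a space, reproduces the run-together words (character_data is a hypothetical helper):

```python
import re
import xml.etree.ElementTree as ET

AS_POSTED = ("<element>This is #PCDATA<mixed>with mixed content</mixed>"
             "and an\nelement mixed.</element>")
CORRECTED = ("<element>This is #PCDATA <mixed>with mixed content</mixed> "
             "and an\nelement mixed.</element>")

def character_data(xml_source):
    """All character data in document order, with whitespace runs
    (including the line end) collapsed to single spaces."""
    root = ET.fromstring(xml_source)
    return re.sub(r"\s+", " ", "".join(root.itertext()))

print(character_data(AS_POSTED))  # This is #PCDATAwith mixed contentand an element mixed.
print(character_data(CORRECTED))  # This is #PCDATA with mixed content and an element mixed.
```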