XML-L Archives


XML-L@LISTSERV.HEANET.IE


XML-L December 1998

Subject:

Re: storing XML documents on relational database, 2

From:

Mark Birbeck <[log in to unmask]>

Reply-To:

General discussion of Extensible Markup Language <[log in to unmask]>

Date:

Fri, 18 Dec 1998 15:50:21 -0000

Content-Type:

text/plain

Parts/Attachments:

text/plain (167 lines)

Peter Flynn wrote:

> Mark Birbeck writes:
>
> >  There are a number of simple ways of treating 'mixed content'. If we
> >  have:
> >
> >           <TEXT>
> >                   <QUOTE>
> >                           Living in the
> >                           <COUNTRY ISO="US">States</COUNTRY>
> >                           must be great y'all
> >                   </QUOTE>
> >                   , said
>
> This is going to cause problems if you ever want to get the data
> back out again and use it for typesetting, because you'll get
>
>   " Living in the States must be great y'all " , said ..."
>    ^                                        ^ ^
> Those intrusive spaces are what most browsers will do with the record
> ends.

I was laying it out for legibility, but if we want to be pedantic, I
thought it would actually be passed as:

                            Living in the
                            States
                            must be great y'all
                     , said

since white-space inside elements that do not have type 'element
content' is meant to be preserved, is it not?
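Mark's reading matches the XML 1.0 rule that a processor must pass every character that is not markup through to the application. As a quick check, here is a sketch using Python's standard xml.etree.ElementTree, a non-validating parser; with no DTD in play, nothing is flagged as ignorable element-content white-space. The DOC string is the quoted example with its layout kept and, as an assumption so that it parses, closed off with </TEXT>:

```python
import xml.etree.ElementTree as ET

# The example from the thread, keeping its line breaks and indentation.
# The closing </TEXT> is added here only so the fragment is well-formed.
DOC = """<TEXT>
        <QUOTE>
                Living in the
                <COUNTRY ISO="US">States</COUNTRY>
                must be great y'all
        </QUOTE>
        , said
</TEXT>"""

root = ET.fromstring(DOC)
quote = root.find("QUOTE")
country = quote.find("COUNTRY")

# .text is the character data after an element's start tag (before its first
# child); .tail is the character data after its end tag.
print(repr(quote.text))    # "Living in the" plus surrounding newlines/indentation
print(repr(country.tail))  # "must be great y'all" plus surrounding white-space
print(repr(quote.tail))    # ", said" plus surrounding white-space
```

The newlines and indentation survive in .text and .tail; it is the renderer, not the parser, that later decides what to do with them.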

>           <TEXT>
>                   <QUOTE>
>                           <PCDATA>Living in the</PCDATA>
>                           <COUNTRY ISO="US">States</COUNTRY>
>                           <PCDATA>must be great y'all</PCDATA>
>                   </QUOTE>
>                   <PCDATA>, said</PCDATA>
>
> That's much better,

Thanks.

> except that you probably don't even want the comma
> now, because it can be inferred from the rules of English grammar,
> which say that reported speech in quotes followed by the verb
> expressing who uttered it is delimited with one (hunt the referent
> :-).

Mmm. I take your point, but I feel a little uneasy about removing things
from the original text. I did it in my example with the quotation marks,
because I thought that everyone might latch onto those instead of the
wider point I was trying to make. But now we have lost the original text.
What of novels such as Trainspotting by Irvine Welsh, or Paddy Clarke Ha
Ha Ha by Roddy Doyle, which make use of layout devices such as:

        - Eat your dinner, said mum.
        - No I don't want it, I said.
        - Eat it now shouted dad.

Now if we mark that up with a <QUOTE> tag, and then remove the commas in
the first two lines, as per your rule, then when we render it back again,
we get:

        - "Eat your dinner", said mum.
        - "No I don't want it", I said.
        - "Eat it now", shouted dad.

Putting aside the speech marks, which are less of an issue, a comma that
the author never wrote has been introduced in the third line!
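The round-trip problem can be made concrete with a small sketch. It assumes, hypothetically, that each line is stored as a (speech, attribution) pair with the separating comma dropped, and that a render() function regenerates the quote marks plus a comma according to the proposed grammar rule. Comparing each rendered line with the original, ignoring the quote marks, flags the comma the author never wrote:

```python
# The dialogue as the author laid it out.
ORIGINAL = [
    "- Eat your dinner, said mum.",
    "- No I don't want it, I said.",
    "- Eat it now shouted dad.",
]

# Hypothetical storage: speech and attribution only. The comma is NOT stored,
# because the rule says it can be inferred.
STORED = [
    ("Eat your dinner", "said mum."),
    ("No I don't want it", "I said."),
    ("Eat it now", "shouted dad."),
]

def render(speech: str, attribution: str) -> str:
    # Rule under discussion: reported speech followed by the verb of
    # attribution is delimited with a comma, so it is regenerated on output.
    return f'- "{speech}", {attribution}'

rendered = [render(s, a) for s, a in STORED]
for orig, out in zip(ORIGINAL, rendered):
    # Drop the quote marks so that only the comma difference is compared.
    marker = "" if out.replace('"', "") == orig else "   <-- differs"
    print(out + marker)
```

Only the third line is flagged: the rule reconstructs the first two faithfully but cannot know that the author deliberately left the comma out.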

>   The first solution feels more 'correct';
>
> ...apart from the record end problem, which is inherent to mixed
> content: for correct rendering it must be encoded as
>
>            <TEXT><QUOTE>Living in the <COUNTRY
>            ISO="US">States</COUNTRY> must be great
>            y'all</QUOTE>, said
>
> avoiding the intrusive line-ends.

As I said before, isn't all white-space passed through in mixed content?
What you are driving at is something slightly different: most browsers
fold a run of white-space into a single space, so if you have a line
break in the middle of a sentence you end up with an extra space
character. Your layout hasn't solved that problem; it has only got round
it by using a line break to act as a space. Surely that's not on, because
you have lost some of the original data! To illustrate, suppose you had
this:

        Let's point to my name ->Mark for want of something better.

But stored it as:

        <TEXT>Let's point to my name ->
        <NAME>Mark</NAME> for want of something better.
        </TEXT>

You would get:

        Let's point to my name -> Mark for want of something better.
                                 ^
                                 |
* extra space --------------------


(sorry if you're not using a fixed-width font!). In other words, you have
not solved the problem that you thought you had. As it happens, I don't
think it's the job of XML to worry about browser problems. XSL could
deal with it, though.
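A rough sketch of the effect in Python, assuming a browser's default text handling approximates to "collapse every run of spaces, tabs and line ends into one space" (the collapse_whitespace helper is illustrative, not any browser's actual algorithm). ElementTree's itertext() removes the markup while keeping all the character data, including the line break before <NAME>:

```python
import re
import xml.etree.ElementTree as ET

def collapse_whitespace(text: str) -> str:
    """Rough model of a browser's default text handling: every run of
    spaces, tabs and line ends becomes a single space."""
    return re.sub(r"[ \t\r\n]+", " ", text)

# What the author wrote: no space between "->" and "Mark".
intended = "Let's point to my name ->Mark for want of something better."

# The same sentence stored with a line break before <NAME>.
stored = """<TEXT>Let's point to my name ->
<NAME>Mark</NAME> for want of something better.
</TEXT>"""

# itertext() yields all character data in document order, markup removed.
text_only = "".join(ET.fromstring(stored).itertext())
rendered = collapse_whitespace(text_only).strip()

print(rendered)              # the line break has become a space before "Mark"
print(rendered == intended)  # False
```

The parser kept the line break faithfully; the extra space only appears once the white-space is folded at display time, which is why it is arguably a stylesheet concern rather than a storage one.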

>   I haven't delved far enough
>   into the XML definition but it may even be 'implied' by the definition,
>   since untagged data is PCDATA.
>
> I don't know where that idea comes from. All data must be enclosed in
> some element: there is no such thing as "untagged" data.

Fair comment. I was trying to say that if you had a mixed-content
element, at the level of implementation in our database it was
equivalent to:

        <QUOTE>
            <PCDATA>Living in the</PCDATA>
            <COUNTRY ISO="US">States</COUNTRY>
            <PCDATA>must be great y'all</PCDATA>
        </QUOTE>

and that the introduction of an implied child of type PCDATA may not be
so far from the XML definition.
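DOM-style parsers already represent mixed content in roughly this way, as text nodes interleaved with element nodes, which supports the point above. The sketch below (make_pcdata_explicit is a hypothetical helper, not part of any library) rewrites a mixed-content element into the explicit <PCDATA> form, keeping the character data exactly, including the spaces on either side of "States":

```python
import xml.etree.ElementTree as ET

def make_pcdata_explicit(elem: ET.Element) -> ET.Element:
    """Return a copy of elem in which every run of character data inside a
    mixed-content element is wrapped in its own <PCDATA> child.
    Elements with no child elements keep their text as-is."""
    out = ET.Element(elem.tag, dict(elem.attrib))
    if len(elem) == 0:
        out.text = elem.text
        return out
    if elem.text:                      # data before the first child
        ET.SubElement(out, "PCDATA").text = elem.text
    for child in elem:
        out.append(make_pcdata_explicit(child))
        if child.tail:                 # data after this child's end tag
            ET.SubElement(out, "PCDATA").text = child.tail
    return out

src = ET.fromstring(
    '<QUOTE>Living in the <COUNTRY ISO="US">States</COUNTRY> must be great y\'all</QUOTE>'
)
explicit = make_pcdata_explicit(src)
serialized = ET.tostring(explicit, encoding="unicode")
print(serialized)
```

Unlike the hand-written version, the spaces stay inside the <PCDATA> elements, so nothing from the original text is lost in the conversion.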

>>   The second, however, is slightly easier
>>   to implement in a user interface, and given that's where most of the
>>   problems lie, that's what we've done for now!
>
>This is the approach the EuroMath DTD takes: there is no mixed
>content. It's superficially attractive but more cumbersome to process,
>and makes for a more complex DTD, as the element which holds the
>undistinguished text usually has to occur at many levels.

I'm not talking about changing the DTDs; I leave those intact. I was
merely using XML-style notation to illustrate how we store data in the
database (or our object-reflection). Each object in our database has, by
default, pre-text and post-text attributes, which are part of our system,
not the DTD. It's a kludge, so I don't want it cluttering up other stuff!
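ElementTree happens to use a similar kludge internally: each element carries .text (data after its start tag) and .tail (data after its end tag). The sketch below assumes one possible reading of "pre- and post-text", not the actual schema described here: a node's post_text holds the data after its end tag, and only a first child's pre_text holds the data before it, so no text is stored twice. The Node class and both functions are hypothetical; the round trip checks that no original text is lost:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr

@dataclass
class Node:
    """Hypothetical database object for one element."""
    tag: str
    attrs: dict
    content: str = ""      # text of an element with no child elements
    pre_text: str = ""     # data before this node (first children only)
    post_text: str = ""    # data after this node's end tag
    children: list = field(default_factory=list)

def to_nodes(elem: ET.Element, pre: str = "") -> Node:
    node = Node(elem.tag, dict(elem.attrib), pre_text=pre, post_text=elem.tail or "")
    kids = list(elem)
    if not kids:
        node.content = elem.text or ""
    for i, child in enumerate(kids):
        # The parent's leading text goes to its first child; text between
        # siblings is stored once, as the earlier sibling's post_text.
        lead = (elem.text or "") if i == 0 else ""
        node.children.append(to_nodes(child, lead))
    return node

def to_xml(node: Node) -> str:
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in node.attrs.items())
    inner = escape(node.content) + "".join(to_xml(c) for c in node.children)
    return (f"{escape(node.pre_text)}<{node.tag}{attrs}>{inner}"
            f"</{node.tag}>{escape(node.post_text)}")

DOC = ('<TEXT><QUOTE>Living in the <COUNTRY ISO="US">States</COUNTRY>'
       " must be great y'all</QUOTE>, said</TEXT>")

tree = to_nodes(ET.fromstring(DOC))
country = tree.children[0].children[0]
print(repr(country.pre_text), repr(country.post_text))
print(to_xml(tree) == DOC)
```

Keeping the surrounding text on the objects, rather than as extra <PCDATA> elements, leaves the DTD untouched, which is the trade-off described above.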

Regards,

Mark


Mark Birbeck
Managing Director
Intra Extra Digital Ltd.
39 Whitfield Street
London
W1P 5RE
w: http://www.iedigital.net/
t: 0171 681 4135
e: [log in to unmask]
