        The biggest problem with the Oracle white paper is that it assumes
that another quirky bolt-on is needed, such as their previous TREE
construct, instead of showing how basic SQL can do the job in a more general
and flexible fashion. "Tree Processing in SQL", by David Rozenshtein, shows
how to handle queries against tree and graph structures stored in relational
databases. The key is the choice of matching table schemas and algorithms.
        The Oracle approach shows a tendency to treat XML handling as an
extension of structured text processing, instead of starting from the
fundamental abstraction of a tree of nodes.
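
        As a rough illustration of the "tree of nodes in plain SQL" idea, here
is a minimal sketch using Python's sqlite3 module. The node table, its column
names, and the helper functions are assumptions made up for this example. The
recursive query is standard SQL as SQLite implements it, not necessarily the
technique Rozenshtein's book describes.

```python
import sqlite3

# Hypothetical schema: one row per XML element, linked to its parent.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),  -- NULL for the root
        name      TEXT NOT NULL,                -- element name
        content   TEXT                          -- text content, if any
    )
""")
conn.executemany(
    "INSERT INTO node (id, parent_id, name, content) VALUES (?, ?, ?, ?)",
    [
        (1, None, "DamageReport", None),
        (2, 1, "Cause", "Fire"),
        (3, 1, "Casualties", "12"),
        (4, 1, "Motive", "Arson"),
    ],
)


def descendants(conn, root_id):
    """Return (name, content, depth) for every node below root_id."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(id, name, content, depth) AS (
            SELECT id, name, content, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.name, n.content, s.depth + 1
            FROM node AS n JOIN subtree AS s ON n.parent_id = s.id
        )
        SELECT name, content, depth FROM subtree
        WHERE depth > 0 ORDER BY id
    """, (root_id,))
    return rows.fetchall()


def reports_with(conn, child_name, child_content):
    """Return ids of DamageReport nodes with a matching direct child."""
    rows = conn.execute("""
        SELECT parent.id
        FROM node AS parent
        JOIN node AS child ON child.parent_id = parent.id
        WHERE parent.name = 'DamageReport'
          AND child.name = ? AND child.content = ?
    """, (child_name, child_content))
    return [row[0] for row in rows]
```

Calling reports_with(conn, "Motive", "Arson") is the 'MOTIVE = ARSON' query
expressed as an ordinary self-join, and descendants(conn, 1) walks the whole
subtree without any vendor-specific construct.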
Cheers,
David vun Kannon

> -----Original Message-----
> From: Mark Birbeck [SMTP:[log in to unmask]]
> Sent: Thursday, December 17, 1998 9:02 AM
> To:   [log in to unmask]
> Subject:      Re: storing XML documents on relational database
>
> Marcus Carr wrote:
>
> > There's an interesting white paper explaining Oracle's upcoming support
> > for XML - it covers just these issues, from their perspective, anyway.
> > It's located at http://www.oracle.com/xml/documents/xml_twp/
>
> Looks like an interesting first step, but IMHO it raises as many
> problems as it solves. For example, in their example of an insurance
> report the paper suggests:
>
>         <DamageReport>
>                 A massive <Cause>Fire</Cause> ravaged the building and
>                 <Casualties>12</Casualties> people were killed. Early
>                 FBI reports indicate that <Motive>arson</Motive> is
>                 suspected.
>         </DamageReport>
>
> We have tended to avoid this approach. With our stuff, we use the
> following type of syntax:
>
>         <TEXT>... today the <COUNTRY NAME="US">US</COUNTRY> announced ...</TEXT>
>         <TEXT>I would like to visit <COUNTRY NAME="US">North America</COUNTRY> for a holiday ...</TEXT>
>         <TEXT>Leary travelled across <COUNTRY NAME="US">the USA</COUNTRY> sampling ...</TEXT>
>         <TEXT>Iraqi newspapers today said that the <COUNTRY NAME="US">evil Empire</COUNTRY> ...</TEXT>
>
> which means that even if 'US' is not used in a document, you can still
> find all documents that 'mention' the USA. It seems to me that you
> either have to code up the information on the object, e.g. (back to
> Oracle's examples):
>
>         <DamageReport Cause="Fire" Casualties="12" Motive="Arson">
>                 <Summary>
>                         A massive Fire ravaged the building and
>                         12 people were killed. Early
>                         FBI reports indicate that arson is
>                         suspected.
>                 </Summary>
>         </DamageReport>
>
> and so set the possible values for the elements in the DTD, or you have
> to add an attribute to the actual word:
>
>         <DamageReport>
>                 A
>                 <Cause Type="Fire">
>                         massive fire
>                 </Cause>
>                 ravaged the building and
>                 <Casualties Number="12" Type="Death">
>                         12 people were killed
>                 </Casualties>
>                 . Early FBI reports indicate that
>                 <Motive Type="Arson" Status="Suspected">
>                         arson is suspected
>                 </Motive>
>                 .
>         </DamageReport>
>
> What I'm getting at is you can't weigh down free text with too much
> information - that should come from the meta data. The Oracle example
> that follows this, of a query for 'MOTIVE = ARSON', shows how messy
> things could get if you don't 'normalise'. With our approach we can
> have:
>
>                 Early FBI reports indicate that
>                 <Motive Type="Arson" Status="Suspected">the fire
>                 may have been started on purpose</Motive>.
>
> because the text itself is not carrying the weight of the information,
> but Oracle couldn't. With their example you would have to search for all
> possible wordings of 'ARSON' or you would have to tell everyone in the
> company to use the word 'ARSON' when they are writing their reports.
>
> Note that I only make this point in relation to free text situations. I
> have no problem with:
>
>         <DamageReport>
>                 <Cause>Fire</Cause>
>                 <Casualties>12</Casualties>
>                 <Motive>Arson</Motive>
>                 <Summary>
>                         A massive Fire ravaged the building and
>                         12 people were killed. Early
>                         FBI reports indicate that arson is
>                         suspected.
>                 </Summary>
>         </DamageReport>
>
> which the Oracle proposal would easily implement. But I think you need
> to implement the 'attribute' type solution as well, to fully make use of
> the free text approach.
>
> Regards,
>
> Mark
>
>
>
> Mark Birbeck
> Managing Director
> Intra Extra Digital Ltd.
> 39 Whitfield Street
> London
> W1P 5RE
> w: http://www.iedigital.net/
> t: 0171 681 4135
> e: [log in to unmask]
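
The contrast drawn in the quoted message, between searching on an attribute
and searching on the wording, can be sketched in a few lines of Python with
the standard xml.etree.ElementTree module. The REPORTS list and both function
names are assumptions for illustration only. The two sample documents reuse
the markup from the quoted examples.

```python
import xml.etree.ElementTree as ET

# Two hypothetical reports: same Motive attribute, different wording.
REPORTS = [
    '<DamageReport>Early FBI reports indicate that '
    '<Motive Type="Arson" Status="Suspected">arson is suspected</Motive>.'
    '</DamageReport>',
    '<DamageReport>Early FBI reports indicate that '
    '<Motive Type="Arson" Status="Suspected">the fire may have been '
    'started on purpose</Motive>.</DamageReport>',
]


def has_motive(xml_text, motive_type):
    """True if any Motive element carries the given Type attribute."""
    root = ET.fromstring(xml_text)
    return root.find(f".//Motive[@Type='{motive_type}']") is not None


def text_contains(xml_text, word):
    """True if the word appears anywhere in the document's free text."""
    root = ET.fromstring(xml_text)
    return word.lower() in "".join(root.itertext()).lower()
```

has_motive(report, "Arson") matches both reports, while
text_contains(report, "arson") matches only the first, because the second
never uses the word itself.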