I'm probably not thinking of this correctly, but it seems the search
issue--while obviously important--can nonetheless be more "localized" to
the actual document instance with its own document type declaration
establishing which DTD is to be used.  If such an XML document or (and
here the "Fragments" group would need to be inline) portion thereof were
opened in a browser, the search engine could address the DTD and its tags
as a document-specific set of searches.  In other words, if the DTD is
already being identified, why can't the search engine cull tags from it
for use by the reader?
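
To make the idea concrete, here is a rough sketch in Python of what
"culling tags" from an already-identified DTD might look like. The sample
DTD and the function name cull_element_names are made up for illustration,
and the regular expression only looks at plain <!ELEMENT ...> declarations;
it ignores parameter entities, comments, and conditional sections, which a
real DTD parser would have to handle.

```python
import re

# Hypothetical DTD text. A real search engine would fetch the DTD named in
# the document type declaration instead of using an inline string.
SAMPLE_DTD = """
<!ELEMENT book (title, author+, chapter*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (firstname, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT chapter (#PCDATA)>
"""

# Matches the element name that follows "<!ELEMENT".
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)")


def cull_element_names(dtd_text):
    """Return declared element names in declaration order, without duplicates."""
    names = []
    for name in ELEMENT_DECL.findall(dtd_text):
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    # The resulting list is what a reader could be offered as search fields.
    print(cull_element_names(SAMPLE_DTD))
```

The list that comes back is the document-specific set of searchable tags;
how it would be presented to the reader is left open here.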

__________________________
John Robert Gardner, Ph.D.
http://vedavid.org/diss/
____________________________________________________
Obermann Center         &    The Graduate College
  for Advanced Studies
               The University of Iowa
____________________________________________________
"Your dreams must always lead you beyond comfort"

On Mon, 7 Dec 1998, Mark Birbeck wrote:

> My tuppence worth (or does everyone on this list say 'dime'):
>
> I think we'd need to get a balance between openness and certain basics
> that must be there to make this work. I would have thought there needs
> to be a basic DTD that is not 'closed', i.e. people are free to add
> whatever they want to it, for their own particular purposes, but
> essential fields are defined. This means that if you want your document
> to be easily accessed then you would use the fields specified.
>
> I would have thought the elements would have to be along the lines of
> <Bassdocns:author> and not <author> to work. Otherwise, there is just no
> way it could all happen - even if everyone informally uses <author>,
> does that then contain <firstname>/<lastname>, a pointer or even a vCard
> record? Structure is therefore as important as the tag itself.
>
> Also, what about searching for documents in another language? At least a
> DTD can have synonyms.
>
> Mark Birbeck
> Managing Director
> Intra Extra Digital Ltd.
> 39 Whitfield Street
> London
> W1P 5RE
> w: http://www.iedigital.net/
> t: 0171 681 4135
> e: [log in to unmask]
>
>
>
> -----Original Message-----
> From: Linda van den Brink
> Sent: 07 December 1998 15:21
> To: [log in to unmask]
> Subject: Re: Search Engines
>
>
> I agree that it would be a good thing to have tag libraries. (aren't there
> *any* initiatives in this area already?) I imagine they would be lists of
> tag names that are standardized (though not necessarily formally). However,
> I wouldn't want the structure of how these tags nest etc to be standardized
> as well. If a list of standard tag names existed, I would certainly use it
> (though it would not have to be mandatory).
>
> For example, if most people used
>
> <author>
>
> instead of
>
> <auth> or <writer>
>
> that would make context searching a lot more powerful, even without standard
> structures. In my opinion, this would be the right balance between standards
> and openness/freedom.
>
> Linda van den Brink
>
> -----Original Message-----
> From: Richard Lander [mailto:[log in to unmask]]
> Sent: Friday, December 04, 1998 10:00 PM
> To: [log in to unmask]
> Subject: Re: Search Engines
>
>
>  Sam,
>
> I agree with you, but I don't think that searching is always going to be
> as easy as the author example that we've been using. Some DTDs use very
> esoteric GIs and others are so generalized that they don't offer any
> contextual information.
>
> For example, during the authoring process, I might create notes, examples
> and warnings. During transformation to my publishing DTD, I might
> transform (is transform the correct term?) those elements to docpart, a
> catch-all. If that is the case, the search engine won't be able to do
> much in terms of context.
>
> Another problem with auth is that you might end up with 'authority'
> and/or some other GI containing auth. Although the data+markup model will be
> better than what we have now, particularly if you are familiar with the
> associated DTD, it will be hit-and-miss most of the time, IMHO.
>
> I think that tag libraries are not a bad idea anyway, as a list is almost
> always structured the same way, with minor variance. Tables, links,
> binary objects and other types of elements also have base models. I'd
> subscribe to a tag library to make DTD construction a bit easier,
> grabbing and modifying structures as I built the DTD.
>
> Richard.
>
>
> On Fri, 4 Dec 1998, Hunting, Sam wrote:
>
> > I don't think we have to create a tag Esperanto in order to have search
> > engines that are enhanced because they are content-based, ie in XML.
> >
> > Where we used to have one domain to search, text as such, we now have two:
> > markup, and data (content = data + markup). The markup establishes the
> > context in which the data is to be found.
> >
> > To use your example, a query where the string "auth" was found in the markup
> > domain would work to discover authors named "Smith":
> >
> >         <person role="AUTHor">SMITH</person>
> >         <AUTHor>SMITH</AUTHor>
> >
> > while filtering out sentences like:
> >
> >         <p>I'm an AUTHor, and any editor named SMITH is no friend of mine.</p>
> >
> > So perhaps full-text retrieval, in both domains, markup and data,
> > will have enough advantages for our current retrieval vendors to get moving
> > on this problem right away, without standardizing element names any more
> > than human language is already standardized -- since markup, like data, is
> > meant to be human-readable (Design Goal 6 of XML: "XML documents should be
> > human-legible and reasonably clear.") This would be yet another advantage
> > gained by the fact that XML parsers are easy to build.
> >
>
> <?xml version="1.0" standalone="yes"?>
> <INFO>
> <NAME>Richard Lander</NAME>
> <EMAIL>relander at uwaterloo.ca</EMAIL>
> <WEB>http://pdbeam.uwaterloo.ca/~rlander</WEB>
> </INFO>
>
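
The two-domain query Sam Hunting describes in the quoted text (a term
matched in the markup, another matched in the data) can be sketched roughly
as follows. This is only an illustration using Python's standard
xml.etree.ElementTree; the sample document, the function name
find_in_context, and the choice to treat element names plus attribute
names and values as "markup" are all assumptions, not anything proposed in
the thread. Matching is case-insensitive substring matching, as in the
"auth" example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document modelled on the quoted example.
SAMPLE = """<doc>
  <person role="author">Smith</person>
  <author>Smith</author>
  <p>I'm an author, and any editor named Smith is no friend of mine.</p>
</doc>"""


def find_in_context(xml_text, markup_term, data_term):
    """Return tags of elements whose markup contains markup_term and whose
    own leading text contains data_term (both case-insensitive)."""
    markup_term = markup_term.lower()
    data_term = data_term.lower()
    hits = []
    for elem in ET.fromstring(xml_text).iter():
        # Markup domain: element name plus attribute names and values.
        markup = [elem.tag] + list(elem.attrib.keys()) + list(elem.attrib.values())
        in_markup = any(markup_term in m.lower() for m in markup)
        # Data domain: the element's own leading text only.
        in_data = data_term in (elem.text or "").lower()
        if in_markup and in_data:
            hits.append(elem.tag)
    return hits


if __name__ == "__main__":
    print(find_in_context(SAMPLE, "auth", "smith"))
```

The <p> sentence is filtered out because "auth" appears only in its data,
not in its markup. Richard Lander's objection also shows up directly: an
element named <authority> containing "Smith" would match the same query.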