XML-L Archives

Subject: Re: Search Engines
From: Linda van den Brink <[log in to unmask]>
Reply-To: General discussion of Extensible Markup Language <[log in to unmask]>
Date: Mon, 7 Dec 1998 16:21:21 +0100

I agree that it would be a good thing to have tag libraries. (Aren't there
*any* initiatives in this area already?) I imagine they would be lists of
tag names that are standardized (though not necessarily formally). However,
I wouldn't want the structure of how these tags nest, etc., to be
standardized as well. If a list of standard tag names existed, I would
certainly use it (though it would not have to be mandatory).

For example, if most people used

<author>

instead of

<auth> or <writer>

that would make context searching a lot more powerful, even without standard
structures. In my opinion, this would be the right balance between standards
and openness/freedom.
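
A minimal sketch of that point, assuming Python's standard
xml.etree.ElementTree module. The sample documents and the function name
authors_named are made up for illustration. The same query by element name
finds every author when the name is standardized, even though the nesting
differs, and misses the variants otherwise:

```python
import xml.etree.ElementTree as ET

def authors_named(xml_text, name):
    """Return the text of every <author> element whose content equals name."""
    root = ET.fromstring(xml_text)
    wanted = name.lower()
    return [
        e.text
        for e in root.iter("author")  # matches at any nesting depth
        if (e.text or "").strip().lower() == wanted
    ]

# Hypothetical documents: same tag name, different structures.
standardized = (
    "<docs><book><author>Smith</author></book>"
    "<article><meta><author>Smith</author></meta></article></docs>"
)
# Hypothetical documents: each source picked its own tag name.
mixed = (
    "<docs><book><author>Smith</author></book>"
    "<article><auth>Smith</auth></article>"
    "<memo><writer>Smith</writer></memo></docs>"
)

standard_hits = authors_named(standardized, "smith")
mixed_hits = authors_named(mixed, "smith")
print(len(standard_hits), len(mixed_hits))
```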

Linda van den Brink

-----Original Message-----
From: Richard Lander [mailto:[log in to unmask]]
Sent: Friday, December 04, 1998 10:00 PM
To: [log in to unmask]
Subject: Re: Search Engines


Sam,

I agree with you, but I don't think that searching is always going to be
as easy as the author example that we've been using. Some DTDs use very
esoteric GIs and others are so generalized that they don't offer any
contextual information.

For example, during the authoring process, I might create notes, examples
and warnings. During transformation to my publishing DTD, I might
transform (is transform the correct term?) those elements to docpart, a
catch-all. If that is the case, the search engine won't be able to do
much in terms of context.
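
A rough sketch of that loss of context, assuming Python's standard
xml.etree.ElementTree module. The element names come from the paragraph
above; the sample chapter and the function name to_publishing are
hypothetical:

```python
import xml.etree.ElementTree as ET

# Authoring element names that the publishing DTD folds into one catch-all.
AUTHORING_NAMES = {"note", "example", "warning"}

def to_publishing(xml_text):
    """Parse the document and rename authoring elements to <docpart>."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in AUTHORING_NAMES:
            elem.tag = "docpart"
    return root

# Hypothetical authoring document.
source = (
    "<chapter><note>Back up first.</note>"
    "<warning>This erases the disk.</warning>"
    "<example>run setup</example></chapter>"
)

authoring = ET.fromstring(source)
published = to_publishing(source)

# A search for warnings works before the transformation, not after it.
warnings_before = len(list(authoring.iter("warning")))
warnings_after = len(list(published.iter("warning")))
docparts = len(list(published.iter("docpart")))
print(warnings_before, warnings_after, docparts)
```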

Another problem with auth is that you might end up with 'authority'
and/or some other GI containing auth. Although the data+markup model will be
better than what we have now, particularly if you are familiar with the
associated DTD, it will be hit-and-miss most of the time, IMHO.
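
To illustrate the false hits, here is a small sketch assuming Python's
standard xml.etree.ElementTree module. The sample record, its element
names other than author, and the function name names_containing are
invented for the example:

```python
import xml.etree.ElementTree as ET

def names_containing(xml_text, fragment):
    """Return the sorted set of element names that contain the fragment."""
    root = ET.fromstring(xml_text)
    needle = fragment.lower()
    return sorted({e.tag for e in root.iter() if needle in e.tag.lower()})

# Hypothetical record with several GIs that share the "auth" prefix.
sample = (
    "<record><author>Smith</author>"
    "<authority>X</authority>"
    "<authorized>yes</authorized></record>"
)

matches = names_containing(sample, "auth")
print(matches)
```

Only the first of the three matching names is about who wrote the
document, so a plain substring query on the markup still needs knowledge
of the DTD to be reliable.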

I think that tag libraries are not a bad idea anyway, as a list is almost
always structured the same way, with minor variance. Tables, links,
binary objects and other types of elements also have base models. I'd
subscribe to a tag library to make DTD construction a bit easier,
grabbing and modifying structures as I built the DTD.

Richard.


On Fri, 4 Dec 1998, Hunting, Sam wrote:

> I don't think we have to create a tag Esperanto in order to have search
> engines that are enhanced because they are content-based, i.e., in XML.
>
> Where we used to have one domain to search, text as such, we now have two:
> markup, and data (content = data + markup). The markup establishes the
> context in which the data is to be found.
>
> To use your example, a query where the string "auth" was found in the
> markup domain would work to discover authors named "Smith":
>
>         <person role="AUTHor">SMITH</person>
>         <AUTHor>SMITH</AUTHor>
>
> while filtering out sentences like:
>
>         <p>I'm an AUTHor, and any editor named SMITH is no friend of
> mine.</p>
>
> So perhaps full-text retrieval in both domains, markup and data, will
> have enough advantages for our current retrieval vendors to get moving
> on this problem right away, without standardizing element names any more
> than human language is already standardized -- since markup, like data, is
> meant to be human-readable (Design Goal 6 of XML: "XML documents should be
> human-legible and reasonably clear.") This would be yet another advantage
> gained by the fact that XML parsers are easy to build.
>
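
The two-domain query Sam describes could be sketched roughly as follows,
assuming Python's standard xml.etree.ElementTree module. Treating element
names, attribute names and attribute values as the markup domain is an
assumption of this sketch, as are the function name search and the
lower-cased sample document:

```python
import xml.etree.ElementTree as ET

def search(xml_text, markup_term, data_term):
    """Return names of elements matching one term in markup and one in data."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        # Markup domain: element name plus attribute names and values.
        markup = " ".join([elem.tag, *elem.attrib.keys(), *elem.attrib.values()])
        # Data domain: the element's own character content.
        data = elem.text or ""
        if markup_term.lower() in markup.lower() and data_term.lower() in data.lower():
            hits.append(elem.tag)
    return hits

# Sample based on the three cases quoted above.
sample = """<doc>
  <person role="author">Smith</person>
  <author>Smith</author>
  <p>I'm an author, and any editor named Smith is no friend of mine.</p>
</doc>"""

hits = search(sample, "auth", "smith")
print(hits)
```

The <p> element is filtered out because "author" occurs only in its data,
not in its markup.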

<?xml version="1.0" standalone="yes"?>
<INFO>
<NAME>Richard Lander</NAME>
<EMAIL>relander at uwaterloo.ca</EMAIL>
<WEB>http://pdbeam.uwaterloo.ca/~rlander</WEB>
</INFO>
