

XML-L Archives




Subject: Re: Search Engines
From: Mark Birbeck <[log in to unmask]>
Reply-To: General discussion of Extensible Markup Language <[log in to unmask]>
Date: Wed, 9 Dec 1998 12:26:38 -0000
Content-Type: text/plain


There are other relationships that must be established. For example, in
proposals for tags that define data, the following type of relationship
is possible:

        <book>
                <title>Book with no pictures</title>
        </book>

        <author>
                <name>Fred Bloggs</name>
                <wrote>Book with no pictures</wrote>
        </author>

In other words, the book does not contain author information, but the
author contains book information.
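As a minimal sketch of what this means for an indexer (an illustration only, not something proposed in the thread), the Python below joins the two fragments. The `<catalogue>` wrapper and the `authors_of` helper are hypothetical; the point is that a field-based index looking only inside `<book>` would never find Fred Bloggs, so the link has to be recovered by matching title text held in `<author>`.

```python
import xml.etree.ElementTree as ET

# The two fragments from the message, wrapped in a hypothetical <catalogue>
# root so that together they form one well-formed document.
DOC = """
<catalogue>
  <book>
    <title>Book with no pictures</title>
  </book>
  <author>
    <name>Fred Bloggs</name>
    <wrote>Book with no pictures</wrote>
  </author>
</catalogue>
"""

def authors_of(root, book_title):
    """Return names of authors whose <wrote> text equals book_title.

    <book> carries no author information, so the relationship can only be
    reconstructed by matching on the title string stored in <author>.
    """
    return [
        a.findtext("name")
        for a in root.findall("author")
        if book_title in (w.text for w in a.findall("wrote"))
    ]

root = ET.fromstring(DOC)
for book in root.findall("book"):
    title = book.findtext("title")
    print(title, "->", authors_of(root, title))
```

Joining on free text like this is fragile (one typo in `<wrote>` breaks the link), which is part of why the relationship is hard for a generic index server to exploit.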

This implies to me that the next generation of index servers will have
to take more than simply a document's DTD into account - since this
relationship cannot be expressed there. (And if a document has no DTD
then that's even worse. If nothing else, you don't know which fields could
be there but are empty.)
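To illustrate the "no DTD" case, here is a minimal Python sketch that infers a vocabulary purely from instance documents. The two documents and the optional `<isbn>` field are hypothetical. Whatever this collects, it can only report elements that actually occur, so an allowed-but-unused field is indistinguishable from one that does not exist.

```python
import xml.etree.ElementTree as ET

# Two hypothetical instance documents with no DTD. Suppose the (unwritten)
# vocabulary also allows an optional <isbn> that neither document uses.
DOCS = [
    "<book><title>Book with no pictures</title><author>Fred Bloggs</author></book>",
    "<book><title>Another book</title><year>1998</year></book>",
]

def observed_paths(xml_text):
    """Collect every element path (e.g. 'book/title') present in one document."""
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths

inferred = set().union(*(observed_paths(d) for d in DOCS))
print(sorted(inferred))  # ['book', 'book/author', 'book/title', 'book/year']
# 'book/isbn' can never show up: from instances alone an indexer cannot tell
# "allowed but empty" apart from "not part of the vocabulary".
```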

In a way this parallels good old HTML. You can submit a page for
indexing and all words are indexed, regardless of their role in the
document. Let's say this is level 1. You can then 'add value' by tagging
up the document with meta-data - level 2.
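The difference between the two levels can be sketched as two inverted indexes. This is a minimal Python illustration under stated assumptions: the `PAGES` data and the crude `words` tokenizer are hypothetical, and real engines do far more. Level 1 maps every body word to pages; level 2 keys meta-data words by their field, so a query can ask for "author = Bloggs" rather than "Bloggs anywhere".

```python
import re
from collections import defaultdict

# Hypothetical pages: body text plus author-supplied meta-data ("level 2").
PAGES = {
    "p1": {"text": "Fred Bloggs reviews a book with no pictures",
           "meta": {"author": "Jane Doe"}},
    "p2": {"text": "A book with no pictures by Jane Doe",
           "meta": {"author": "Fred Bloggs"}},
}

def words(s):
    """Lower-case word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", s.lower())

# Level 1: every word of the body text is indexed, whatever its role.
level1 = defaultdict(set)
for pid, page in PAGES.items():
    for w in words(page["text"]):
        level1[w].add(pid)

# Level 2: meta-data words are indexed under their field name.
level2 = defaultdict(set)
for pid, page in PAGES.items():
    for field, value in page["meta"].items():
        for w in words(value):
            level2[(field, w)].add(pid)

print(sorted(level1["bloggs"]))              # ['p1'] - Bloggs merely mentioned
print(sorted(level2[("author", "bloggs")]))  # ['p2'] - Bloggs is the author
```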

Similarly, you can submit an XML document for indexing with no DTD or
specialised schema, but to me that is not much better than the 'level 1'
of HTML. You have loads of tags, but you know nothing about them. If you
are in a closed environment, as with Vincent's example, you can provide
people with 'meta-knowledge' about those tags, say by publishing a list of
explanations like "auth is the author field" and "createdate is the date
that the document was first created on". But this falls down the minute
you try to provide the information in a generalised way to people outside
your closed group, because they need this meta-knowledge too. I would
suggest that some sort of 'level 2' is going to be needed for proper use
of search engines, perhaps based on XData/XSchema, or whatever it's
called.
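The closed-group 'meta-knowledge' described here amounts to a lookup table from local tag names to agreed field names. A minimal Python sketch, assuming a hypothetical `FIELD_MAP` and sample memo (neither comes from the thread): tags the table does not cover are simply skipped, which is exactly the problem for anyone outside the group who lacks the table.

```python
import xml.etree.ElementTree as ET

# Hypothetical meta-knowledge for a closed group, echoing the message:
# "auth is the author field", "createdate is the date the document was
# first created on". Outside the group, nobody has this table.
FIELD_MAP = {
    "auth": "author",
    "createdate": "created",
}

def canonical_fields(xml_text, field_map):
    """Return {canonical_field: [values]} for elements the map knows about.

    Tags missing from field_map are skipped: without meta-knowledge the
    indexer has no idea what they mean.
    """
    fields = {}
    for elem in ET.fromstring(xml_text).iter():
        name = field_map.get(elem.tag)
        if name is not None and elem.text:
            fields.setdefault(name, []).append(elem.text.strip())
    return fields

doc = ("<memo><auth>Fred Bloggs</auth><createdate>1998-12-09</createdate>"
       "<dept>IM</dept></memo>")
print(canonical_fields(doc, FIELD_MAP))
# {'author': ['Fred Bloggs'], 'created': ['1998-12-09']}
```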

I also think there is little possibility of getting round the problem of
defining DTDs. It's a bit like the perennial problem we face in
object-oriented programming of deciding what to put in the base classes:
some of the schemas that are currently defined need to have their common
components ripped out and put into some 'base schemas'.

Anyway, if we want more mind-boggling problems to ponder, what about the
actual syntax used to search? Everyone's talking about the trickiness of
searching for the <author> field, but what about its context? What if I
only want books written by a certain author and not magazine articles?
XSL/XQL-type proposals suggest:

        book[author='Fred']

or

        book/author[name='Fred']

but this implies you need to know quite a lot about the schema. You need
to know that author is a direct child of book, and not three levels
down.
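The child-versus-descendant issue can be seen with the limited XPath subset in Python's `xml.etree.ElementTree`. This is a minimal sketch; the `<library>` document is hypothetical. The predicate `[author='Fred']` only tests a direct child, so the book whose `<author>` sits three levels down is missed, while `book//author` reaches any depth. Restricting the path to `book` keeps the magazine article out in both cases.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: one book has <author> as a direct child, one nests
# it three levels down, and there is also a magazine article by Fred.
DOC = """
<library>
  <book><title>A</title><author>Fred</author></book>
  <book><title>B</title>
    <credits><people><author>Fred</author></people></credits>
  </book>
  <article><title>C</title><author>Fred</author></article>
</library>
"""
root = ET.fromstring(DOC)

# [author='Fred'] tests the text of a *direct* <author> child of <book>.
direct = root.findall("book[author='Fred']")
# '//' finds <author> elements at any depth below <book>.
anywhere = root.findall("book//author")

print([b.findtext("title") for b in direct])  # ['A']
print(len(anywhere))                          # 2
```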

Regards,

Mark Birbeck
Managing Director
Intra Extra Digital Ltd.
39 Whitfield Street
London
W1P 5RE
w: http://www.iedigital.net/
t: 0171 681 4135
e: [log in to unmask]




-----Original Message-----
From: Daneker, Vincent
Sent: 09 December 1998 10:21
To: [log in to unmask]
Subject: Re: Search Engines


There is also the Dublin Core Initiative (http://purl.oclc.org/dc/). We've
made use of some of its elements to mark up official letters, in the
<HEAD> element of an HTML document, and used a search engine that
restricted its search to those meta-tags. The principle should apply to
XML documents as well.
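A search restricted to Dublin Core meta-tags might look roughly like this minimal Python sketch using the standard `html.parser` module. The page content, the `DCMetaParser` class and the `matches` helper are hypothetical. It assumes the common `DC.element` naming for `<meta name=...>` tags, and it deliberately never looks at the body text.

```python
from html.parser import HTMLParser

# Hypothetical letter with Dublin Core elements in the HTML <head>,
# using the common "DC.element" naming convention for <meta> tags.
PAGE = """<html><head>
<meta name="DC.title" content="Reply to enquiry">
<meta name="DC.creator" content="Information Management">
<meta name="DC.date" content="1998-12-09">
</head><body>Body text that mentions management in passing.</body></html>"""

class DCMetaParser(HTMLParser):
    """Collect Dublin Core <meta> tags; body text is ignored entirely."""

    def __init__(self):
        super().__init__()
        self.dc = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("name") or ""
        if tag == "meta" and name.lower().startswith("dc."):
            self.dc[name[3:].lower()] = a.get("content") or ""

def matches(dc, field, term):
    """True if the given Dublin Core field contains term (case-insensitive)."""
    return term.lower() in dc.get(field, "").lower()

parser = DCMetaParser()
parser.feed(PAGE)
print(parser.dc)
print(matches(parser.dc, "creator", "information management"))  # True
print(matches(parser.dc, "title", "management"))                # False
```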

We do have a big advantage: the search engine is for searches of our
internal site only, so we know what tags are being used and who will be
doing the searches. This is difficult to achieve on a web-wide basis.
While it might be possible to dynamically generate a list of tags in use,
I really wouldn't want to wade through the results: <author>, <auth>,
<auteur>, <escritor>, <writer>, etc., all of which are legitimate ways to
express the concept. It may be that a series of domain- and even
language-specific dialects will develop because the participants in each
area of knowledge agree, formally or otherwise, that this is how to
proceed.

Vincent Daneker
Information Management
[log in to unmask]
> -----Original Message-----
> From: Linda van den Brink [SMTP:[log in to unmask]]
> Sent: Tuesday, December 08, 1998 10:36 AM
> To:   [log in to unmask]
> Subject:      RE: Search Engines
>
> I've worked with the TEI DTD, but I'd say it's a bit more than just a tag
> library!
>
>
> > Also, what about searching for documents in another language?
> >That's just one of the problems: I will want to use <auteur> in
> >France, not <author>.
>
> Why? I used the TEI DTD to mark up selections of Dutch poems. It never
> bothered me that the tag names are in English. Do you mean that people who
> don't speak English will want to create tags in their native language, or
> are there other reasons as well?
>
>
> -----Original Message-----
> From: Charles Muller [mailto:[log in to unmask]]
> Sent: Tuesday, December 08, 1998 9:13 AM
> To: [log in to unmask]
> Subject: Re: Search Engines
>
>
> >I agree that it would be a good thing to have tag libraries. (Aren't
> >there *any* initiatives in this area already?)
>
> The work done by the Text Encoding Initiative in this area is already
> quite extensive, and people who work in the humanities fields have been
> using their recommendations for some time. Please see:
>
> http://www-tei.uic.edu/orgs/tei/
>
>
> Regards,
>
>
> Charles Muller
>
>
> Resources for East Asian Language and Thought
> http://www.human.toyogakuen-u.ac.jp/~acmuller
>
> Toyo Gakuen University
> 1660 Hiregasaki, Nagareyama-shi
> Chiba 270-0161 Japan

LISTSERV.HEANET.IE