XML-L Archives


XML-L@LISTSERV.HEANET.IE


XML-L  December 1998


Subject: Re: Search Engines
From: Mark Birbeck <[log in to unmask]>
Reply-To: General discussion of Extensible Markup Language <[log in to unmask]>
Date: Wed, 9 Dec 1998 12:26:38 -0000
Content-Type: text/plain


There are other relationships that must be established. For example, in
proposals for tags that define data, the following type of relationship
is possible:

        <book>
                <title>Book with no pictures</title>
        </book>

        <author>
                <name>Fred Bloggs</name>
                <wrote>Book with no pictures</wrote>
        </author>

In other words, the book does not contain author information, but the
author contains book information.
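
As a rough illustration of what an indexer would have to do to recover that link, here is a minimal Python sketch. It assumes the two fragments sit under a hypothetical <library> root element, and that a <wrote> value matches a <title> by exact text; neither assumption comes from any of the proposals.

```python
import xml.etree.ElementTree as ET

# Hypothetical <library> wrapper so the two fragments form one
# well-formed document; it is not part of the original example.
DOC = """
<library>
  <book>
    <title>Book with no pictures</title>
  </book>
  <author>
    <name>Fred Bloggs</name>
    <wrote>Book with no pictures</wrote>
  </author>
</library>
"""

def authors_by_title(root):
    """Map each book title to the names of authors whose <wrote> matches it."""
    index = {book.findtext("title"): [] for book in root.iter("book")}
    for author in root.iter("author"):
        name = author.findtext("name")
        for wrote in author.findall("wrote"):
            # The link exists only as matching text content, so the
            # indexer has to know to compare <wrote> against <title>.
            if wrote.text in index:
                index[wrote.text].append(name)
    return index

root = ET.fromstring(DOC)
print(authors_by_title(root))
```

Nothing in the markup itself says that <wrote> points at <title>; that rule is hard-coded in the function, which is exactly the knowledge an index server would be missing.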

This implies to me that the next generation of index servers will have
to take more than simply a document's DTD into account - since this
relationship cannot be expressed there. (And if a document has no DTD
then that's even worse. If nothing else you don't know what fields could
be there but are empty.)

In a way this parallels good old HTML. You can submit a page for
indexing and all words are indexed, regardless of their role in the
document. Let's say this is level 1. You can then 'add value' by tagging
up the document with meta-data - level 2.
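
The two levels can be sketched in Python with the standard html.parser module. This is only an illustration: the PageIndexer class, the sample page and the choice of an "author" meta tag are all made up for the example.

```python
from html.parser import HTMLParser

class PageIndexer(HTMLParser):
    """Collect every word (level 1) and <meta name/content> pairs (level 2)."""

    def __init__(self):
        super().__init__()
        self.words = []   # level 1: all text, regardless of its role
        self.meta = {}    # level 2: meta-data supplied by the page author

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_data(self, data):
        self.words.extend(data.lower().split())

# Hypothetical page used only for this example.
PAGE = """<html><head>
<title>Book with no pictures</title>
<meta name="author" content="Fred Bloggs">
</head><body><p>A review written by someone else.</p></body></html>"""

indexer = PageIndexer()
indexer.feed(PAGE)
print(indexer.words)  # level 1 view of the page
print(indexer.meta)   # level 2 view of the page
```

The author's name never appears in the level 1 word list; only the level 2 meta-data lets a search for pages by a given author find this one.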

Similarly, you can submit an XML document for indexing with no DTD or
specialised schema, but to me, that is not much better than the 'level
1' of HTML. You have loads of tags, but we know nothing about them. If
you are in a closed environment - as with Vincent's example - you can
provide people with 'meta-knowledge' about those tags, say publishing a
list of explanations like "auth is the author field" and "createdate is
the date that the document was first created on", but this falls down
the minute you try to provide the information in a generalised way to a
number of people outside your closed group, because they need this
meta-knowledge. I would suggest some sort of 'level 2' is going to be
needed for a proper use of search engines, perhaps based on
XData/XSchema, or whatever it's called.
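
To make the 'meta-knowledge' point concrete, here is a small Python sketch in which the published list of explanations is just a lookup table. The table contents, the concept names and the sample <letter> document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical published 'meta-knowledge': local tag name -> shared concept.
TAG_MEANINGS = {
    "auth": "author",
    "author": "author",
    "auteur": "author",
    "createdate": "date-created",
}

def index_by_concept(xml_text, meanings=TAG_MEANINGS):
    """Return ({concept: [values]}, {tags we could not interpret})."""
    indexed, unknown = {}, set()
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()
        if not text:
            continue  # container element with no text of its own
        concept = meanings.get(elem.tag)
        if concept is None:
            unknown.add(elem.tag)  # a tag we know nothing about
        else:
            indexed.setdefault(concept, []).append(text)
    return indexed, unknown

doc = ("<letter><auth>Fred Bloggs</auth>"
       "<createdate>1998-12-09</createdate><ref>A/42</ref></letter>")
indexed, unknown = index_by_concept(doc)
print(indexed)
print(unknown)
```

Inside a closed group the table can be kept complete. Outside it, every document arrives with tags like <ref> here, which end up in the unknown set because nobody has published what they mean.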

I also think there is little possibility of getting round the problem of
defining DTDs. It's a bit like the perennial problems we face with
object-oriented programming - what to put in the base classes - but some
of the schemas that are currently defined need to have some of their
common components ripped out and put into some 'base schemas'.

Anyway, if we want more mind-boggling problems to ponder, what about the
actual syntax used to search? Everyone's talking about the trickiness of
searching for the <author> field, but what about its context? What if I
only want books written by a certain author and not magazine articles?
XSL/XQuery type proposals suggest:

        book[author=Fred]

or

        book/author[name=Fred]

but this implies you need to know quite a lot about the schema. You need
to know that author is a direct child of book, and not three levels
down.
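
The dependence on schema knowledge can be shown with a short Python sketch. It uses the limited XPath subset in the standard xml.etree.ElementTree module rather than the XSL/XQuery proposals themselves, and the <catalogue> document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: the same author data sits at different depths.
DOC = """
<catalogue>
  <book>
    <title>Shallow</title>
    <author><name>Fred</name></author>
  </book>
  <book>
    <title>Deep</title>
    <front><credits><author><name>Fred</name></author></credits></front>
  </book>
  <article>
    <title>Magazine piece</title>
    <author><name>Fred</name></author>
  </article>
</catalogue>
"""

root = ET.fromstring(DOC)
books = root.findall("./book")  # restrict the context to books

def titles(elems):
    return [e.findtext("title") for e in elems]

# Assumes <author> is a direct child of <book>.
direct = [b for b in books
          if b.find("author[name='Fred']") is not None]

# Makes no assumption about depth: <author> anywhere below <book>.
anywhere = [b for b in books
            if b.find(".//author[name='Fred']") is not None]

print(titles(direct))
print(titles(anywhere))
```

Both queries skip the magazine article, but the first one also misses the book whose author is nested three levels down. The second finds it, at the cost of matching any <author> under <book>, whatever role it plays there.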

Regards,

Mark Birbeck
Managing Director
Intra Extra Digital Ltd.
39 Whitfield Street
London
W1P 5RE
w: http://www.iedigital.net/
t: 0171 681 4135
e: [log in to unmask]




-----Original Message-----
From: Daneker, Vincent
Sent: 09 December 1998 10:21
To: [log in to unmask]
Subject: Re: Search Engines


There's also the Dublin Core Initiative http://purl.oclc.org/dc/ . We've
made use of some of the elements to mark up official letters, in the
<HEAD> element of an HTML document, and used a search engine that
restricted its search to those meta-tags. The principle should apply to
XML documents as well.

We do have a big advantage: the search engine is for searches of our
internal site only, so we know what tags are being used and who will be
doing the searches. This is difficult to achieve on a web-wide basis.
While it might be possible to dynamically generate a list of tags in
use, I really wouldn't want to wade through the results: <author>,
<auth>, <autuer>, <escritor>, <writer>, etc. All of which are legitimate
ways to express the concept. It may be that a series of domain and even
language specific dialects develop because the participants in that area
of knowledge agree, formally or otherwise, that is how to proceed.

Vincent Daneker
Information Management
[log in to unmask]
> -----Original Message-----
> From: Linda van den Brink [SMTP:[log in to unmask]]
> Sent: Tuesday, December 08, 1998 10:36 AM
> To: [log in to unmask]
> Subject: RE: Search Engines
>
> I've worked with the TEI dtd, but I'd say it's a bit more than just a
> tag library!
>
>
> > Also, what about searching for documents in another language?
> >That's just one of the problems: I will want to use <auteur> in
> >France, not <author>.
>
> Why? I used the TEI dtd to markup selections of Dutch poems. It never
> bothered me that the tag names are in English. Do you mean that people
> who don't speak English will want to create tags in their native
> language, or are there other reasons as well?
>
>
> -----Original Message-----
> From: Charles Muller [mailto:[log in to unmask]]
> Sent: Tuesday, December 08, 1998 9:13 AM
> To: [log in to unmask]
> Subject: Re: Search Engines
>
>
> >I agree that it would be a good thing to have tag libraries. (aren't
> >there *any* initiatives in this area already?).
>
> The work done by the Text Encoding Initiative in this area is already
> quite extensive, and people who work in the humanities fields have
> been using their recommendations for some time. Please see:
>
> http://www-tei.uic.edu/orgs/tei/
>
>
> Regards,
>
>
> Charles Muller
>
>
> Resources for East Asian Language and Thought
> http://www.human.toyogakuen-u.ac.jp/~acmuller
>
> Toyo Gakuen University
> 1660 Hiregasaki, Nagareyama-shi
> Chiba 270-0161 Japan
