This is a classic problem in markup systems. HTML has its <PRE> element,
but of course no such element is hard-wired into XML. Instead, the XML
specification provides for a reserved attribute (that can be used on any
At 04:25 PM 1/15/01 -0600, you wrote:
>As my group moves from traditional DTP to an XML-based content management
>system, one issue we keep encountering is how to handle documentation that
>presented to our users in a readable manner. To be readable to humans, code
>samples typically rely on format-specific elements such as line breaks and
>tabbed indents that have no place in the XML source. But if not in the
>source, how should line breaks, indents, and any other format-specific
>elements be managed?
Why not in the source? Line feeds and tabs are allowed characters, and can
appear in any element containing character data.
The rule that parsers must follow is that "all whitespace must be passed to
the application". This ensures that it is not an XML _parser_ (that is,
what the spec calls an "XML processor") that will strip or munge any
whitespace. If this happens, it is because a downstream processor, such as
a browser, does this. Your task is to specify how this should _not_ happen.
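To see this for yourself, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The <doc> and <code> element names are made up for illustration; the point is only that the line feed and tab inside the element come through to the application untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical <code> element whose content contains a line feed and a tab.
source = "<doc><code>if (x) {\n\treturn y;\n}</code></doc>"

# The parser hands the character data to the application as-is;
# nothing has been stripped or collapsed at this stage.
code_text = ET.fromstring(source).find("code").text
print(repr(code_text))
```

Any collapsing you see later is therefore the work of whatever consumes this text, not of the parser.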
The xml:space attribute is designed to say, in effect, "don't tread on me!"
It is allowed to have either of two values, 'preserve' or 'default'.
'default' is the normal case -- it says to the application "do what you
want with me." 'preserve' is the value you want.
Of course, it is still up to the application (your stylesheets, browsers,
whatever) to respect this and to lay out your whitespace-significant text
(code excerpts, ASCII art, etc.) properly.
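As a rough sketch of what "respecting" the attribute might look like in an application, the Python below collapses whitespace in ordinary elements but leaves it alone where xml:space="preserve" is set. The layout() function and the <para> and <listing> element names are my own inventions for the example, and for brevity it checks only the element itself, whereas the spec says the setting also applies to descendants until overridden.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml: prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def layout(elem):
    """Hypothetical application step: return the element's text for display."""
    text = elem.text or ""
    if elem.get(XML_SPACE) == "preserve":
        # The document asked us to keep line breaks and tabs exactly.
        return text
    # 'default' or absent: the application may do what it wants;
    # here it collapses every run of whitespace to one space.
    return " ".join(text.split())

doc = ET.fromstring(
    "<doc>"
    "<para>Some   ordinary\n   prose.</para>"
    '<listing xml:space="preserve">for i in x:\n\tprint(i)</listing>'
    "</doc>"
)

para_out = layout(doc.find("para"))
listing_out = layout(doc.find("listing"))
print(repr(para_out))
print(repr(listing_out))
```

A stylesheet or formatter would make the same decision, only expressed in its own terms.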
Trying to mark up the code to achieve a particular formatting is generally
not a good idea. It's just extra work to undo something that we invested
a lot in to begin with.
But with xml:space, any element can be a <PRE> element. Note that you'll
still have to escape any markup in your code snippets. This can be done
with entities (e.g. &lt; for <), or with CDATA marked sections if you want
to look at the characters themselves in your source.
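The two escaping options give the application the same text. Here is a small Python sketch, assuming a hypothetical <code> wrapper element and a made-up snippet, that escapes the sample once with entities (via the standard xml.sax.saxutils.escape helper) and once with a CDATA section, then parses both.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A code sample that contains markup characters of its own.
snippet = "<h1>This is a head</h1> & more"

# Option 1: escape &, < and > as entity references.
with_entities = "<code>" + escape(snippet) + "</code>"

# Option 2: wrap the raw characters in a CDATA marked section.
# (This only works if the sample does not itself contain "]]>".)
with_cdata = "<code><![CDATA[" + snippet + "]]></code>"

text_a = ET.fromstring(with_entities).text
text_b = ET.fromstring(with_cdata).text
print(with_entities)
print(text_a == text_b == snippet)
```

The choice between them is mostly about which form you would rather read and edit in the source.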
>In the bad old days of html I would have solved this problem using the <pre>
>element, but I'd like to think there's a better way. One solution would be
>to create an alternate markup scheme that encodes the nested structure of
>the code, for example:
> <tag type="head">
> <tag type="title">This is a title</tag>
> <tag type="body">
> <tag type="h1">This is a head</tag>
> <tag type="p">This is a paragraph</tag>
>Even without the indenting, this structure makes it clear, for example, that
>the <h1> and <p> elements are contained within the <body> element.
>XSL stylesheets could then be used to output html markup formatted for
>different media (book, html help, whatever) in a manner that reflected this
While this is devilish and intriguing, it probably doesn't do the job in a
way you'd be happy to maintain. People who write docs containing code
samples generally want simply to include the code sample and be done with
it, not be faced with some hellish markup scheme to accommodate. Also, what
about code that's not well-formed, etc.?
>This strikes me as a lot of work, however, and a bit cumbersome. I also
>don't think my group is unusual in needing to output nicely formatted code
>samples that can be displayed both online and in print. So I'm wondering
>whether any standard tools or strategies for dealing with this have emerged.
Yes, it's been tackled before. xml:space is the current iteration of the
Hope that helps,
Wendell Piez
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
Mulberry Technologies: A Consultancy Specializing in SGML and XML