* Pat Bensky
| I am creating XML files from my database. I understand that some
| sort of character encoding is necessary in order for upper ASCII
| characters to be interpreted correctly.
You need that for all characters, actually. All text has an encoding,
but we are so used to ASCII that we tend to forget that it is just
another encoding. EBCDIC systems still exist, though...
| My XML file header looks like this:
| <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Now you've said that your encoding is UTF-8. If you declare that, the
bytes in the file had better actually be UTF-8 as well.
That is, if you want to write "é" you must do it using 0xC3 0xA9, not
| Here's a bit of sample problem text:
| "Now available with SuperDrive, the eMac, Applešs most affordable
| PowerPC "
| (note that the š is a typesetting apostrophe, not a single quote
Heh. It arrives as a superscript 1 here. :)
I guess you are probably talking about U+2019, also known as RIGHT
SINGLE QUOTATION MARK. In UTF-8 that would be 0xE2 0x80 0x99.
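The byte values quoted above are easy to check. A minimal Python sketch (the helper name utf8_hex is made up for illustration):

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of `text` as a list of hex strings."""
    return ["0x%02X" % b for b in text.encode("utf-8")]

# U+00E9 is "é"; U+2019 is RIGHT SINGLE QUOTATION MARK.
print(utf8_hex("\u00e9"))
print(utf8_hex("\u2019"))
```

Every character outside ASCII takes two or more bytes in UTF-8, so a lone byte above 0x7F in a file labelled UTF-8 usually means the data is really in some other encoding.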
| When an XML file containing this text is imported into an InDesign
| document, the š appears as a square box.
Could you give us the numeric value of the byte? That would help.
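One way to get at that value is to look at the raw bytes before anything decodes them. A rough Python sketch; the helper name and the sample bytes are invented for illustration, and the 0x92 is only a stand-in, not the byte actually in the file:

```python
def non_ascii_bytes(data):
    """Return (offset, hex value) for every byte outside the ASCII range."""
    return [(i, "0x%02X" % b) for i, b in enumerate(data) if b >= 0x80]

# Hypothetical sample. With a real file, read it in binary mode instead:
#     data = open(path, "rb").read()
sample = b"Apple\x92s most affordable"

for offset, value in non_ascii_bytes(sample):
    print(offset, value)
```

Reading in binary mode matters here: a text-mode read would already apply some decoding and hide the original byte.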
| Can anybody point me in the right direction with this?
Basically, what you need to do is one of the following:
- find out what encoding your data is in, and just label the XML
  file with that encoding so XML parsers can do the legacy -> Unicode
  conversion themselves, or
- convert your data to some other encoding, preferably UTF-8, and
  label the XML document with that. To do this you must know what
  the source encoding is (unless you have some function or procedure
  that knows it).
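The second option could look roughly like this in Python. This is only a sketch: the source encoding must be whatever your database really uses (cp1252 in the usage below is purely an assumption for the example), and the function name and the <text> element are invented.

```python
from xml.sax.saxutils import escape

def to_utf8_xml(raw, source_encoding):
    """Decode legacy bytes, then emit a small XML document encoded as UTF-8."""
    text = raw.decode(source_encoding)   # legacy bytes -> Unicode
    header = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    # <text> is a hypothetical element name; escape() handles &, < and >.
    document = header + "<text>" + escape(text) + "</text>\n"
    return document.encode("utf-8")      # Unicode -> UTF-8 bytes

# Assumption for illustration only: the data is Windows-1252, where
# byte 0x92 is the typesetting apostrophe U+2019.
xml_bytes = to_utf8_xml(b"Apple\x92s most affordable", "cp1252")
print(xml_bytes)
```

The important point is that decoding and encoding are two separate steps, and the label in the XML declaration has to match the final encode step.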
Conversion to ISO 8859-1 is not likely to work, since the U+2019
character is not in 8859-1...
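This is easy to confirm with a short Python sketch (the helper name can_encode is made up):

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

apostrophe_text = "Apple\u2019s"  # contains U+2019

print(can_encode(apostrophe_text, "iso-8859-1"))
print(can_encode(apostrophe_text, "utf-8"))
```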
Lars Marius Garshol, Ontopian <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC <URL: http://www.garshol.priv.no >