The code points I get seem a bit odd. They are:
0x65E5 0x672C 0x8A9E.
I checked the first 3 ideographs with the "CJK Unified Ideographs
Range: 4E00-9FAF" chart at: http://www.unicode.org/charts/PDF/U4E00.pdf,
and they do match the ideographs I expect,
though I'm not sure why I get 0xFF11 in place of the number "1".
Something strange may be occurring in the transcoding. I'm using ICU to
go from Shift-JIS to UTF-16.
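
For anyone who wants to reproduce the check, here is a minimal sketch in Python. It uses Python's own "shift_jis" codec as a stand-in for ICU, and the sample string is an assumption: the three ideographs above followed by a fullwidth digit, which is what I suspect the source file contains. The two range tables are my reading of the Ideographic and Digit productions in Appendix B of the XML 1.0 Recommendation (editions before the fifth), so please verify them against the spec before relying on them.

```python
import unicodedata

# Assumed sample: the three ideographs from this message followed by a
# fullwidth digit one, as it might appear in a Shift-JIS source file.
sample = "\u65e5\u672c\u8a9e\uff11"

# Round-trip through Shift-JIS using Python's codec (a stand-in for ICU).
sjis_bytes = sample.encode("shift_jis")
decoded = sjis_bytes.decode("shift_jis")

code_points = [ord(ch) for ch in decoded]
names = [unicodedata.name(ch) for ch in decoded]

# NFKC compatibility normalization folds the fullwidth digit to ASCII "1".
folded = unicodedata.normalize("NFKC", decoded)

# Two productions from XML 1.0 Appendix B, transcribed as an assumption;
# the BaseChar and CombiningChar tables are deliberately left out.
XML10_IDEOGRAPHIC = [(0x4E00, 0x9FA5), (0x3007, 0x3007), (0x3021, 0x3029)]
XML10_DIGIT = [
    (0x0030, 0x0039), (0x0660, 0x0669), (0x06F0, 0x06F9), (0x0966, 0x096F),
    (0x09E6, 0x09EF), (0x0A66, 0x0A6F), (0x0AE6, 0x0AEF), (0x0B66, 0x0B6F),
    (0x0BE7, 0x0BEF), (0x0C66, 0x0C6F), (0x0CE6, 0x0CEF), (0x0D66, 0x0D6F),
    (0x0E50, 0x0E59), (0x0ED0, 0x0ED9), (0x0F20, 0x0F29),
]


def in_ranges(ch, ranges):
    """Return True if the character's code point falls in any (lo, hi) range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


# Print one line per character: code point, Unicode name, and table hits.
for ch, name in zip(decoded, names):
    print(
        f"U+{ord(ch):04X} {name}"
        f" ideographic={in_ranges(ch, XML10_IDEOGRAPHIC)}"
        f" digit={in_ranges(ch, XML10_DIGIT)}"
    )
```

If the tables above are right, the three ideographs fall inside the XML 1.0 Ideographic range, while U+FF11 (FULLWIDTH DIGIT ONE) is not in the Digit list. That would make the fullwidth digit, rather than the ideographs or the transcoding, the likely culprit. Normalizing the name with NFKC before writing it out would turn it into a plain "1".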
The files I attempted to attach earlier can now be retrieved at:
(the Japanese characters in element name, 3KB)
(the Japanese characters in content, 2KB)
> -----Original Message-----
> From: G. Ken Holman
> Sent: Monday, May 31, 2004 3:46 PM
> Subject: Re: non-western XML element names?
> At 2004-05-31 14:32 -0700, Juan Chu Chow wrote:
> >Thanks for your reply. I'll try to paste the offending contents:
> My mailer is only seeing question marks. Do you have the Unicode code
> points for those three offending letters?
> >I find it a bit strange that those 3 characters would not be supported in
> >the element name. They are basic characters whose meaning is the
> >"Japanese language".
> And would be accommodated in XML 1.1, but in XML 1.0 there are limitations
> on those Unicode code points that are allowed to be in names.
> .................. Ken
> Public courses: Spring 2004 world tour of hands-on XSL instruction
> Next: 3-day XSLT/XPath; 2-day XSL-FO - Birmingham, UK June 14,2004
> World-wide on-site corporate, govt. & user group XML/XSL training.
> G. Ken Holman
> Crane Softwrights Ltd. http://www.CraneSoftwrights.com/l/
> Box 266, Kars, Ontario CANADA K0A-2E0 +1(613)489-0999 (F:-0995)
> Male Breast Cancer Awareness http://www.CraneSoftwrights.com/l/bc
> Legal business disclaimers: http://www.CraneSoftwrights.com/legal