[jcifs] Character Set discussions

Glass, Eric eric.glass at capitalone.com
Sat Feb 8 07:48:47 EST 2003


> >
> > Yes.  The recommendation from RFC 2718 is to interpret the escapes as
> > characters from the UTF-8 character set.
> 
> Whoops. Now I'm confused. I thought UTF-8 was just a scheme to handle
> characters with values larger than 0x7F (up to 32 bits), not a character
> set.
> 

Sorry, you're right: UTF-8 is a character encoding scheme for UCS. My
statement above uses the IANA registry's definition of "character set",
which is not consistent with the terminology of other standards bodies. In
IANA terms, the UTF-8 charset is the ISO 10646 (UCS) coded character set
combined with the UTF-8 character encoding scheme.
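To make the split between the coded character set and the encoding scheme
concrete, here is a minimal Java sketch (the class and method names are
hypothetical, not part of jCIFS). Each sample is a UCS code point; UTF-8 only
decides which octets represent it, and characters above 0x7F take two or more
octets.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a jCIFS API.
class Utf8Octets {
    // Returns the UTF-8 octets of a string as space-separated hex pairs.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UCS code points: 'A', e with acute, euro sign, musical G clef (outside the BMP).
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1D11E};
        for (int cp : samples) {
            String s = new String(Character.toChars(cp));
            // Prints: U+0041 -> 41, U+00E9 -> C3 A9, U+20AC -> E2 82 AC, U+1D11E -> F0 9D 84 9E
            System.out.printf("U+%04X -> %s%n", cp, utf8Hex(s));
        }
    }
}
```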

> Is there an underlying presumption in RFC 2718 that everyone is using
> Unicode?
> 
>  - Chuck
> 

RFC 2718 states:

      Unless there is some compelling reason for a particular scheme to
      do otherwise, translating character sequences into UTF-8 (RFC 2279)
      and then subsequently using the %HH encoding for unsafe octets is
      recommended.
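A rough Java sketch of that recommendation, plus the reverse step of
interpreting %HH escapes as UTF-8 octets. The class name is hypothetical, and
the "safe" set (ASCII letters, digits and -._~, i.e. the RFC 3986 unreserved
set) is an assumption: which octets count as unsafe is up to each scheme.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a jCIFS API.
class PercentUtf8 {
    // Assumed safe set: ASCII letters, digits and -._~ ; everything else is escaped.
    static boolean isSafe(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
            || (b >= '0' && b <= '9') || b == '-' || b == '.' || b == '_' || b == '~';
    }

    // Translate the characters into UTF-8, then %HH-encode each unsafe octet.
    static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            if (isSafe(b)) {
                sb.append((char) b);
            } else {
                sb.append('%').append(String.format("%02X", b));
            }
        }
        return sb.toString();
    }

    // Collect the octets (unescaping %HH) and interpret them as UTF-8.
    // Malformed UTF-8 sequences are replaced with U+FFFD by the String constructor.
    static String decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated escape at index " + i);
                }
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits
            } else if (c < 0x80) {
                out.write(c); // plain ASCII octet
            } else {
                throw new IllegalArgumentException("non-ASCII character in escaped string");
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00E9 \u20AC";
        String escaped = encode(original);
        System.out.println(escaped);                          // caf%C3%A9%20%E2%82%AC
        System.out.println(decode(escaped).equals(original)); // true
    }
}
```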

RFC 2279 notes:

   ISO/IEC 10646-1 [ISO-10646] defines a multi-octet character set
   called the Universal Character Set (UCS), which encompasses most of
   the world's writing systems.  Two multi-octet encodings are defined,
   a four-octet per character encoding called UCS-4 and a two-octet per
   character encoding called UCS-2, able to address only the first 64K
   characters of the UCS (the Basic Multilingual Plane, BMP), outside of
   which there are currently no assignments.

   It is noteworthy that the same set of characters is defined by the
   Unicode standard [UNICODE], which further defines additional
   character properties and other application details of great interest
   to implementors, but does not have the UCS-4 encoding.
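The two fixed-width UCS forms described above can be sketched in a few lines
of Java (names hypothetical). UCS-4 writes every code point as one 4-octet
value; UCS-2 uses a single 2-octet value and so cannot represent anything
outside the BMP, such as U+1D11E, which was assigned in later versions of the
standard.

```java
// Hypothetical helper for illustration only; not a jCIFS API.
class UcsForms {
    // UCS-4: every character is a single 4-octet (big-endian) value.
    static String ucs4Hex(int cp) {
        return String.format("%08X", cp);
    }

    // UCS-2: a single 2-octet value, which only reaches the BMP (U+0000..U+FFFF).
    static String ucs2Hex(int cp) {
        if (cp > 0xFFFF) {
            throw new IllegalArgumentException("outside the BMP: not representable in UCS-2");
        }
        return String.format("%04X", cp);
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0x20AC, 0x1D11E};
        for (int cp : samples) {
            String ucs2 = cp <= 0xFFFF ? ucs2Hex(cp) : "(not representable)";
            System.out.printf("U+%04X  UCS-4: %s  UCS-2: %s%n", cp, ucs4Hex(cp), ucs2);
        }
    }
}
```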

So no, RFC 2718 does not assume that everyone uses Unicode, although Unicode
defines the same set of characters as UCS; only its code space (up to
U+10FFFF) is a subset of the 31-bit UCS-4 range.

Eric

