[jcifs] problem encoding

Michael B. Allen miallen at eskimo.com
Fri Jan 17 07:36:17 EST 2003


I don't think this has anything to do with jCIFS or anything you just
said. You could URL encode the specification for the space shuttle into
an HTTP URL. I suspect the problem is that the UTF-8 parameter instructs the
URLEncoder to create UTF-8, which isn't going to work. Or the encoder
is deficient. I don't know. Otherwise the OP will have to come up with
their own scheme. Not sure if the Base64 class takes Unicode. This has
nothing to do with jCIFS, encoding of URLs, or anything like that. It's
an application specific issue.
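To illustrate that the URL step itself can carry any characters, here is a minimal Java sketch. It assumes Java 10+ (for the URLEncoder/URLDecoder overloads that take a Charset) and uses java.util.Base64 as a stand-in for whichever Base64 class was meant above. The class and method names are hypothetical. It round-trips a name containing umlauts and a degree sign through percent-encoding, and through an ad hoc Base64 parameter scheme of the kind suggested above.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class FileNameParam {
    // Percent-encode the UTF-8 bytes of the name; equivalent to
    // URLEncoder.encode(name, "UTF-8") but without the checked exception.
    public static String urlEncode(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    // Reverse of urlEncode: decode %XX sequences as UTF-8 and '+' as space.
    public static String urlDecode(String param) {
        return URLDecoder.decode(param, StandardCharsets.UTF_8);
    }

    // Hypothetical "own scheme": URL-safe Base64 of the UTF-8 bytes, so the
    // parameter only ever contains A-Z, a-z, 0-9, '-' and '_'.
    public static String toBase64Param(String name) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(name.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse of toBase64Param; the decoder accepts input without '=' padding.
    public static String fromBase64Param(String param) {
        return new String(Base64.getUrlDecoder().decode(param), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Fübär 90° (draft).txt", written with escapes so the source file
        // encoding does not matter.
        String name = "F\u00FCb\u00E4r 90\u00B0 (draft).txt";

        String encoded = urlEncode(name);
        System.out.println(encoded); // F%C3%BCb%C3%A4r+90%C2%B0+%28draft%29.txt
        System.out.println(urlDecode(encoded).equals(name)); // true

        String b64 = toBase64Param(name);
        System.out.println(b64);
        System.out.println(fromBase64Param(b64).equals(name)); // true
    }
}
```

If both round trips succeed for a given name, the percent-encoded parameter is probably not where the name gets damaged.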

On Thu, 16 Jan 2003 13:05:22 -0600
"Christopher R. Hertel" <crh at ubiqx.mn.org> wrote:

> This looks as though it is an eight-bit character encoding issue.
> 
>   [Mike:  I'll just smile quietly to myself.]  ;)
> 
> Microsoft uses different "DOS Code Pages" (also known as "OEM Character
> Sets") to encode file and directory names.  These go back to the old days
> when IBM managed the SMB protocol, long before Unicode was available. My
> guess, in this case, is that the filename on the SMB server side is
> written using one of the DOS Code Pages.
> 
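> The DOS Code Page values do not match Unicode values.  When you enter a
> file name such as "Fübär" and encode it as ISO-8859-1 (whose values match
> the first 256 Unicode code points) you will get the bytes:
> { 'F', 0xFC, 'b', 0xE4, 'r', 0 }.  (Strict UTF-8 differs again: each of
> the two accented letters becomes a two-byte sequence, 0xC3 0xBC and
> 0xC3 0xA4.)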
> 
>   [Note, in case that doesn't come through correctly, that the filename is
>   supposed to be 'F', u-umlaut, 'b', a-umlaut, 'r'.]
> 
> Using DOS Code Page 437, however, the same string would be encoded as:
> { 'F', 0x81, 'b', 0x84, 'r', 0 }
> 
> So, even though your URL is encoded "correctly", it gets to the other end
> and the server interprets it using the wrong set of byte-to-character
> mapping values.  The UTF-8 string doesn't match the DOS Code Page 437
> string.
> 
> I don't have a good solution off-hand.  The best thing to do is to ensure
> that both the client and the server are using the same codeset.  Unicode
> would be a good choice.  Conversion from UTF-8 to Unicode is
> straightforward.  The only other option (and this is really ugly) is to
> include DOS Code Page definitions with jCIFS and force the user to select
> the correct code page for the particular server.
> 
> That latter one is a bad idea.
> 
> Better to negotiate Unicode, where possible.
> 
> Chris -)-----
> 
> PS.  Any chance you'll be at the Samba/XP conference in Goettingen,
>      Germany, next April?  www.sambaxp.org
> 
> On Thu, Jan 16, 2003 at 04:08:53PM +0100, andrea.lanza at frameweb.it wrote:
> > My problem is the following.
> > Because I am writing a servlet using jcifs, I have an argument passed to
> > the servlet's URL containing the SMB file to get and work on.
> > 
> > I encode this argument using:
> > 
> > java.net.URLEncoder.encode("Name of my SMB File","UTF-8");
> > 
> > Everything is OK for a lot of characters like () [] {} and so on...
> > 
> > But with some characters (accented characters, the degree symbol and
> > others) the encoding fails.
> > 
> > Which is the best encoding I can use ?
> > 
> > thanks in advance,
> > 
> > Andrea
> > 
> > 
> 
> -- 
> Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
> jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
> ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
> OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org
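
A small Java sketch of the byte values discussed in the quoted message: it encodes "Fübär" as ISO-8859-1, UTF-8 and Cp437, prints the bytes, and then shows what happens when the UTF-8 bytes are read back as Cp437. It assumes a JDK that ships the extended charsets (IBM437 is in the jdk.charsets module of standard builds); the class name is hypothetical. Note that getBytes does not append the trailing 0 shown in the byte lists in the quoted message.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodePageDemo {
    // DOS Code Page 437; assumed available (extended charset in the JDK).
    public static final Charset CP437 = Charset.forName("IBM437");

    // Render bytes as {0x46, 0xFC, ...} for comparison with the mail above.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(String.format("0x%02X", bytes[i] & 0xFF));
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        String name = "F\u00FCb\u00E4r"; // "Fübär"

        // {0x46, 0xFC, 0x62, 0xE4, 0x72}
        System.out.println("ISO-8859-1: " + hex(name.getBytes(StandardCharsets.ISO_8859_1)));
        // {0x46, 0xC3, 0xBC, 0x62, 0xC3, 0xA4, 0x72}
        System.out.println("UTF-8:      " + hex(name.getBytes(StandardCharsets.UTF_8)));
        // {0x46, 0x81, 0x62, 0x84, 0x72}
        System.out.println("Cp437:      " + hex(name.getBytes(CP437)));

        // A server that reads the UTF-8 bytes as Cp437 sees a different name.
        String misread = new String(name.getBytes(StandardCharsets.UTF_8), CP437);
        System.out.println("equals original? " + misread.equals(name)); // false
    }
}
```

Going the other way is no better: 0x81 and 0x84 are not valid on their own in UTF-8, so Cp437 bytes decoded as UTF-8 turn into replacement characters.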


-- 
A  program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes  the  potential  for it to be applied to tasks that are
conceptually  similar and, more important, to tasks that have not
yet been conceived. 


