[jcifs] URL encoding

Michael B Allen mballen at erols.com
Mon Mar 11 07:21:41 EST 2002


On Sun, 10 Mar 2002 19:11:28 +0900
"david talbot" <chukhonets at hotmail.com> wrote:

> Running SmbCrawler gives exactly the same results as I got with my own test
> class. The Japanese directory.listFiles() just returns a nonexistent file
> with the same name as the directory. Trying to access that throws a
> FileNotFound exception.
> 
> I had a go with some Russian share names and they worked fine as you say.

Keep in mind shares are different from directories. jCIFS does not
support Unicode /share/ names.

> Problem is I'm trying to support Japanese.  As a matter of fact if I stick
> to the single byte Japanese characters (rarely used alone in real-life)
> jCIFS seems to work okay when client and server machines have the same
> default encoding (e.g. Windows MS932/Shift_JIS) and the path is not
> url-encoded.

So you *did* try the codepage thing. Glad to hear it worked. I don't
know why it wouldn't, but it's still interesting.

> I experimented with a few different share names and found that although when

Again, be careful about terminology here. Shares are handled differently,
but as far as I can tell you are not trying Japanese-named shares, right?

> the directory name is a single double-byte character (e.g. %91%E5), calling

Mmm, is that how you URL encode a 16 bit value? Maybe it's %E591? I
don't have time to check right now; I have to go to work.
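
For reference, a percent escape is always '%' followed by exactly two hex
digits, one escape per byte, so a double-byte character becomes two escapes
in whatever charset was used to produce the bytes. The sketch below uses
only the standard java.net.URLEncoder and URLDecoder classes (not jCIFS)
and assumes the JDK ships the Shift_JIS charset; the class and method names
are made up for illustration. It shows that U+5927, the character in the
filename=\大\* log line, encodes to %91%E5 under Shift_JIS but to a
different three-byte sequence under UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of jCIFS.
class PercentEscapes {
    // Percent-encode text using the bytes of the named charset.
    static String encode(String text, String charset) {
        try {
            return URLEncoder.encode(text, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("charset not available: " + charset, e);
        }
    }

    // Decode percent escapes, interpreting the raw bytes in the named charset.
    static String decode(String escaped, String charset) {
        try {
            return URLDecoder.decode(escaped, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("charset not available: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String dai = "\u5927"; // the directory name from the first log
        System.out.println(encode(dai, "Shift_JIS")); // %91%E5
        System.out.println(encode(dai, "UTF-8"));     // %E5%A4%A7
        // Decoding only round-trips when the same charset is assumed.
        System.out.println(decode("%91%E5", "Shift_JIS").equals(dai)); // true
    }
}
```

So %91%E5 is a plausible escape for this name only if both ends agree the
bytes are Shift_JIS; a decoder assuming another charset would see
different characters.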

> list() only returns a nonexistent file with the same name as the directory,
> if the directory name is two characters or more i.e. 4+ bytes long ( for
> instance %91%E5%8F%AC) then an invalid directory SmbException is thrown.
> 
> Doing a log=ALL for list() on the single character directory
> (smb://server/share/%91%E5)  the value after "find with path=" is correctly
> decoded into the Japanese character I want and in the  Trans2FindFirst2 the
> hex values for the directory name are correct up to the final 2A (i.e. "*")
> which is missing after the last 5C ("\") .
> 
> 
> Trans2FindFirst2[command=SMB_COM_TRANSACTION2,received=false,errorCode=0x000
> 00000,flags=0x0018,flags2=0x0001,tid=55297,pid=4502,uid=0,mid=3,wordCount=15
> ,byteCount=18,totalParameterCount=17,totalDataCount=0,maxParameterCount=10,m
> axDataCount=1200,maxSetupCount=0,flags=0x00,timeout=0,parameterCount=17,para
> meterOffset=66,parameterDisplacement=0,dataCount=0,dataOffset=84,dataDisplac
> ement=0,setupCount=1,pad=1,pad1=1,searchAttributes=0x16,searchCount=15,flags
> =0x00,informationLevel=0x104,searchStorageType=0,filename=\大\*]
> 
> 3 10 17:34:26.420 - smb sent
> 00000: FF 53 4D 42 32 00 00 00 00 18 01 00 00 00 00 00  |?SMB2...........|
> 00010: 00 00 00 00 00 00 00 00 01 D8 96 11 00 00 03 00  |.........?......|
> 00020: 0F 11 00 00 00 0A 00 B0 04 00 00 00 00 00 00 00  |.......°........|
> 00030: 00 00 00 11 00 42 00 00 00 00 00 01 00 01 00 12  |.....B..........|
> 00040: 00 00 16 00 0F 00 00 00 04 01 00 00 00 00 5C 91  |..............\.|
> 00050: E5 5C 00                                         |?\.             |

Yes, I see it in the filename= but it's not in the hex dump.

> With the 2 character directory name (smb://server/share/%91%E5%8F%AC) the
> value of "find with path=" is already wrong and the hex codes corresponding
> to the directory name are not visible anywhere in the dump including the
> Trans2FindFirst2. But the final 2A is there!

Well, what is the path in hex in this case? Is it U+463F U+5CAC ?
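
The two values in that question are what you get by reading the four path
bytes 3F 46 AC 5C from the dump quoted below as little-endian 16 bit
units. That reading is only a guess at what ended up on the wire, not a
confirmed diagnosis. A minimal sketch with the standard library (the class
and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jCIFS.
class PathBytes {
    // Interpret raw bytes as UTF-16 little-endian text.
    static String asUtf16Le(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16LE);
    }

    // Render each code point of a string in U+XXXX notation.
    static String toUPlus(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Bytes between the leading 5C and the trailing 2A in the second dump.
        byte[] raw = {0x3F, 0x46, (byte) 0xAC, 0x5C};
        System.out.println(toUPlus(asUtf16Le(raw))); // U+463F U+5CAC
    }
}
```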
> 
> 3 10 17:46:04.140 - smb sent
> Trans2FindFirst2[command=SMB_COM_TRANSACTION2,received=false,errorCode=0x000
> 00000,flags=0x0018,flags2=0x0001,tid=59393,pid=32658,uid=0,mid=3,wordCount=1
> 5,byteCount=20,totalParameterCount=19,totalDataCount=0,maxParameterCount=10,
> maxDataCount=1200,maxSetupCount=0,flags=0x00,timeout=0,parameterCount=19,par
> ameterOffset=66,parameterDisplacement=0,dataCount=0,dataOffset=86,dataDispla
> cement=0,setupCount=1,pad=1,pad1=1,searchAttributes=0x16,searchCount=15,flag
> s=0x00,informationLevel=0x104,searchStorageType=0,filename=\?Fャ\*]
> 
> 3 10 17:46:04.140 - smb sent
> 00000: FF 53 4D 42 32 00 00 00 00 18 01 00 00 00 00 00  |?SMB2...........|
> 00010: 00 00 00 00 00 00 00 00 01 E8 92 7F 00 00 03 00  |.........?......|
> 00020: 0F 13 00 00 00 0A 00 B0 04 00 00 00 00 00 00 00  |.......°........|
> 00030: 00 00 00 13 00 42 00 00 00 00 00 01 00 01 00 14  |.....B..........|
> 00040: 00 00 16 00 0F 00 00 00 04 01 00 00 00 00 5C 3F  |..............\?|
> 00050: 46 AC 5C 2A 00                                   |F?\*.           |
> 
> Any final suggestions would be really appreciated.

Well, this is pretty hard for me to debug, but I have a much better idea
of what you're doing. Have you ever seen clients talk to the server in
question in Unicode? I just want to rule out the possibility that the
server is just not up to it.
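
One thing the dumps above can already tell us: in the SMB header, flags2
is the little-endian 16 bit field at byte offset 10, and bit 0x8000 is the
flag that marks strings in the message as Unicode. Both requests log
flags2=0x0001, which would suggest these paths were sent in the OEM
codepage rather than Unicode. The sketch below reads the field from the
first 16 bytes of the first dump; the class, constant, and method names
are hypothetical and do not come from jCIFS.

```java
// Hypothetical helper, not part of jCIFS.
class SmbHeaderFlags {
    // Bit in flags2 indicating that strings in the message are Unicode.
    static final int FLAGS2_UNICODE = 0x8000;

    // flags2 occupies bytes 10 and 11 of the SMB header, little-endian.
    static int flags2(byte[] header) {
        return (header[10] & 0xFF) | ((header[11] & 0xFF) << 8);
    }

    static boolean usesUnicode(byte[] header) {
        return (flags2(header) & FLAGS2_UNICODE) != 0;
    }

    public static void main(String[] args) {
        // First row of the "smb sent" hex dump quoted above.
        byte[] header = {
            (byte) 0xFF, 0x53, 0x4D, 0x42, 0x32, 0x00, 0x00, 0x00,
            0x00, 0x18, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00
        };
        System.out.println(String.format("flags2=0x%04X", flags2(header))); // flags2=0x0001
        System.out.println(usesUnicode(header)); // false
    }
}
```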

Does Winzip support Unicode? Perhaps you can create a zip of these
directories so I can try to reproduce it on my NT machine? Otherwise,
maybe you can just give me the U+ Unicode notations for an errant sequence
of characters?

I think the solution to all of this is to properly fix the SmbURL
handling in jCIFS. That appears to be complicating the issue if
not at fault directly. You should not have to specify paths like
smb://server/share/%91%E5%8F%AC. If I fix the SmbURL handling you will be
able to type the proper character sequence without using escapes all the
time. So it would appear in Japanese provided you have the appropriate
glyphs in that Unicode range.

Mike
