[jcifs] Plusses, again.

Thu Jan 24 11:47:59 EST 2002

> -----Original Message-----
> From:	Rob Wygand [SMTP:rob at wygand.com]
> Sent:	Wednesday, January 23, 2002 6:53 PM
> To:	Michael B Allen
> Cc:	jcifs at samba.org
> Subject:	Re: [jcifs] Plusses, again.
> 
> Mike,
> 
> I found a bona fide bug regarding encoding, this time. =)
> 
> Here's the skinny: I have a directory named 123+ with a subdirectory of 
> HR+Recruiting. Here's my sample code to list all the children of 123+
> 
>    SmbFile[] files = null;
> 
>    SmbFile a = new SmbFile ("smb://server/share//123%2B"); // 123+
>    System.out.println (a.getCanonicalPath());
>    files = a.listFiles();
>    for (int i = 0; i < files.length; ++i)
>    {
>      System.out.println ("\t" + files[i].getCanonicalPath());
>    }
> 
> Here's the output:
> 
>    smb://server/share/123%2B
>      smb://server/share/123%2B/HR+Recruiting
> 
> Note that the + in 123+ is encoded, as I passed it in, but the + in 
> HR+Recruiting is not encoded. This is very bad. If *none* of it is 
> encoded, I can always encode it. If it's always encoded, I can always 
> decode it, but if parts of it are decoded and parts of it are not, I 
> don't know what I should do with each part. Is that %2B an encoded 
> value, or is it the literal name of the directory??
> 
	Yikes! We're really going around in circles now, aren't we?

	First off, in case you were wondering why we need to encode or decode these URLs at
	all: we must encode '@' signs because the '@' is the critical marker separating the
	user's authentication credentials from the server and path. Yes, the artifacts of
	decoding the entire path are the source of these problems; however, we cannot encode
	only '@' signs. That would be inconsistent, and users might expect to be able to
	URL encode URLs anyway, so we have to do it.

	Knowing that, there is no way to know whether the user intended a character to be treated
	literally or not. For example, a '+' sign might be literally a '+' or it may be a space. So
	we must unconditionally decode everything after the authentication information.
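
	To make that concrete, here is a rough sketch of the idea, not the actual jcifs parsing
	code. The class and method names are made up for illustration, and it assumes the
	separator is the first raw '@' after smb://. It uses the standard java.net.URLDecoder,
	which treats '+' as a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of jcifs.
class SmbUrlSplitSketch {

    // Returns { credentials (or null), decoded "server/path" remainder }.
    static String[] split(String url) {
        String prefix = "smb://";
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("not an smb:// URL: " + url);
        }
        String rest = url.substring(prefix.length());

        // Assumption: the first raw '@' separates credentials from server and path.
        // A literal '@' in a path therefore has to arrive encoded as %40.
        int at = rest.indexOf('@');
        String credentials = at < 0 ? null : rest.substring(0, at);
        String remainder = at < 0 ? rest : rest.substring(at + 1);

        try {
            // Unconditionally decode everything after the authentication information.
            return new String[] { credentials, URLDecoder.decode(remainder, "UTF-8") };
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

	With this sketch, "smb://user:pass@server/share/123%2B" yields the credentials
	"user:pass" and the remainder "server/share/123+". Note that a raw "HR+Recruiting"
	fed through the same routine comes back as "HR Recruiting".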

	So, about this particular problem: it occurs because the list* functions are appending
	the file or directory name literally. The question is, if these routines unconditionally
	URLencode the child pathname, will this lead to further issues? I think it will work.
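
	Roughly what I have in mind, as a sketch only: childUrl and roundTrips are hypothetical
	names, and this is not the fix itself. The assumption is that each child name is run
	through the standard java.net.URLEncoder before it is appended to the parent URL, so
	that decoding it later gives back the literal name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of jcifs.
class ChildPathSketch {

    // Appends a literal child name to an already-encoded parent URL.
    static String childUrl(String parentUrl, String literalChildName) {
        String encoded = encode(literalChildName);
        return parentUrl.endsWith("/") ? parentUrl + encoded : parentUrl + "/" + encoded;
    }

    // True if encoding and then decoding gives back the literal name.
    static boolean roundTrips(String literalName) {
        try {
            return URLDecoder.decode(encode(literalName), "UTF-8").equals(literalName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    private static String encode(String s) {
        try {
            // URLEncoder turns '+' into %2B, ' ' into '+', '@' into %40, '%' into %25.
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

	For the directory in question, childUrl("smb://server/share/123%2B", "HR+Recruiting")
	gives smb://server/share/123%2B/HR%2BRecruiting, so every component of the listed path
	is encoded the same way.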

	Regarding your overall strategy, if you are encoding and decoding paths on behalf of the
	user, you will be implementing your own path specification. Users will no longer be able
	to URL encode paths themselves without the risk of them being encoded again by you. This
	being the case, your convention should not resemble the smb:// URL. If your intended
	audience is users familiar with the Windows UNC path convention, you might as well
	just use that. After the URLencode is added to the list* methods, you should be able to
	apply your own UNCEncode and UNCDecode utilities. The important thing here is that
	the authentication information may no longer be specified, and therefore the '@' sign is
	no longer an issue.
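
	Such utilities might look something like the following. This is only a sketch under
	my own assumptions: the names uncToSmbUrl and smbUrlToUnc are invented, each path
	component is encoded separately with java.net.URLEncoder, and no credentials are
	handled at all.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical UNCEncode/UNCDecode style helpers, not part of jcifs.
class UncPathSketch {

    // \\server\share\dir  ->  smb://server/share/dir, each component URL encoded.
    static String uncToSmbUrl(String unc) {
        if (!unc.startsWith("\\\\")) {
            throw new IllegalArgumentException("not a UNC path: " + unc);
        }
        String[] parts = unc.substring(2).split("\\\\"); // regex for one backslash
        StringBuilder sb = new StringBuilder("smb://");
        try {
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) sb.append('/');
                sb.append(URLEncoder.encode(parts[i], "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
        return sb.toString();
    }

    // smb://server/share/dir  ->  \\server\share\dir, each component URL decoded.
    // Assumes the URL carries no authentication information.
    static String smbUrlToUnc(String url) {
        if (!url.startsWith("smb://")) {
            throw new IllegalArgumentException("not an smb:// URL: " + url);
        }
        String[] parts = url.substring("smb://".length()).split("/");
        StringBuilder sb = new StringBuilder("\\\\");
        try {
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) sb.append('\\');
                sb.append(URLDecoder.decode(parts[i], "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }
}
```

	Under those assumptions, \\server\share\123+\HR+Recruiting maps to
	smb://server/share/123%2B/HR%2BRecruiting and back again, and the user only ever
	sees the UNC form.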

	I will look at the fix now...

	Mike