[jcifs] Plusses, again.
Allen, Michael B (RSCH)
Michael_B_Allen at ml.com
Thu Jan 24 11:47:59 EST 2002
> -----Original Message-----
> From: Rob Wygand [SMTP:rob at wygand.com]
> Sent: Wednesday, January 23, 2002 6:53 PM
> To: Michael B Allen
> Cc: jcifs at samba.org
> Subject: Re: [jcifs] Plusses, again.
>
> Mike,
>
> I found a bona fide bug regarding encoding, this time. =)
>
> Here's the skinny: I have a directory named 123+ with a subdirectory of
> HR+Recruiting. Here's my sample code to list all the children of 123+
>
> SmbFile[] files = null;
>
> SmbFile a = new SmbFile ("smb://server/share//123%2B"); // 123+
> System.out.println (a.getCanonicalPath());
> files = a.listFiles();
> for (int i = 0; i < files.length; ++i)
> {
> System.out.println ("\t" + files[i].getCanonicalPath());
> }
>
> Here's the output:
>
> smb://server/share/123%2B
> smb://server/share/123%2B/HR+Recruiting
>
> Note that the + in 123+ is encoded, as I passed it in, but the + in
> HR+Recruiting is not encoded. This is very bad. If *none* of it is
> encoded, I can always encode it. if it's always encoded, I can always
> decode it, but if parts of it are decoded and parts of it are not, I
> don't know what I should do with each part. Is that %2B an encoded
> value, or is it the literal name of the directory??
>
Yikes! We're really going around in circles now, aren't we?
First off, in case you were wondering why we need to encode or decode these URLs at
all, we must encode '@' signs because they are the critical delimiter
separating the user's authentication credentials from the server and path. Yes, the
artifacts of decoding the entire path are the source of the problems; however, we cannot
encode only '@' signs. That would be inconsistent, and users might expect to be able to
URL encode URLs anyway, so we have to do it.
Knowing that, there is no way to know whether the user intended a character to be treated
literally or not. For example, a '+' sign might be literally a '+' or it may be a space. So
we must unconditionally decode everything after the authentication information.
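
To make the ambiguity concrete, here is a minimal sketch using the standard
java.net.URLDecoder. It only assumes that the decoding in question follows the same
form-style rules (where '+' means space); the class and method names are made up for
illustration and are not part of jcifs.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of jcifs.
class PlusDecodeDemo {

    // Decode one path segment with the JDK's form-style URL decoding.
    // (The Charset overload requires Java 10 or later.)
    static String decode(String segment) {
        return URLDecoder.decode(segment, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // An encoded plus comes back as a literal plus.
        System.out.println(decode("123%2B"));        // 123+
        // A bare plus is read as a space, so the directory name is silently changed.
        System.out.println(decode("HR+Recruiting")); // HR Recruiting
    }
}
```

Under that assumption, feeding the unencoded child name back in as a URL would point
at "HR Recruiting" rather than "HR+Recruiting".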
So, about this particular problem: it occurs because the list* functions are appending
the file or directory name literally. The question is, if these routines unconditionally
URL encode the child pathname, will this lead to further issues? I think it will work.
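
A rough sketch of that idea, written against the JDK's URLEncoder rather than the
actual jcifs source (appendChild is a hypothetical helper, not an existing jcifs
method):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of jcifs.
class ChildPathDemo {

    // Append a raw child name (as reported by the server) to a parent URL
    // that is already in encoded form, encoding the child first.
    static String appendChild(String encodedParent, String rawChildName) {
        String encodedChild = URLEncoder.encode(rawChildName, StandardCharsets.UTF_8);
        String separator = encodedParent.endsWith("/") ? "" : "/";
        return encodedParent + separator + encodedChild;
    }

    public static void main(String[] args) {
        String child = appendChild("smb://server/share/123%2B", "HR+Recruiting");
        System.out.println(child); // smb://server/share/123%2B/HR%2BRecruiting

        // The encoded name decodes back to the original, so the round trip is lossless.
        String lastSegment = child.substring(child.lastIndexOf('/') + 1);
        System.out.println(URLDecoder.decode(lastSegment, StandardCharsets.UTF_8)); // HR+Recruiting
    }
}
```

With both levels encoded the same way, the caller can decode every segment uniformly.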
Regarding your overall strategy, if you are encoding and decoding paths on behalf of the
user, you will be implementing your own path specification. Users will no longer be able
to URL encode paths themselves, because you would risk encoding them a second time. This
being the case, your convention should not resemble the smb:// URL. If your intended
audience is users familiar with the Windows UNC path convention, you might as well
just use that. After the URL encoding is added to the list* methods, you should be able to
apply your own UNCEncode and UNCDecode utilities. The important thing here is that
the authentication information may no longer be specified, and therefore the '@' sign is
no longer an issue.
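
One possible reading of such utilities, as a sketch only: the method names
uncToSmbUrl and smbUrlToUnc are invented here, and the code assumes the URL side uses
the JDK's form-style encoding and carries no credentials.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of jcifs.
class UncPathDemo {

    // \\server\share\dir  ->  smb://server/share/dir, encoding each component.
    static String uncToSmbUrl(String unc) {
        if (!unc.startsWith("\\\\")) {
            throw new IllegalArgumentException("not a UNC path: " + unc);
        }
        String[] parts = unc.substring(2).split("\\\\"); // split on single backslashes
        StringBuilder url = new StringBuilder("smb://");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) url.append('/');
            url.append(URLEncoder.encode(parts[i], StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    // smb://server/share/dir  ->  \\server\share\dir, decoding each component.
    static String smbUrlToUnc(String url) {
        if (!url.startsWith("smb://")) {
            throw new IllegalArgumentException("not an smb URL: " + url);
        }
        String[] parts = url.substring("smb://".length()).split("/");
        StringBuilder unc = new StringBuilder("\\\\");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) unc.append('\\');
            unc.append(URLDecoder.decode(parts[i], StandardCharsets.UTF_8));
        }
        return unc.toString();
    }

    public static void main(String[] args) {
        String url = uncToSmbUrl("\\\\server\\share\\123+\\HR+Recruiting");
        System.out.println(url);              // smb://server/share/123%2B/HR%2BRecruiting
        System.out.println(smbUrlToUnc(url)); // \\server\share\123+\HR+Recruiting
    }
}
```

Since users only ever see the UNC form, they never have to decide whether a '+' or a
'%2B' is literal.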
I will look at the fix now...
Mike