[jcifs] SMB URL parsing [jcifs-0.7.0b5 released]
Christopher R. Hertel
crh at ubiqx.mn.org
Wed Oct 23 13:29:03 EST 2002
On Tue, Oct 22, 2002 at 07:49:54PM -0400, Allen, Michael B (RSCH) wrote:
> > -----Original Message-----
> > From: Christopher R. Hertel [SMTP:crh at ubiqx.mn.org]
> > Sent: Tuesday, October 22, 2002 5:25 PM
> > To: Michael B. Allen
> > Cc: jcifs
> > Subject: Re: [jcifs] jcifs-0.7.0b5 released
> > Ah. This is what I was missing when I read your questions about URL
> > parsing a while back. Sorry I was being dense...
> > * URLs that represent workgroups, servers, shares, or directories
> > must have a trailing slash '/'.
> > That's a problem from a user-interface point of view. People are used
> > to entering http://yahoo.com and getting yahoo.com. No trailing slash.
> > I am surprised that the java.net.URL class doesn't handle this sort of
> > thing.
> This is true of http URLs so no surprise here. If you leave off the
> slash the server will reply with an error.
The server will, because the server is expecting correct syntax. It's not
dealing directly with users.
> The browsers have been given the brains necessary to try again with
> a slash which is what jCIFS users will be required to do to their
Those brains need to be on the client side, as with the browser. Since
jCIFS is client code, I'm thinking that these features could be coded up
and included with the toolkit to avoid wheel-reinvention.
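
A minimal sketch of the kind of client-side helper meant here, assuming plain string handling is enough for user-typed input. The class name SmbUrlInput and the method addMissingSlash are hypothetical and not part of jCIFS. It only covers the unambiguous case, where nothing follows the name (as in smb://server); deciding whether a longer path names a file or a directory would still need a lookup on the network.

```java
class SmbUrlInput {
    private static final String PREFIX = "smb://";

    // Hypothetical helper (not a jCIFS API): append the trailing '/' that a
    // user typically omits when typing only a workgroup or server name.
    static String addMissingSlash(String input) {
        if (!input.startsWith(PREFIX)) {
            return input; // not an SMB URL, leave it untouched
        }
        String rest = input.substring(PREFIX.length());
        // A bare name with no '/' can only be a workgroup or a server, and
        // both require the trailing slash.
        if (!rest.isEmpty() && rest.indexOf('/') < 0) {
            return input + "/";
        }
        return input; // already has a path; file vs. directory is unknown
    }

    public static void main(String[] args) {
        System.out.println(addMissingSlash("smb://server"));
        System.out.println(addMissingSlash("smb://server/share/file.txt"));
    }
}
```

The helper deliberately leaves longer paths alone, since guessing wrong there would turn a file URL into an invalid directory URL.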
> This actually isn't as much of a problem as you might think. For
> certain applications (crawlers, file browsers) it would have been
> annoying if you are composing URLs with the parent SmbFile + getName()
> where the name of a directory didn't have a '/'.
Yes, I understand that the internal representation of the name must be
"correct" so that it can be manipulated correctly. That makes sense.
> To get around this problem I have changed getName() to include the
> '/' if the SmbFile is a directory.
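
To illustrate why the trailing slash matters when composing a child URL from a parent, here is a small sketch. It uses java.net.URI from the standard library purely for illustration; jCIFS itself builds on java.net.URL, and the host, share, and file names are made up. The class name TrailingSlashDemo and the method child are hypothetical.

```java
import java.net.URI;

class TrailingSlashDemo {
    // Resolve a child name against a parent URL using the generic relative
    // resolution rules implemented by java.net.URI.
    static String child(String parent, String name) {
        return URI.create(parent).resolve(name).toString();
    }

    public static void main(String[] args) {
        // Parent marked as a directory: the name is appended.
        System.out.println(child("smb://host/share/dir/", "file.txt"));
        // Parent without the slash: "dir" is treated as a file and replaced.
        System.out.println(child("smb://host/share/dir", "file.txt"));
    }
}
```

Without the slash the last segment of the parent is silently replaced, which is the failure a crawler or file browser would run into when walking a tree.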
> > * Canonicalization does not exceed the host component of the URL. So
> > smb://host/share/path/ + ../../../../foo/ is canonicalized to
> > smb://host/foo/ whereas previously the client would have reduced this
> > to smb://foo/.
> > That's also a problem, but it may be a problem with the SMB URL format
> > rather than java.net.URL. I don't know of any other URL form that
> > assigns meaning to <scheme>:// the way the SMB URL does. That's a
> > question for the URL gurus. It may be inherently broken.
> It may be possible to intercept this scenario but it was never clear
> to me that it is problematic. It's awkward when using SmbShell
> (which I fixed BTW) but consistent with what happens when you
> compose a URL like smb://server/share/path/to/file + /some/thing/else
> which gives smb://server/some/thing/else. Meaning the root of the
> "filesystem" really starts after the server. But this is debatable
> of course.
It is also not clearly defined by the SMB URL specification. That is:
what is the meaning of "smb://name" + ".."? I had simply assumed that it
would be "smb://". In other words, that "smb://" is the root of the URL
path. I do not know if this is consistent with URL behavior in general.
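
The behavior described in the release note (canonicalization stops at the host) can be sketched as follows. This is a hypothetical helper written for this discussion, not the jCIFS code; the class name SmbPathCanon and the method resolve are assumptions, and it handles only the path part of the URL.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SmbPathCanon {
    // Hypothetical helper, not the jCIFS implementation: resolve 'relative'
    // against 'basePath' on 'host', never letting ".." climb above the host.
    static String resolve(String host, String basePath, String relative) {
        // An absolute relative part replaces the whole base path; otherwise
        // it is appended to the directory portion of the base path.
        String dirPart = basePath.substring(0, basePath.lastIndexOf('/') + 1);
        String combined = relative.startsWith("/") ? relative : dirPart + relative;

        Deque<String> segments = new ArrayDeque<>();
        for (String seg : combined.split("/")) {
            if (seg.isEmpty() || seg.equals(".")) {
                continue;
            }
            if (seg.equals("..")) {
                if (!segments.isEmpty()) {
                    segments.removeLast(); // go up one level if there is one
                }
                continue; // excess ".." is dropped at the host boundary
            }
            segments.addLast(seg);
        }

        boolean directory = combined.endsWith("/")
                || combined.endsWith("/.") || combined.endsWith("/..");
        StringBuilder url = new StringBuilder("smb://").append(host).append('/');
        url.append(String.join("/", segments));
        if (directory && !segments.isEmpty()) {
            url.append('/');
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("host", "/share/path/", "../../../../foo/"));
        System.out.println(resolve("server", "/share/path/to/file", "/some/thing/else"));
    }
}
```

The other reading, in which "smb://name" + ".." yields "smb://", would require treating the name itself as an ordinary segment that ".." is allowed to remove.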
> > * Composing a URL with a workgroup and a second parameter like
> > smb://workgroup/ + server/share/path/ used to be intelligent enough to
> > eliminate workgroup. This will now blindly compose the two arguments to
> > give smb://workgroup/server/share/path/ which is an illegal SMB URL.
> > This is really smb://workgroup/ + path since we don't know that the path
> > components represent a server or share until they are evaluated
> > semantically. Still, I understand the point. If we know (semantically)
> > that the URL "smb://workgroup/" represents a workgroup then we should
> > also know that adding anything to it would be invalid. The only way to
> > handle that situation is to remove the workgroup part and hope that the
> > next part is a server identifier--a reasonable guess.
> The old code always made this assumption. Provided you know that
> 'workgroup' is a workgroup then the second parameter must be a
> server because you cannot have anything after the workgroup.
> The real problem here is the workgroup lookup. You have to basically
> do a getByName lookup in the middle of creating an SmbFile which
> is quite strange.
That's what I mean about evaluating the semantics. That's the problem
with overloading the SMB URL. The whole thing becomes a semantic, rather
than syntactic, problem.
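
A rough sketch of how the semantic check changes the composition. The lookup is passed in as a predicate so the example runs without a network; in a real client that predicate would be the name lookup discussed above. The class name WorkgroupCompose, the method compose, and the parameter isWorkgroup are all hypothetical.

```java
import java.util.function.Predicate;

class WorkgroupCompose {
    private static final String PREFIX = "smb://";

    // Hypothetical sketch: 'isWorkgroup' stands in for the semantic lookup
    // (network traffic in a real client). If the base URL is a single name
    // that turns out to be a workgroup, it is dropped before composing.
    static String compose(String base, String relative, Predicate<String> isWorkgroup) {
        String rest = base.startsWith(PREFIX) ? base.substring(PREFIX.length()) : "";
        String name = rest.endsWith("/") ? rest.substring(0, rest.length() - 1) : rest;
        boolean singleName = !name.isEmpty() && name.indexOf('/') < 0;
        if (singleName && isWorkgroup.test(name)) {
            return PREFIX + relative; // assume the next part names a server
        }
        return base + relative; // purely syntactic ("blind") composition
    }

    public static void main(String[] args) {
        Predicate<String> lookup = n -> n.equals("workgroup");
        System.out.println(compose("smb://workgroup/", "server/share/path/", lookup));
        System.out.println(compose("smb://server/", "share/path/", lookup));
    }
}
```

The same two strings produce different results depending only on what the lookup returns, which is why this cannot be settled by syntax alone.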
> But after thinking about this, the scenario is quite rare and the
> user would be required to do the lookup anyway so we might as well
> just do it for them. I will look closer at changing this back
> at some point. Hopefully sooner than later.
I think that the problem lies in the fact that this isn't finalized in the
SMB URL spec. Urg.
> > I still think that a layer needs to be written above the java.net.URL
> > layer. Perhaps all it would do is parse and rebuild the URL before
> > handing it to java.net.URL. Of course, that means having some semantic
> > information, which means network traffic. Hmmm... I also understand that
> > the java.net.URL class is final. Dang. Ah, well...
> Again, you really have to familiarize yourself with the java.net.URL
> class and associated apparatus to understand the limitations here.
I certainly won't argue with that. :)
> You cannot necessarily parse and rebuild stuff before handing it
> to the URL class. You can to some extent (see jcifs.smb.Handler)
> but it's very restrictive because you start with URL which cannot
> have any additional state. This document explains a lot of it:
I will try to get a look at that soon.
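
For readers unfamiliar with the apparatus in question, here is a minimal sketch of how a custom scheme is hooked into java.net.URL through a URLStreamHandler. The stub handler below is a stand-in written for this example and is NOT jcifs.smb.Handler; it relies on the default parsing inherited from URLStreamHandler and cannot open connections. The class names SmbUrlParseDemo and StubHandler are hypothetical.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

class SmbUrlParseDemo {
    // Stand-in handler, not the jCIFS one: the only customization points are
    // the protected methods of URLStreamHandler, because URL itself is final.
    static class StubHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            throw new IOException("sketch only: no SMB transport here");
        }
    }

    static URL parse(String spec) {
        try {
            // Passing the handler explicitly avoids any global registration.
            return new URL(null, spec, new StubHandler());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL u = parse("smb://host/share/path/");
        System.out.println(u.getProtocol() + " " + u.getHost() + " " + u.getPath());
    }
}
```

The resulting URL object exposes only its standard components (protocol, host, port, path, and so on), which is the restriction mentioned above: there is no place on it to record, for example, that "host" is really a workgroup.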
Samba Team -- http://www.samba.org/ -)----- Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/ -)----- ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/ -)----- crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/ -)----- crh at ubiqx.org