[jcifs] Re: directory must end with '/'

Christopher R. Hertel crh at ubiqx.mn.org
Thu Dec 9 09:44:45 GMT 2004


On Thu, Dec 09, 2004 at 03:35:32AM -0500, Michael B Allen wrote:
> On Thu, 9 Dec 2004 00:43:37 -0600
> "Christopher R. Hertel" <crh at ubiqx.mn.org> wrote:
> 
> > The RFC's grammar includes the following:
> > 
> >   path_segments = segment *( "/" segment )
> > 
> > The thing to notice (the point that got this conversation going) is that 
> > the slashes are at the *front* of the authority and of each segment.  
> > Syntactically, that means that neither the authority section (the 
> > user@hostname:port part) nor the path segments (the directory names)
> > require a trailing slash.
> 
> No, no, no, no, NO! The grammar does not apply to a fragment of a URL. The
> grammar in 2396 and in your draft is correct. It just doesn't apply to an
> incomplete URL. A complete URL is:
> 
> smb://server/share/path/file
> 
> The 2396 grammar is completely in line with this. The following is NOT a
> valid 2396 URL:
> 
> smb://server/share/path/
> 
> It is merely a fragment of a URL

What?

Syntactically, the latter is a completely valid URI string.  I can run it 
through the grammar and it's accepted.  The only thing missing is the 
semantics:  What does it mean.  Those semantics are defined by the 
specific URI scheme.  In this case the SMB URI scheme.

RFC 2396 applies a minimum of semantics to URIs.  It doesn't care what the
path elements represent.  At all.  Also, 2396 allows for empty path
elements so the two examples you gave above are equally valid.
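
To put the empty-segment point in concrete terms, here's a rough Java sketch. The class and method names are made up for illustration, and it only covers the abs_path piece of the RFC 2396 section 3.3 grammar (path_segments, segment, param, pchar), not a whole URI. Splitting on "/" while keeping empty strings shows that a trailing slash simply yields one more, empty, segment, and that empty segment still matches the segment production.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PathSegments {
    // pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
    // (unreserved = alphanum | "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")")
    private static final String PCHAR =
        "(?:[A-Za-z0-9\\-_.!~*'():@&=+$,]|%[0-9A-Fa-f]{2})";
    // segment = *pchar *( ";" param ),  param = *pchar
    private static final Pattern SEGMENT =
        Pattern.compile(PCHAR + "*(?:;" + PCHAR + "*)*");

    // abs_path = "/" path_segments ; returns the segments, empty ones included.
    static List<String> segments(String absPath) {
        if (!absPath.startsWith("/")) {
            throw new IllegalArgumentException("abs_path must start with '/': " + absPath);
        }
        // limit -1 keeps trailing empty strings, so "a/b/" gives [a, b, ""]
        return Arrays.asList(absPath.substring(1).split("/", -1));
    }

    // True if every segment (including an empty one) matches the segment rule.
    static boolean isValidAbsPath(String absPath) {
        return segments(absPath).stream().allMatch(s -> SEGMENT.matcher(s).matches());
    }

    public static void main(String[] args) {
        System.out.println(segments("/share/path/file"));   // [share, path, file]
        System.out.println(segments("/share/path/"));       // [share, path, ]
        System.out.println(isValidAbsPath("/share/path/")); // true
    }
}
```

So "/share/path/" and "/share/path/file" both parse; the only difference is whether the last segment is empty.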

I'm not sure what distinction you're trying to make here, or why you're 
making it.

> but it so happens that some applications
> operate on URL fragments. JCIFS interprets such URLs as directories.

I don't see what makes it a "fragment".  The term "fragment" is used in
the RFCs.  It's the syntax token delimited by the "#" character (as in
"index.html#PART2" or somesuch).

> > I have a lot of comments inline below, but the results are...
> > 
> > 1) Note that the segment is made up of zero or more pchars followed by 
> >    zero or more (";" param)'s.  That means... a segment can be empty.
> >    The significance is that trailing slashes are permitted by the syntax.
> >    That's important, because (as we all know) they're used all over the 
> >    place.
> 
> Ok.
> 
> > 2) Web browsers commonly add a trailing slash after the authority section 
> >    or a directory name if there isn't one there already.  This is the 
> >    quirk we're trying to work out, I believe.  I don't really care whether
> >    jCIFS does this or makes it easy for the calling application to do
> >    this.  The point is, it needs to be done.  I *do* care that the user of
> >    an application built on jCIFS shouldn't have to do this manually
> >    (example programs don't count).
> 
> Negative. Web browsers do not add a trailing slash after anything. That is a
> URL given to it in response to an HTTP redirect response. The browser is
> simply displaying what it was given.

Okay...

(Well, no... I mean I'm the one who's supposed to be pedantic...)

> > 3) The point of adding the "missing" trailing slash appears to be twofold.
> >    First, it deals with an oddity and/or ambiguity in RFC2396.  The RFC 
> >    doesn't clearly explain how to deal with an authority section that
> >    doesn't have a trailing slash.  Also, the RFC is dealing with URI 
> >    strings in very general terms and doesn't distinguish between 
> >    directories and files, so if there is no trailing slash it instructs 
> >    us to remove the final segment.  That would give a very unexpected 
> >    result, so that's not what we want to do.
> 
> Ok.
> 
> > > Web browsers
> > > will interpret an HTTP redirect and display the URL to which the browser
> > > has been redirected thus providing the appearance that a trailing slash
> > > has been "added".
> > 
> > Well, the trailing slash *was* added... by the server in this case.  It 
> > sends back the modified URI and the client retries without ever annoying 
> > the user.
> 
> Not really. The server said "Error: the page you requested has been
> permanently moved to <new URL>".

That's simply the protocol used to get the job done.

> > I don't know why the trailing slash is required by HTTP.  Perhaps it is
> > something in the HTTP specifications.  The SMB URI specifications are
> 
> Because if you didn't and you navigated to http://server/path/dir and
> clicked on a link <a href="page.html"> you would get
> http://server/path/page.html whereas if you had http://server/path/dir/
> you'll get what you really want which is http://server/path/dir/page.html.

Given the formula in RFC 2396 for calculating new absolute URIs from a 
given base and a relative URI... yes.  You're correct.
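
For reference, the formula is the path merge in step 6 of RFC 2396 section 5.2. Below is a rough Java sketch of just that step; the names are mine, for illustration only, and the "." and ".." cleanup that comes after it in the RFC is left out. The empty base path case (an authority with no trailing slash) just throws, since that's exactly the spot where 2396 isn't clear.

```java
public class MergePaths {
    // Rough sketch of the path merge in RFC 2396 section 5.2, step 6:
    // copy the base path up to and including its right-most "/", then append
    // the relative path. The "." / ".." cleanup that follows is omitted.
    static String merge(String basePath, String relPath) {
        if (basePath.isEmpty()) {
            // Base like "http://server" (authority, no path): RFC 2396 doesn't
            // clearly say what to do here, so this sketch refuses to guess.
            throw new IllegalArgumentException("empty base path is ambiguous");
        }
        int lastSlash = basePath.lastIndexOf('/');
        // Everything after the right-most "/" (the last segment) is dropped.
        return basePath.substring(0, lastSlash + 1) + relPath;
    }

    public static void main(String[] args) {
        System.out.println(merge("/path/dir", "page.html"));  // /path/page.html
        System.out.println(merge("/path/dir/", "page.html")); // /path/dir/page.html
    }
}
```

Feed it a base path of "/share" and a relative path of "path/to/file" and the share name disappears, for the same reason.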

> > built from the RFCs mentioned above, and those RFCs clearly show leading
> > slashes, not trailing slashes, in the grammar and in some of the examples.
> 
> Again, the grammar is only for complete URLs. This is the key thing to
> understand.

I don't understand.  I have no idea what you're talking about here.

> > does it.  I think the only area where Mike and I disagree on this is in 
> > how it gets handled by jCIFS.)
> 
> No, we disagree on this:
> 
> smb://[server/[share/[path/[file.txt]]]]
> vs
> smb://[server[/share[/path[/file.txt]]]]
> 
> which is what I thought we were *really* talking about. This representation
> (that we made up) is intuitive but it is NOT a complete grammar and
> therefore cannot be compared to the 2396 grammar. It is a condensed bastard
> grammar that just shows optional parts of a URL that if left out yield a
> parent fragment.

I don't know what a freaking "parent fragment" is.  Where is that defined in 
the RFCs?

If you don't like the shorthand, fine.  That's okay.  Use the real 
grammar.  The real grammar in the RFCs shows leading slashes.
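
For what it's worth, here's a small Java sketch (names made up) that spells out what each shorthand expands to for the components server, share, path and file.txt. The complete URL comes out identical either way; the two forms only differ in whether the shorter, parent URLs end in a slash.

```java
import java.util.ArrayList;
import java.util.List;

public class ShorthandForms {
    // smb://[server/[share/[path/[file.txt]]]]
    // Every component except the last one carries a trailing "/".
    static List<String> trailingSlashForm(String... parts) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder("smb://");
        out.add(sb.toString());
        for (int i = 0; i < parts.length; i++) {
            sb.append(parts[i]);
            if (i < parts.length - 1) {
                sb.append('/');
            }
            out.add(sb.toString());
        }
        return out;
    }

    // smb://[server[/share[/path[/file.txt]]]]
    // Every component after the server carries a leading "/".
    static List<String> leadingSlashForm(String... parts) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder("smb://");
        out.add(sb.toString());
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(parts[i]);
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        trailingSlashForm("server", "share", "path", "file.txt").forEach(System.out::println);
        leadingSlashForm("server", "share", "path", "file.txt").forEach(System.out::println);
    }
}
```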

> > > and the
> > > client cannot unconditionally mask the redirect response because the
> > > caller most likely needs to know where it has been redirected.
> > 
> > What, in this case, constitutes the caller?  Sorry, you were talking HTTP 
> > here and I got a little lost...
> 
> The caller would be the web browser. I was just pointing out that the HTTP
> client cannot transparently reinitiate the GET request to the new URL. That's
> basically what you are suggesting I do with jCIFS and I want to make it
> clear that HTTP does not exhibit that behavior.

If I type:

$ lynx http://jcifs.samba.org/src

Lynx will tell me that it's using http://jcifs.samba.org/src/
...but otherwise, from the end-user perspective, it's transparent.  
Mozilla doesn't tell me what it's doing.  The new URI simply shows up in 
the Location window.

> > > > That in mind, I still feel that jCIFS could easily handle the semantic
> > > > issues just as it handles the semantic differences between a server
> > > > and a workgroup.
> > > 
> > > Actually the more I think about this it's totally impractical to do this
> > > with JCIFS because URLs are immutable and are not resolved until they
> > > are used.
> > 
> > One part that I was missing here, and have since figured out, is that 
> > you're not even going over the wire in some cases.  For example, if I try 
> > to .list() on an SmbFile that is based on an SMB URI that has no trailing 
> > slash... then I get the exception before any network activity occurs at 
> > all.
> 
> Well that's because I know list() only applies to a directory so it's a good
> place to check for the '/'. If you try to do exists() it's not possible to
> tell without going to the wire.

I'll have to play with that.
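
Here's roughly how I picture the difference, as a hypothetical Java model and not actual jCIFS code. The serverHasPath predicate stands in for whatever network query jCIFS really makes, and the class and method names are invented. The list() precondition looks only at the string, so it can fail without a single lookup; exists() has nothing to go on but the server's answer.

```java
import java.util.function.Predicate;

public class ListVersusExists {
    // Hypothetical model, not jCIFS code: serverHasPath stands in for a network query.
    private final String url;
    private final Predicate<String> serverHasPath;

    ListVersusExists(String url, Predicate<String> serverHasPath) {
        this.url = url;
        this.serverHasPath = serverHasPath;
    }

    // list() only makes sense for a directory, so the trailing-slash rule can be
    // enforced on the string alone, before anything goes over the wire.
    void checkListable() {
        if (!url.endsWith("/")) {
            throw new IllegalStateException("directory must end with '/': " + url);
        }
    }

    // exists() can't be answered from the string; it has to ask the server.
    boolean exists() {
        return serverHasPath.test(url);
    }

    public static void main(String[] args) {
        ListVersusExists f = new ListVersusExists("smb://server/share/path", u -> true);
        System.out.println(f.exists()); // true: answered by the (pretend) server
        try {
            f.checkListable();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // directory must end with '/': smb://server/share/path
        }
    }
}
```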

> > Makes sense.  If you assign semantic meaning to a trailing slash then the
> > lack of the trailing slash would indicate a file, not a directory (or
> > would indicate "ambiguous").  The .list() method isn't defined for a 
> > non-directory, so I see why it throws an exception.
> 
> But again the exception is not thrown universally. Which is a bug in itself.
> It should either be thrown consistently or not at all. Unfortunately neither
> of those cases is acceptable.

I think it's worth getting this straight, whichever way it gets handled.

> > Perhaps the most important thing to do is to explain to coders using jCIFS
> > exactly what is going on and why jCIFS behaves as it does.  To anyone 
> > who's used a web browser for more than a week, modifications to the URI in
> > their Location bar seem natural.  It's a surprise to get an error message 
> > that tells you you've got to be picky about things like adding a trailing 
> > slash.
> 
> JCIFS is a low-level client library. A low level HTTP client library would
> behave the same (not automatically "add a trailing slash").

I can accept that.

> > > is a non-standard permutation of command line option definitions that
> > > was made up by someone during the original SMB URL discussion on
> > > samba.technical. It is not a real grammar and cannot be compared to the
> > > said section in 2396.
> > 
> > Well, it can be.  It's a common short-hand, that's all.  In any case, 
> > there is an RFC 2396-compliant grammar provided in the current Internet 
> > Draft, and that grammar follows the convention and uses leading slashes, 
> > not following slashes.
> 
> The RFC 2396 grammar is correct. But it only applies to complete URLs that
> refer to leaf nodes. Parent nodes (addressed by a parent fragment of the
> complete URL) do not apply. So the trailing slash is optional in which case
> both:
> 
> smb://[server/[share/[path/[file.txt]]]]
> and
> smb://[server[/share[/path[/file.txt]]]]
> 
> are legal but which one makes more sense? Would you rather encourage users
> to write:
> 
> smb://server/share/path
> or
> smb://server/share/path/

I'm not planning on encouraging users to one or the other.  What this 
conversation has shown me is:

  - both are syntactically correct.
  - neither is known to be valid until it reaches the server.

It's the server that knows the real semantics, so that's the place to ask 
the question.
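
One way to act on that, sketched in Java. Everything here is hypothetical: canonicalize() is not a jCIFS method, and the isDirectoryOnServer predicate stands in for a real query to the server. Once the server says the last component is a directory, the trailing slash gets added, which is more or less what an HTTP server's redirect does for a browser.

```java
import java.util.function.Predicate;

public class CanonicalizeUrl {
    // Hypothetical: isDirectoryOnServer stands in for an actual query to the
    // SMB server; nothing here is jCIFS API.
    static String canonicalize(String url, Predicate<String> isDirectoryOnServer) {
        if (url.endsWith("/")) {
            return url; // already in directory form, no need to ask
        }
        // Only the server knows whether the last component is a directory.
        return isDirectoryOnServer.test(url) ? url + "/" : url;
    }

    public static void main(String[] args) {
        // Pretend server on which ".../path" is a directory and nothing else is.
        Predicate<String> server = u -> u.endsWith("/path");
        System.out.println(canonicalize("smb://server/share/path", server));
        // smb://server/share/path/
        System.out.println(canonicalize("smb://server/share/path/file.txt", server));
        // smb://server/share/path/file.txt
    }
}
```

The string is never used to guess; the price is a round trip to the server for any URL that doesn't already end in a slash.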

> >   smb://foo.bar.biz/share  +  path/to/file
> > 
> > That makes it a little clearer.  The expected result is:
> >   smb://foo.bar.biz/share/path/to/file
> > 
> > What you actually get (per RFC 2396's algorithms) is:
> >   smb://foo.bar.biz/path/to/file
> 
> Right. And this is exactly what jcifs (actually the java.net.URL class)
> would do.
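
This is easy to check with java.net.URI, which documents conformance to RFC 2396 and, unlike java.net.URL, doesn't need a protocol handler registered for the smb scheme. A sketch (class and method names made up):

```java
import java.net.URI;

public class ResolveSmb {
    // java.net.URI.resolve applies the RFC 2396 resolution rules; no smb handler needed.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("smb://foo.bar.biz/share", "path/to/file"));
        // smb://foo.bar.biz/path/to/file  (the share name is dropped)
        System.out.println(resolve("smb://foo.bar.biz/share/", "path/to/file"));
        // smb://foo.bar.biz/share/path/to/file
    }
}
```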
> 
> > So...   (Hoping you've stuck with me this far without blowing a gasket...)
> 
> I'm not mad, I'm just pulling my hair out trying to get you to see that our
> goofy condensed syntactopath cannot be compared to the "path_segments =
> segment *( "/" segment )" part in 2396.

So work with the real grammar.  The shorthand is just that.  The thing is, 
it does an okay job of presenting a general idea...

I will think about changing it for the magazine article I'm doing.
I didn't think that was the point of the discussion, however.

> > ...and we reach agreement.  Kewl.
> 
> Phew.

Yeah.

Chris -)-----

-- 
"Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org

