[jcifs] Re: directory must end with '/'

Thu Dec 9 06:43:37 GMT 2004

On Wed, Dec 08, 2004 at 06:09:16PM -0500, Michael B Allen wrote:
> [I'm getting the list on this as it clearly explains the position on this
> matter.]

Well...  I've spent a lot of time this evening reading through Mike's 
message, reading the RFCs, and doing packet captures to test my theories.

The first thing I need to say is that it's all very interesting.  There's 
a certain amount of ambiguity and possibly some bugs in RFC 2396.  This 
kind of research is all fun for me, and I'm enjoying it.  I also found 
that there were some things missing from my SMB Internet Draft.  Very 
helpful!

The next thing to say is that what Mike wrote isn't "the position", it's
Mike's position.  I respect that, and not just because Mike is in charge
of the code.

The third thing is that, after going through and answering everything Mike 
said (and correcting myself a few times in the process) I think we've 
basically reached the same point albeit from two completely opposite 
directions.  Hurrah!

Finally, this is really a practice vs. theory vs. practice kind of
discussion.  I know I'm right :) but that doesn't make anyone wrong.

I like working with good people.

So...

The syntax of the SMB URI, as described in the current Internet Draft,
matches the grammar given in RFC 2396 and the supplement RFC 2732 (which
describes how to use IPv6 addresses with URIs).

  (There is one small exception, which is that the "smb://" form isn't
  covered in RFC 2396.  I sat down to dinner with one of the authors of
  RFC 2396 and he said that the "smb://" exception was okay.  Of course,
  we were having Thai food with a bunch of other geeks so everyone was in
  a good mood and I suppose that could have influenced his opinion.)

The RFC's grammar includes the following:

  absoluteURI   = scheme ":" ( hier_part | opaque_part )
  hier_part     = ( net_path | abs_path ) [ "?" query ]
  net_path      = "//" authority [ abs_path ]
  abs_path      = "/"  path_segments
  path_segments = segment *( "/" segment )
  segment       = *pchar *( ";" param )

The thing to notice (the point that got this conversation going) is that 
the slashes are at the *front* of the authority and of each segment.  
Syntactically, that means that neither the authority section (the 
user@hostname:port part) nor the path segments (the directory names) 
require a trailing slash.
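To see the grammar at work, here is a small Java sketch. It relies on the standard java.net.URI class, which parses any hierarchical scheme (including "smb") according to RFC 2396 as amended by RFC 2732. The class and method names are made up for illustration and are not part of jCIFS.

```java
import java.net.URI;

// Hypothetical helper (not jCIFS): uses java.net.URI to show where the
// RFC 2396 grammar puts the slashes in an SMB URI.
class SmbUriParts {
    // The authority sits between "//" and the first "/" of the path;
    // it needs no trailing slash of its own.
    static String authority(String uri) {
        return URI.create(uri).getRawAuthority();
    }

    // The path is empty for "smb://host"; otherwise every segment is
    // introduced by a leading "/".
    static String path(String uri) {
        return URI.create(uri).getRawPath();
    }
}
```

For example, path("smb://foo.bar.biz") returns an empty string, while path("smb://foo/share/dir") returns "/share/dir": the slashes introduce segments rather than terminate them.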

I have a lot of comments inline below, but the results are...

1) Note that the segment is made up of zero or more pchars followed by 
   zero or more (";" param)'s.  That means... a segment can be empty.
   The significance is that trailing slashes are permitted by the syntax.
   That's important, because (as we all know) they're used all over the 
   place.

2) Web browsers commonly add a trailing slash after the authority section 
   or a directory name if there isn't one there already.  This is the 
   quirk we're trying to work out, I believe.  I don't really care whether 
   jCIFS does this or makes it easy for the calling application to do
   this.  The point is, it needs to be done.  I *do* care that the user of 
   an application built on jCIFS shouldn't have to do this manually
   (example programs don't count).

3) The point of adding the "missing" trailing slash appears to be twofold.
   First, it deals with an oddity and/or ambiguity in RFC 2396.  The RFC 
   doesn't clearly explain how to deal with an authority section that
   doesn't have a trailing slash.  Also, the RFC is dealing with URI 
   strings in very general terms and doesn't distinguish between 
   directories and files, so if there is no trailing slash it instructs 
   us to remove the final segment.  That would give a very unexpected 
   result, so that's not what we want to do.  Adding the slash, sort of 
   like web browsers do for HTTP, is really the best way to handle the 
   problem.

I think my summary is less lucid than the inline comments...  As you read
those, however, keep in mind that I was learning new things as I wrote
them.

> Christopher R. Hertel said:
> >> Think about HTTP. HTTP servers universally reject requests for
> >> directories that don't have a trailing slash.
> >
> > ...but HTTP clients don't.
> 
> HTTP clients do NOT add a trailing slash if it's missing.

Neither do they reject the request.  From the user's perspective the
request was accepted and the desired result was returned.

> Web browsers
> will interpret an HTTP redirect and display the URL to which the browser
> has been redirected thus providing the appearance that a trailing slash
> has been "added".

Well, the trailing slash *was* added... by the server in this case.  It 
sends back the modified URI and the client retries without ever annoying 
the user.
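The server's side of that exchange boils down to a simple decision. This is a hypothetical sketch, not code from any real HTTP server; the set of directory paths stands in for whatever the server actually knows about its file system.

```java
import java.util.Set;

// Hypothetical server-side logic: if the request names a directory but
// lacks the trailing slash, redirect to the same path with "/" appended.
class RedirectSketch {
    /** Location for a redirect (e.g. 301 Moved Permanently), or null to serve as is. */
    static String redirectTarget(String path, Set<String> directories) {
        if (!path.endsWith("/") && directories.contains(path)) {
            return path + "/";
        }
        return null;
    }
}
```

A request for /docs would be answered with a redirect to /docs/; the browser resends the request and shows the new URI, so the user never sees an error.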

I don't know why the trailing slash is required by HTTP.  Perhaps it is
something in the HTTP specifications.  The SMB URI specifications are
built from the RFCs mentioned above, and those RFCs clearly show leading
slashes, not trailing slashes, in the grammar and in some of the examples.

I should note that the examples in RFC 2396 place *no* semantic meaning on 
path elements.  None.  In Appendix C, for instance, they manipulate path 
elements without regard to any sort of interpretation.  As I think about 
it, I believe that this may be why the HTTP scheme likes to add the 
trailing slash.  It does what Mike is trying to do with the SMB scheme, 
which is to add a syntactic mechanism to identify semantic elements.
(Note that this may be a perfectly valid thing to do.  Obviously, HTTP 
does it.  I think the only area where Mike and I disagree on this is in 
how it gets handled by jCIFS.)

Now, as I mentioned, I've talked to some of the folks who wrote those
RFCs.  They probably wouldn't tell you that their RFCs are perfect.

> There's no way for an HTTP client to know that a URL refers to a directory
> without querying the server (and even then it still doesn't necessarily
> know as it is entirely up to the server to send an HTTP redirect)

Right.  It's an indirect query.  The server handles rewriting the URI and 
sending back the modified version which the client then dutifully resends.  
It would work just as well to return a different error code, give the 
modified URI, and also provide the desired page all in one go.  That would 
avoid an extra round-trip and still signal the client that the URI had 
been modified.

jCIFS also has to query the server in order to determine the type of a
URI-identified object.  In the CIFS case, however, the query is direct and 
the client must take corrective action.

Note:  If that means that someone needs to catch the exception, rewrite 
       the URI, and resend it... well, that's okay by me.  It may be that 
       it's the application I write using jCIFS, rather than jCIFS itself.
       I suppose that's part of the debate.
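At the application level, "catch the exception, rewrite the URI, and resend it" might look like the sketch below. DirectoryLister is a hypothetical stand-in for whatever jCIFS call does the listing; the only assumption is that a missing trailing slash is reported as java.net.MalformedURLException, the exception type named later in this thread.

```java
import java.net.MalformedURLException;
import java.util.List;

// Hypothetical application-side retry: if a directory URL is rejected for
// lacking a trailing slash, append the slash and try exactly once more.
class RetryWithSlash {
    interface DirectoryLister {
        List<String> list(String url) throws MalformedURLException;
    }

    static List<String> listWithRetry(String url, DirectoryLister lister)
            throws MalformedURLException {
        try {
            return lister.list(url);
        } catch (MalformedURLException e) {
            if (url.endsWith("/")) {
                throw e; // slash already present, so the problem is something else
            }
            return lister.list(url + "/");
        }
    }
}
```

Retrying on any MalformedURLException is deliberately crude; a real application might call isDirectory() first, and it should also show the rewritten URI to the user.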

> and the
> client cannot unconditionally mask the redirect response because the
> caller most likely needs to know where it has been redirected.

What, in this case, constitutes the caller?  Sorry, you were talking HTTP 
here and I got a little lost...

> At least CIFS has the ability to directly query a path to determine if
> it's a directory. HTTP has no ability to query a path to determine if it's
> a directory and therefore it's not possible for an HTTP client to add a
> trailing slash based on such information.

As discussed above...

> > That in mind, I still feel that jCIFS could easily handle the semantic
> > issues just as it handles the semantic differences between a server and a
> > workgroup.
> 
> Actually the more I think about this it's totally impractical to do this
> with JCIFS because URLs are immutable and are not resolved until they are
> used.

One part that I was missing here, and have since figured out, is that 
you're not even going over the wire in some cases.  For example, if I try 
to .list() on an SmbFile that is based on an SMB URI that has no trailing 
slash... then I get the exception before any network activity occurs at 
all.

Makes sense.  If you assign semantic meaning to a trailing slash then the
lack of the trailing slash would indicate a file, not a directory (or
would indicate "ambiguous").  The .list() method isn't defined for a 
non-directory, so I see why it throws an exception.
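A minimal model of that behavior, assuming the check looks only at the URL string. The class and its round-trip counter are hypothetical, not jCIFS internals, and the sketch ignores the authority-only case such as "smb://foo.bar.biz"; the point is that the exception is raised before anything goes over the wire.

```java
import java.net.MalformedURLException;

// Hypothetical model of a list() that validates the URL string locally.
// The counter only makes "no network traffic" visible.
class LocalDirectoryCheck {
    private int roundTrips = 0;

    String[] list(String url) throws MalformedURLException {
        // Purely syntactic: no trailing slash means "not known to be a directory".
        if (!url.endsWith("/")) {
            throw new MalformedURLException("directory must end with '/'");
        }
        roundTrips++; // only past this point would SMBs be sent to the server
        return new String[0];
    }

    int roundTrips() {
        return roundTrips;
    }
}
```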

I can think of a few options that would avoid the exception.  I am not 
recommending any of them, per se, just thinking about what's possible.

You could:
- Define the .list() method for a file.  Then you'd have to connect to the
  server and query the object type to figure out which SMBs to send in 
  order to return a listing.  (In my bash shell, I can do "ls ." and "ls 
  file.name", and both work.)
- You can assume that it's a directory, add the UNC slash, and send the 
  same query without changing the URI string.
- You can assume that it's a directory, add the UNC slash, and send the 
  same query, then modify the URI to match.
- You can do as you currently do and throw an exception.
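The first option can be sketched as follows. This is a hypothetical illustration: a real client would ask the server for the object type, whereas here an in-memory map of directories and a set of files stand in for the server's answers.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of option 1: list() is defined for files too.
// The collections stand in for the server's answer to "what type is this?".
class ListAnything {
    static List<String> list(String path, Map<String, List<String>> dirs, Set<String> files) {
        // "dir" and "dir/" are treated alike: the type comes from the server, not the slash.
        String key = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        if (dirs.containsKey(key)) {
            return dirs.get(key); // directory: its entries
        }
        if (files.contains(key)) {
            // file: just its own name, the way "ls file.name" prints it
            return List.of(key.substring(key.lastIndexOf('/') + 1));
        }
        throw new IllegalArgumentException("no such object: " + path);
    }
}
```

The cost of this design is the extra type query on every call, which is exactly the round trip jCIFS currently avoids.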

Perhaps the most important thing to do is to explain to coders using jCIFS 
exactly what is going on and why jCIFS behaves as it does.  To anyone 
who's used a web browser for more than a week, modifications to the URI in 
their Location bar seem natural.  It's a surprise to get an error message 
that tells you you've got to be picky about things like adding a trailing 
slash.

> The client would have to query the server for every URL to determine if
> it is a directory and if so check that it has a "/" and if it doesn't it
> would have to throw an exception that gets trapped very high up in the
> call so it can reinitialize the request from scratch. It would be very
> very messy and probably not possible based on the way SmbFiles are
> constructed (can't do super(try {} catch { redo }) inside a constructor).

I am not sure what you're trying to do here, since I don't think you'd 
have to do a query for every URI.  You'd only have to query for those that 
are ambiguous and (perhaps) half the time you would simply get the 
information the user expected.
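Here is a sketch of querying only the ambiguous cases. A trailing slash already marks a directory; a URI with no path at all (just the authority) can safely take a slash, as discussed further down; anything else costs one query. The isDirectoryOnServer predicate is a hypothetical stand-in for the real server check.

```java
import java.net.URI;
import java.util.function.Predicate;

// Hypothetical normalizer: adds the trailing slash, querying the server
// only when the URI string alone cannot settle the question.
class AmbiguityCheck {
    static String normalize(String uri, Predicate<String> isDirectoryOnServer) {
        if (uri.endsWith("/")) {
            return uri; // already unambiguous
        }
        if (URI.create(uri).getRawPath().isEmpty()) {
            return uri + "/"; // authority only, e.g. "smb://server"
        }
        // The only case that needs a network round trip.
        return isDirectoryOnServer.test(uri) ? uri + "/" : uri;
    }
}
```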

> > In any case, the syntax I present in the article more or less matches the
> > syntax presented in RFC 2396 so I don't think there's any reason to change
> > it.  I can provide examples that have a trailing slash and I can
> 
> The 2396 grammar is for complete URLs, not parent fragments. How parent
> fragments are to be interpreted is not defined by that grammar.

I have never heard of a "parent fragment", so I don't know what you mean.  

> Our
> condensed syntax URL:
> 
>   smb://[server/[path/[file]]]

No, it's smb://[server[/share[/path][/file]]

Worth noting:  RFC 2396 allows for empty path segments.  I hadn't noticed 
that before.  With that in mind,

  smb://foo/bar
and
  smb://foo/bar/

are equally valid, syntactically speaking.  It would be up to the specific 
scheme definition to assign different semantic meanings to them.  The 
current SMB URI draft doesn't specify anything for this.  Clearly an 
oversight on my part.  I'll have to think about that.  If 'bar' is a 
directory, then I would definitely want the user agent to handle it as 
such.  Likewise if it's a file.
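The syntactic difference between the two forms is just one empty segment at the end. A small sketch (the class name is made up; only java.net.URI and String.split are real API) splits a path into RFC 2396 segments:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a URI path into RFC 2396 segments. Each segment
// follows a "/", so a trailing slash yields a final, empty segment.
class Segments {
    static List<String> of(String uri) {
        String path = URI.create(uri).getRawPath();
        if (path.isEmpty()) {
            return List.of();
        }
        // Limit -1 keeps trailing empty strings; element 0 is the "" before the leading "/".
        String[] parts = path.split("/", -1);
        return Arrays.asList(parts).subList(1, parts.length);
    }
}
```

So "smb://foo/bar" has the single segment "bar", while "smb://foo/bar/" has "bar" followed by an empty segment; any meaning attached to that empty segment has to come from the scheme definition.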

I should emphasize, though, that the current behavior of jCIFS isn't a 
problem in that regard.  The application calling the jCIFS library would 
need to handle the exceptions to make things pretty for the user.  Just to 
repeat myself, I'm only suggesting that jCIFS could be doing some of this 
internally.

> is a non-standard permutation of command line option definitions that was
> made up by someone during the original SMB URL discussion on
> samba.technical. It is not a real grammar and cannot be compared to the
> said section in 2396.

Well, it can be.  It's a common short-hand, that's all.  In any case, 
there is an RFC 2396-compliant grammar provided in the current Internet 
Draft, and that grammar follows the convention and uses leading slashes, 
not following slashes.

> > (and probably will) mention that jCIFS will throw a catchable exception if
> > the object turns out to be a directory.  (I think jCIFS is fine if there's
> > no trailing slash on a host piece, eg. "smb://foo.bar.biz".)
> 
> Actually I don't think jCIFS universally throws the "directory must end
> with '/'" exception. It only does it where practical as a warning.

Well, what I discovered when working with Ethereal and the .list() method 
was that jCIFS doesn't go to the wire on this one.  It basically says "I 
can't be sure this is a directory so I'll throw an exception".  Then the 
program exits.

I've figured out why you're doing this, and this discussion has helped me 
understand some very twiddly things about URI semantics.  I'm still 
holding to the idea that there's no reason that the trailing slash 
absolutely has to be there.  More in a moment.

> The real problems with no trailing slash start when you combine SmbFiles
> with relative URLs like:
> 
> SmbFile f = new SmbFile( "smb://foo.bar.biz" );
> SmbFile f2 = new SmbFile( f, "path/to/file" ); /* bad */
> or
> SmbFile[] files = f.listFiles(); /* bad */

I went through the RFCs carefully on this one.  "smb://foo.bar.biz" is a 
perfectly valid absolute URI, and "path/to/file" is a perfectly valid 
relative URI.  That said, the descriptions given (in a couple of places) 
in the RFC do not provide any good way to merge them... basically because 
the missing slash is, er, missing.

You can't put the missing slash at the front of "path/to/file" because 
then it would be an abs_path.  In this particular example that would work, 
so I'll have to use a different example in a second.

You can't put the missing slash at the end of "smb://foo.bar.biz" because
"smb://foo.bar.biz" is perfectly valid without it, so there's no real
reason to add it there.

Let's try another example:

  smb://foo.bar.biz/share  +  path/to/file

That makes it a little clearer.  The expected result is:
  smb://foo.bar.biz/share/path/to/file

What you actually get (per RFC 2396's algorithms) is:
  smb://foo.bar.biz/path/to/file
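Java's standard java.net.URI class implements this same merge rule (keep the base path up to and including its last "/", then append the relative reference), so the example can be reproduced directly. The class name below is made up; URI.create and URI.resolve are the real API.

```java
import java.net.URI;

// Hypothetical wrapper around java.net.URI.resolve, which follows the
// RFC 2396 rule of dropping the base path's last segment before merging.
class MergeDemo {
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }
}
```

resolve("smb://foo.bar.biz/share", "path/to/file") gives "smb://foo.bar.biz/path/to/file", while the base with a trailing slash, "smb://foo.bar.biz/share/", gives the expected "smb://foo.bar.biz/share/path/to/file".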

So...   (Hoping you've stuck with me this far without blowing a gasket...)

I think (once again) that this is why HTTP creates the semantic
distinction and adds the empty path segment (that is, the trailing slash)
to a directory name.

> The correct way for a UI to support automatically appending a trailing
> slash is to query the path when a user enters or modifies a URL and add it
> if necessary. And it should only be necessary when a user actually enters
> or modifies the URL. As they drill down and navigate around it should not
> be necessary to check again as getName() returns a name with a trailing
> slash if it is a directory.

Right.  That makes perfect sense (that is, the code ensures that it is
creating and maintaining a syntactically and semantically correct URI
string).

> Actually the simple way to do this is to have the UI just check to make
> sure that the URL in the address bar at the time the user triggers a
> request matches what it displayed. If it doesn't then the path needs to be
> checked again.

...and vice versa.  If the UI needs to change the URI (e.g., by adding that 
extra slash) then the location bar should be updated.

> For example in a file manager with an address bar the initial URL might be
> "blank:". If someone then types in "smb://server" that doesn't match
> "blank:" so the UI checks for the trailing slash which doesn't exist so it
> queries the path to determine if it's a directory and if so adds a slash.

Well, yes... except that the syntax specifies that the above form would
parse into a scheme and an authority.  It's safe to add the trailing slash
to the authority, so in that sense you're correct.

> Now you have "smb://server/".

Okay.

> If the user drills down to
> "smb://server/path/dir/" by clicking on links, at no point should it be
> necessary to check these paths. If the user then enters
> "smb://othersvr/path/to/file" that does not match what the UI previously
> displayed (smb://server/path/dir/) so it checks the path, finds it's not a
> directory and does nothing.

I guess what you're saying here is what the application that uses jCIFS is 
the correct place for this stuff to happen.  I can live with that.
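Mike's address-bar rule can be modeled in a few lines. This is a hypothetical UI sketch, not jCIFS code: isDirectoryOnServer stands in for a real server query, and follow() models clicking a link whose name (per getName()) already carries a trailing slash when it is a directory.

```java
import java.util.function.Predicate;

// Hypothetical address-bar model: re-check only when the submitted text differs
// from what was last displayed, and always show the URI actually used.
class AddressBar {
    private String displayed = "blank:";
    private final Predicate<String> isDirectoryOnServer;

    AddressBar(Predicate<String> isDirectoryOnServer) {
        this.isDirectoryOnServer = isDirectoryOnServer;
    }

    /** The user typed a URL and pressed Enter. */
    String submit(String typed) {
        if (!typed.equals(displayed) && !typed.endsWith("/") && isDirectoryOnServer.test(typed)) {
            typed = typed + "/";
        }
        displayed = typed; // keep the location bar in sync with the request
        return displayed;
    }

    /** The user clicked a link; its name is already slash-terminated if it is a directory. */
    String follow(String url) {
        displayed = url;
        return displayed;
    }
}
```

Fed Mike's sequence, it queries the server once for smb://server (and displays smb://server/), not at all while following links down to smb://server/path/dir/, and once more for smb://othersvr/path/to/file, which it leaves unchanged.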

> In truth to handle java.net.URL objects generically in some kind of file
> manager or web browser we might have to change something to make it a
> little easier for the UI developer. For example the "directory must end
> with '/'" MalformedURLException would probably be a nuisance. They should
> be able to suppress that so they can do the isDirectory() call in peace.

...and we reach agreement.  Kewl.

-- 
"Implementing CIFS - the Common Internet FileSystem" ISBN: 013047116X
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org