[jcifs] SMB URL

Wed Jul 10 04:46:10 EST 2002

On Tue, Jul 09, 2002 at 03:42:29AM -0400, Michael B. Allen wrote:
:
:
> I think you would just need to maintain the original URL (with the escape)
> and use individual decoded fields for logical operations.

I don't think so, but I haven't written code to do this yet so I'm not 
going to swear by it.  My thinking, though, is that you need to know the 
rules for each syntax element.  Kind of like reverse-'lex'.

> But this requires
> a concerted effort from all parts of your program; no method or function
> can export anything but the original URL or it may not accurately represent
> the resource anymore.

In object terms, I am thinking of each possible field (username, IP
address, NetBIOS name, ScopeID, DNS name, etc.) being an object.  Perhaps
URLField or somesuch.  Each URLField object would contain, internally, the
unescaped representation of the original.  It would also contain (point
to?) a rule object that would indicate which characters must be escaped.  
For speed, perhaps the escaped version would also be stored.
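
A minimal sketch of the URLField idea described above, in Python for
brevity.  The class names, the "safe" character sets, and the use of
urllib.parse for percent-escaping are all assumptions made for
illustration; nothing here comes from an SMB URL draft or from jCIFS.

```python
from dataclasses import dataclass
from urllib.parse import quote, unquote


@dataclass(frozen=True)
class EscapeRule:
    """Hypothetical rule object: which characters may stay unescaped."""
    safe: str  # allowed as-is, in addition to letters, digits and "_.-~"

    def escape(self, raw: str) -> str:
        return quote(raw, safe=self.safe)


@dataclass(frozen=True)
class URLField:
    """One syntax element of the URL, stored in unescaped form."""
    name: str         # illustrative label, e.g. "username" or "scope"
    value: str        # the unescaped representation
    rule: EscapeRule  # points to the rule for this kind of field

    @classmethod
    def from_escaped(cls, name, escaped, rule):
        # Decode once at parse time; logic then works on the plain value.
        return cls(name, unquote(escaped), rule)

    @property
    def escaped(self) -> str:
        # Recomputed on demand here; it could be cached for speed.
        return self.rule.escape(self.value)
```

For example, URLField.from_escaped("username", "joe%40home",
EscapeRule(safe="")) holds "joe@home" internally and gives back
"joe%40home" from its escaped property.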

When the URL is parsed, it would be parsed into a set of URLField objects.  
It should be possible, from that, to grab the un-escaped strings as 
needed, and also to rebuild the SMB URL (maybe even a cleaned-up version) 
if that is needed.
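
The parse-then-rebuild round trip might look roughly like the sketch
below.  It is written in Python and leans on urllib.parse; the field
names, the per-field safe sets, and the restriction to user, host and
path are assumptions chosen to keep the example short.

```python
from urllib.parse import quote, unquote, urlsplit

# Assumed per-field "safe" characters; illustrative only.
SAFE = {"user": "", "host": "", "path": "/"}


def parse_fields(url):
    """Split an smb:// URL and keep each field in unescaped form."""
    parts = urlsplit(url)
    # rpartition yields an empty user when there is no "@" at all.
    userinfo, _, host = parts.netloc.rpartition("@")
    return {
        "user": unquote(userinfo),
        "host": unquote(host),
        "path": unquote(parts.path),
    }


def rebuild(fields):
    """Re-escape each field by its own rule and reassemble the URL."""
    user = quote(fields["user"], safe=SAFE["user"])
    host = quote(fields["host"], safe=SAFE["host"])
    path = quote(fields["path"], safe=SAFE["path"])
    authority = f"{user}@{host}" if user else host
    return f"smb://{authority}{path}"
```

Feeding "smb://j%6Fe@foo/share/my%20file" through parse_fields and
then rebuild gives the cleaned-up "smb://joe@foo/share/my%20file".
Note that treating the whole path as one field loses the difference
between "/" and an escaped "%2F"; a fuller version would need one field
per path segment.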

> In  theory  it's probably possible but in practice
> scope  just  isn't  worth it. Ultimately factoring scope into the authority
> component  is  indeed more complex than not factoring it in.

It's simply a question of parsing it out.  It doesn't really matter
whether it's in the authority or in the query; either way it still needs
to be parsed out and it still needs to obey the applicable escaping rules.

Again, I've not written this up yet so I'm talking theory rather than 
practice.  I'll have to give it a try.

> Maybe it won't
> have an effect on lookups but you've just pushed the complexity into the
> URL serialization routines. Multiplexing that many types of names (nbt
> server, nbt wg, nbt server w/scope, nbt workgroup w/scope, ip, and dns) in
> one field is just bad programming. Implementors will not get it right and
> SMB URLs just won't work well with scope.

The parsing of the URL--the syntactic part--has to happen first.  Some of
that is easy, since IPv4 and IPv6 addresses can be identified
syntactically.  (IPv6 addresses are contained within square brackets,
which must otherwise be escaped.  For IPv4... I think we should *not* try
to interpret a string such as "192.168.101.14" as a NetBIOS name with
scope ID or a qualified DNS name.)
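
The purely syntactic step can be sketched as follows, using Python's
standard ipaddress module.  The function name and the returned labels
are made up for the example; the only rules encoded are the ones stated
above (bracketed IPv6, dotted-quad IPv4 taken as an address, everything
else left as an ambiguous name).

```python
import ipaddress


def classify_host(host):
    """Classify the host part of the authority by syntax alone."""
    if host.startswith("[") and host.endswith("]"):
        try:
            ipaddress.IPv6Address(host[1:-1])
            return "ipv6"
        except ValueError:
            return "invalid"
    try:
        ipaddress.IPv4Address(host)
        return "ipv4"
    except ValueError:
        # Could be a DNS name or a NetBIOS name (with or without
        # scope); syntax alone cannot tell them apart.
        return "name"
```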

The problem the overloading causes is that there is little syntactic
difference between the two name types.  That is: "foo" could be a DNS host
name or a NetBIOS name.  Likewise, "foo.bar.edu" could be a DNS name or a
NetBIOS name with scope ID.  The distinction is a semantic one that can
only be resolved by sending network queries.  

Okay, so I understand that bit of it.  What I don't understand is how that
pushes complexity out into the semantic resolution.  Given a name such as
"foo.bar.edu", and considering that you want to send the queries in
parallel to save time, you would run three queries:

NetBIOS: FOO<20> in scope "bar.edu"
NetBIOS: FOO<1D> in scope "bar.edu"
    DNS: foo.bar.edu

(You might also do a query for "FOO<1B>", but since 1D names are never in
the NBNS (WINS) you could send the "FOO<1B>" query to the NBNS and the
"FOO<1D>" query could be broadcast, so it doesn't cost any more than a
single NetBIOS query operation.  Note that I assume a NetBIOS query
operation includes both the unicast and broadcast parts.)
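
As a rough illustration of the fan-out described above, the sketch below
turns an overloaded name into the three candidate queries.  It only
builds the list; nothing is sent on the wire.  The function name, the
tuple layout, and the "NAME<suffix>" notation are assumptions made for
the example, written in Python.

```python
def candidate_queries(name):
    """List the lookups worth trying, in parallel, for one name."""
    # First label is the would-be NetBIOS name, the rest the scope.
    first, _, rest = name.partition(".")
    nb_name = first.upper()
    return [
        ("netbios", f"{nb_name}<20>", rest),
        ("netbios", f"{nb_name}<1D>", rest),
        ("dns", name, None),
    ]
```

For "foo.bar.edu" this yields FOO<20> and FOO<1D> in scope "bar.edu"
plus a DNS query for the full name.  For a bare "foo" the scope comes
out as the empty string.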

If you had:

  smb://foo/?scope=bar.edu

You would still need to do these queries:

NetBIOS: FOO<20> in scope "bar.edu"
NetBIOS: FOO<1D> in scope "bar.edu"
    DNS: foo

...and someone could type in something nasty like:

  smb://foo.bar.edu/?scope=banana.tree

The above still has meaning.  It might mean that you want a DNS lookup on 
"foo.bar.edu", but connect using NBT within the scope "banana.tree".

Ick.
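
For comparison, here is a sketch of the same fan-out when the scope is
carried in the query string.  It is in Python and uses urllib.parse.
Taking the first label of the host as the NetBIOS name is an assumption,
and it is only one of several possible readings of the "nasty" case.

```python
from urllib.parse import parse_qs, urlsplit


def queries_with_scope_param(url):
    """Build candidate lookups for an smb:// URL using ?scope=..."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Empty scope when the parameter is absent.
    scope = parse_qs(parts.query).get("scope", [""])[0]
    nb_name = host.partition(".")[0].upper()
    return [
        ("netbios", f"{nb_name}<20>", scope),
        ("netbios", f"{nb_name}<1D>", scope),
        ("dns", host, None),
    ]
```

With "smb://foo/?scope=bar.edu" this gives the same two NetBIOS queries
as before plus a DNS query for "foo".  With
"smb://foo.bar.edu/?scope=banana.tree" it gives NetBIOS queries for FOO
in scope "banana.tree" and a DNS query for "foo.bar.edu", so the
ambiguity has moved rather than gone away.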

Okay, I've probably wandered way off course.  I need to write some code to 
try this out.  I have a half-working SMB URL parser in C.  I'll play with 
that.

Chris -)-----

-- 
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org