[jcifs] SMB URL

Wed Jul 10 08:55:48 EST 2002

> -----Original Message-----
> From:	Christopher R. Hertel [SMTP:crh at ubiqx.mn.org]
> Sent:	Tuesday, July 09, 2002 2:46 PM
> To:	Michael B. Allen
> Cc:	jcifs at samba.org
> Subject:	Re: [jcifs] SMB URL
> 
> On Tue, Jul 09, 2002 at 03:42:29AM -0400, Michael B. Allen wrote:
> :
> :
> > I think you would just need to maintain the original URL (with the escape)
> > and use individual decoded fields for logical operations.
> 
> I don't think so, but I haven't written code to do this yet so I'm not 
> going to swear by it.  My thinking, though, is that you need to know the 
> rules for each syntax element.  Kind of like reverse-'lex'.
> 
> > But this requires
> > a concerted effort from all parts of your program; no method or function
> > can export anything but the original URL or it may not accurately represent
> > the resource anymore.
> 
> In object terms, I am thinking of each possible field (username, IP
> address, NetBIOS name, ScopeID, DNS name, etc.) being an object.  Perhaps
> URLField or somesuch.  Each URLField object would contain, internally, the
> unescaped representation of the original.  It would also contain (point
> to?) a rule object that would indicate which characters must be escaped.  
> For speed, perhaps the escaped version would also be stored.
> 
> When the URL is parsed, it would be parsed into a set of URLField objects.  
> It should be possible, from that, to grab the un-escaped strings as 
> needed, and also to rebuild the SMB URL (maybe even a cleaned-up version) 
> if that is needed.
> 
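A minimal Java sketch of the URLField idea quoted above, not code from jCIFS or from either poster. The names EscapeRule and UrlField, the reserved-character set, and the choice of UTF-8 for non-ASCII octets are all assumptions made for illustration. Each field keeps its unescaped text, points to a rule object, and caches the escaped form.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical rule object: the characters one field type must escape. */
final class EscapeRule {
    private final BitSet reserved = new BitSet(128);

    EscapeRule(String reservedChars) {
        for (char c : reservedChars.toCharArray()) {
            reserved.set(c);
        }
        reserved.set('%'); // the escape character itself is always escaped
    }

    boolean mustEscape(int octet) {
        // Assumption: controls, space, DEL and non-ASCII octets are escaped in every field.
        return octet <= 0x20 || octet >= 0x7F || reserved.get(octet);
    }
}

/** Hypothetical field object: unescaped text, its rule, and a cached escaped form. */
final class UrlField {
    private final String raw;      // unescaped representation
    private final EscapeRule rule; // which characters must be escaped
    private final String escaped;  // stored for speed

    UrlField(String raw, EscapeRule rule) {
        this.raw = raw;
        this.rule = rule;
        this.escaped = escape(raw, rule);
    }

    /** Builds a field from text as it appeared in the URL (still escaped). */
    static UrlField parse(String escapedText, EscapeRule rule) {
        return new UrlField(unescape(escapedText), rule);
    }

    String raw() { return raw; }
    String escaped() { return escaped; }
    EscapeRule rule() { return rule; }

    private static String escape(String s, EscapeRule rule) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (rule.mustEscape(octet)) {
                sb.append('%')
                  .append(Character.toUpperCase(Character.forDigit(octet >> 4, 16)))
                  .append(Character.toUpperCase(Character.forDigit(octet & 0xF, 16)));
            } else {
                sb.append((char) octet);
            }
        }
        return sb.toString();
    }

    private static String unescape(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length()) {
                    throw new IllegalArgumentException("truncated escape in: " + s);
                }
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("bad escape in: " + s);
                }
                out.write((hi << 4) | lo);
                i += 2;
            } else if (c < 0x80) {
                out.write(c);
            } else {
                throw new IllegalArgumentException("unescaped non-ASCII in: " + s);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With a rule built as new EscapeRule(".:/@?#[]"), a field holding the NetBIOS name "my.server" reports "my%2Eserver" from escaped(), and UrlField.parse("my%2Eserver", rule).raw() gives back "my.server". Rebuilding the URL would then be a matter of concatenating each field's escaped() form with the literal delimiters.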
> > In theory it's probably possible but in practice
> > scope just isn't worth it. Ultimately factoring scope into the authority
> > component is indeed more complex than not factoring it in.
> 
> It's simply a question of parsing it out.  It doesn't really matter if 
> it's in the authority or in the query, it still needs to be parsed out and 
> it still needs to obey the applicable escaping rules.
> 
> Again, I've not written this up yet so I'm talking theory rather than 
> practice.  I'll have to give it a try.
> 
> > Maybe it won't
> > have an effect on lookups but you've just pushed the complexity into the
> > URL serialization routines. Multiplexing that many types of names (nbt
> > server, nbt wg, nbt server w/scope, nbt workgroup w/scope, ip, and dns) in
> > one field is just bad programming. Implementors will not get it right and
> > SMB URLs just won't work well with scope.
> 
> The parsing of the URL--the syntactic part--has to happen first.  Some of
> that is easy, since IPv4 and IPv6 addresses can be identified
> syntactically.  (IPv6 addresses are contained within square brackets,
> which must otherwise be escaped.  For IPv4... I think we should *not* try
> to interpret a string such as "192.168.101.14" as a NetBIOS Name with
> scopeID or a qualified DNS name.)
> 
> The problem the overloading causes is that there is little syntactic
> difference between the two name types.  That is: "foo" could be a DNS host
> name or a NetBIOS name.  Likewise, "foo.bar.edu" could be a DNS name or a
> NetBIOS name with scope ID.  The distinction is a semantic one that can
> only be resolved by sending network queries.  
> 
> Okay, so I understand that bit of it.  What I don't understand is how that
> pushes complexity out into the semantic resolution.
> 
	That's not what I said. You're pushing the complexity into the *serialization
	routines*. The problem is that your escaping technique is not robust to
	frequent reserialization. If you export an unescaped server name it may not
	represent the target anymore. Implementors would have to take great care not
	to export an unescaped URL *ever*. This has nothing to do with specific fields
	or how you parse the URL.

	Mike
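
The reserialization concern above can be illustrated with a small Java sketch. It is only an illustration under stated assumptions: the class AuthorityRoundTrip and its methods are hypothetical, the authority is assumed to be "name.scope" with the first literal '.' as the delimiter, and a '.' inside the NetBIOS name is assumed to be written as %2E. Under those assumptions, exporting the decoded name and parsing it again yields a different name and scope.

```java
/** Hypothetical demo of a lossy export; not part of jCIFS. */
final class AuthorityRoundTrip {

    /** Splits on the first literal '.', then unescapes name and scope. */
    static String[] parse(String authority) {
        int dot = authority.indexOf('.');
        String name = dot < 0 ? authority : authority.substring(0, dot);
        String scope = dot < 0 ? "" : authority.substring(dot + 1);
        return new String[] { unescape(name), unescape(scope) };
    }

    /** Decodes %XX sequences; anything else is copied through unchanged. */
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    sb.append((char) ((hi << 4) | lo));
                    i += 2;
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Lossy: joins the decoded fields without re-escaping the name. */
    static String exportUnescaped(String[] fields) {
        return fields[1].isEmpty() ? fields[0] : fields[0] + "." + fields[1];
    }

    /** Safe: re-escapes '%' and '.' in the name before joining. */
    static String exportEscaped(String[] fields) {
        String name = fields[0].replace("%", "%25").replace(".", "%2E");
        return fields[1].isEmpty() ? name : name + "." + fields[1];
    }

    public static void main(String[] args) {
        String[] original = parse("my%2Eserver.scope");
        String[] reparsed = parse(exportUnescaped(original));
        System.out.println(original[0] + " | " + original[1]);
        System.out.println(reparsed[0] + " | " + reparsed[1]);
    }
}
```

Here the original authority parses to the name "my.server" with scope "scope", while the unescaped export "my.server.scope" parses to the name "my" with scope "server.scope". Only exportEscaped survives the round trip, which is why every routine that hands out a URL or a name would have to use it.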