Multi-WINS behavior.

Tue Jul 11 12:54:15 GMT 2000

[Chris Hertel]
> I need some votes from folks regarding behavior of the WINS fail-over.

I don't think I'm a voting member here (: but I've got a few thoughts.

> The patch that was submitted queries the primary.  The primary may
> send an answer, it may send a Negative Query Response, or it may
> simply be dead by the side of the road attracting flies.  The
> behavior implemented in the patch is to try the secondary in either
> of the latter two cases.

Question.  In the third case, i.e. dead in the water, are you thinking
of continuing to query the primary server *every* time until it's back
up?  Because in that case you could potentially be doing a lot of
timing out.  I think one of the following would be better:

* After a wins server is pronounced dead, it is not queried again for
  (say) five minutes.  Easy enough: put an `isdead' boolean and a
  `lasttried' time_t in the `struct winsserver' or whatever.

-or-

* Round-robin: rotate your in-core list of wins servers, putting the
  dead server on the end of the queue.  It won't be primary server
  again until all the other servers crash at least once.

I would consider either approach acceptable.  The latter does have
implications for namespaces, in that it becomes impossible to
prioritize the wins servers (i.e. specify whose answer you would prefer
to get) but would have the advantage of a simpler implementation.
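
To make the comparison concrete, here is a rough sketch of both options
in C.  The struct layout, the function names and the 300-second constant
are all made up for illustration; none of it is taken from the Samba
source or from the submitted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define WINS_DEAD_HOLDDOWN 300  /* seconds; the "(say) five minutes" above */

/* Hypothetical per-server record, not an actual Samba structure. */
struct winsserver {
    const char *name;
    bool        isdead;     /* set when a query to this server timed out */
    time_t      lasttried;  /* when that timeout was observed */
};

/* Option 1: leave a dead server alone until the hold-down has expired. */
static bool wins_should_query(const struct winsserver *ws, time_t now)
{
    if (!ws->isdead)
        return true;
    return difftime(now, ws->lasttried) >= WINS_DEAD_HOLDDOWN;
}

static void wins_mark_dead(struct winsserver *ws, time_t now)
{
    ws->isdead = true;
    ws->lasttried = now;
}

/* Option 2: move list[idx] to the end, shifting the others up one slot. */
static void wins_rotate_to_end(struct winsserver *list, size_t n, size_t idx)
{
    assert(idx < n);
    struct winsserver dead = list[idx];
    for (size_t i = idx; i + 1 < n; i++)
        list[i] = list[i + 1];
    list[n - 1] = dead;
}
```

Note that option 1 keeps the configured order, so the preferred server
becomes primary again once it recovers; option 2 throws that order away,
which is the prioritization trade-off mentioned above.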

> However, if the WINS servers are not synced:
> - Then when you switch from primary to secondary you enter a new
>   namespace.  Your machine isn't registered in the new namespace,
>   so your name or the name of some service you rely upon may be
>   missing.  Worse, it may have been taken by some other machine.

I think it's best not to assume the two are synced, and to do our best
to keep them in sync ourselves (i.e. automatically send updates to all
servers every time).  That avoids the problem of having your name taken
by someone else while you weren't looking.  Well, at least it can give
you an early warning if it happens.

The only disadvantage of doing this is a bit of network traffic every
time you need to send an update.  How often is this?  My impression is
that the added traffic would be modest.

> ..but... The other way to do this is to think of it as a way to join
> multiple name spaces.  Doing so seems perfectly sane to me.

Me too.

In fact, I don't even think there's that much room for debate.
Obviously, if the administrator has specified the multiple servers in
smb.conf, it means he considers the Samba server to be part of the
namespace of each wins server.  So why not make it explicit and just
register with everyone every time?
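
Something like the following is what I have in mind, as a sketch only.
The callback type and `demo_send' are hypothetical stand-ins for
whatever actually sends the registration packet; the point is just that
the loop does not stop at the first server.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: send one name registration to one server.
 * Returns 0 on success, nonzero if the server refused or did not answer. */
typedef int (*wins_register_fn)(const char *server, const char *name);

/* Register `name` with every configured server, not just the current
 * primary.  Returns how many servers refused or failed, so the caller
 * can log an early warning about a name conflict. */
static size_t wins_register_all(const char *const *servers, size_t n,
                                const char *name, wins_register_fn send)
{
    size_t failures = 0;

    assert(send != NULL);
    for (size_t i = 0; i < n; i++) {
        if (send(servers[i], name) != 0)
            failures++;  /* keep going: the other namespaces still need us */
    }
    return failures;
}

/* Stand-in for the real network call, for illustration only: pretends
 * that a server called "down" fails and every other server accepts. */
static int demo_send(const char *server, const char *name)
{
    (void)name;
    return strcmp(server, "down") == 0 ? -1 : 0;
}
```

The cost is one registration per server per update, which is the extra
traffic I mentioned above.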

>   wins server = 192.168.101.5, wins.office.com:wins2.office.com
> 
> The above basically says
> - Use 192.168.101.5 to resolve names via WINS.
> - If name resolution from 192.168.101.5 fails, then try the WINS
>   server at office.com.
> - Initially, wins.office.com is the second WINS server to try.  If it
>   appears to be down, then use wins2.office.com instead (that is,
>   failover).

To me this is needlessly complex (and as you said "a pain in the
tailfeathers to implement").  My humble opinion is that nobody needs
this amount of flexibility.

As I said earlier, in my opinion you should get multiple namespaces by
registering with everyone all the time and getting a second opinion on
a negative query response.  Failover would be by one of the two methods
I described at the beginning of the post.
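
For the lookup side, a minimal sketch of the logic, assuming a query can
end in exactly one of three ways.  The enum, the callback type and
`demo_query' are invented for the example and do not correspond to
anything in the existing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum wins_result { WINS_POSITIVE, WINS_NEGATIVE, WINS_TIMEOUT };

/* Hypothetical callback: ask one server about one name. */
typedef enum wins_result (*wins_query_fn)(const char *server,
                                          const char *name);

/* Ask each server in turn.  Returns the index of the first server that
 * gave a positive answer, or -1 if none did.  timed_out[i] is set for
 * every server that failed to respond, so the caller can apply
 * whichever failover policy it likes to those entries. */
static int wins_resolve(const char *const *servers, size_t n,
                        const char *name, wins_query_fn query,
                        bool *timed_out)
{
    assert(query != NULL && timed_out != NULL);
    for (size_t i = 0; i < n; i++)
        timed_out[i] = false;

    for (size_t i = 0; i < n; i++) {
        switch (query(servers[i], name)) {
        case WINS_POSITIVE:
            return (int)i;
        case WINS_NEGATIVE:
            break;                /* a quick "no": get a second opinion */
        case WINS_TIMEOUT:
            timed_out[i] = true;  /* no answer at all: failover candidate */
            break;
        }
    }
    return -1;
}

/* Stand-in for the real query, for illustration only: "dead" never
 * answers, "empty" answers negatively, anything else answers positively. */
static enum wins_result demo_query(const char *server, const char *name)
{
    (void)name;
    if (strcmp(server, "dead") == 0)
        return WINS_TIMEOUT;
    if (strcmp(server, "empty") == 0)
        return WINS_NEGATIVE;
    return WINS_POSITIVE;
}
```

A negative response and a timeout both move on to the next server, but
only the timeout marks the server as a failover candidate.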

No need to pick & choose which feature you want.  I can't see a
situation where it would make sense to have one but not the other.

> PS.  I've actually gotten most of failover written up.  It's a bit
> different than the patch that was submitted but that's because there
> were requests for more than two WINS servers.  Also, I've made a
> distinction between Negative Query Response and a timeout.  That fix
> should improve speed even when only one WINS server is in use.

Cool!  You're saying the current code basically drops the NQR on the
floor and continues to wait for timeout?  If so, it explains why failed
lookups to WINS seem to take just as long as failed lookups to
broadcast, something that has mildly bothered me for a while now (mildly
meaning not enough to make me actually investigate the code or the
protocol).

Peter