Christopher R. Hertel
crh at nts.umn.edu
Wed Jul 12 03:49:36 GMT 2000
My SGI went belly-up this morning following a normal reboot. I had to run
out and get another disk and tomorrow I get to re-install IRIX. I'm not
sure that the disk is completely bad (I can get to the volume header),
it's just that XFS is panicking because it can't read the root block.
Anyway... I missed Peter's message, to which Elrond replied. Bits of my
message are embedded.
> On Tue, Jul 11, 2000 at 10:54:17PM +1000, Peter Samuelson wrote:
> > * After a wins server is pronounced dead, it is not queried again for
> > (say) five minutes. Easy enough, put an `isdead' boolean and a
> > `lasttried' time_t in the `struct winsserver' or whatever.
> That sounds like a nice version, see below for more
That's what one would do for 'failover'.
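For illustration, here is a minimal sketch of that dead-timer bookkeeping. It is written in Python for brevity and is not Samba code; the names WinsServer, mark_dead, usable and DEAD_TIME are made up for this example, and the five-minute value is only Peter's "(say)" figure.

```python
import time
from dataclasses import dataclass

DEAD_TIME = 5 * 60  # seconds; assumed from the "(say) five minutes" suggestion


@dataclass
class WinsServer:
    address: str
    is_dead: bool = False      # the `isdead' boolean
    last_tried: float = 0.0    # the `lasttried' timestamp

    def mark_dead(self, now=None):
        """Record that the server failed to respond."""
        self.is_dead = True
        self.last_tried = time.time() if now is None else now

    def usable(self, now=None):
        """True if the server is alive or its dead time has expired."""
        now = time.time() if now is None else now
        if self.is_dead and now - self.last_tried >= DEAD_TIME:
            self.is_dead = False  # dead time is over; give it another chance
        return not self.is_dead
```

The optional `now` argument exists only so the timing can be exercised without waiting five minutes.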
> > * Round-robin: rotate your in-core list of wins servers, putting the
> > dead server on the end of the queue. It won't be primary server
> > again until all the other servers crash at least once.
That, again, is what I was aiming for with failover. The assumption,
though, is that all of the WINS servers are in the list.
What I have coded so far is fairly simple. There's a "live" list and a
"dead" list. When a WINS server does not respond at all, it is moved to
the dead list and the next server in the live list is tried. This
repeats until the live list is exhausted. If this happens, then the
particular name query or registration will fail.
One modification to this is to always maintain a full 'live' list and
simply move 'dead' entries to the end. That's probably what I'll wind up
doing. The key is to avoid an infinite loop checking all WINS servers.
Note that a reload of smb.conf will reset the live list to the contents
of 'wins server'.
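The "move dead entries to the end" variant, with the infinite loop ruled out, could look roughly like the sketch below. This is Python pseudocode-made-runnable, not the code that was checked in; query_with_failover and send_query are hypothetical names, and send_query stands in for whatever actually puts the packet on the wire.

```python
from collections import deque


def query_with_failover(servers, send_query):
    """Try each server at most once, rotating non-responders to the back.

    servers    -- deque of server addresses, preferred server first
    send_query -- hypothetical callable: returns a reply, or None on timeout
    """
    for _ in range(len(servers)):   # bounded: one pass over the list, no more
        reply = send_query(servers[0])
        if reply is not None:
            return reply
        servers.rotate(-1)          # move the silent server to the end
    return None                     # every server timed out; the query fails
```

Because the loop count is fixed before the first query, a list in which every server is down costs exactly one pass and then the query or registration fails, as described above.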
> > I would consider either approach acceptable. The latter does have
> > implications for namespaces, in that it becomes impossible to
> > prioritize the wins servers (i.e. specify whose answer you would prefer
> > to get) but would have the advantage of a simpler implementation.
If the servers are synced, then there is no need to prioritize them.
I encourage everyone to read:
> prioritizing doesn't work here any more, as you already
> wrote, this version would make sense for the "complex
> solution" below.
> > The only disadvantage of doing this is a bit of network traffic every
> > time you need to send an update. How often is this? My impression is
> > that the added traffic would be modest.
Traffic to the WINS server would include both registrations (and updates)
and queries. I've already improved the traffic problems by adding code so
that Samba doesn't retry a WINS query if the WINS server returned "no".
When a WINS server fails, then queries or registrations will retry--three
times, I think. If there's no response then I'll mark the server dead
and try the next server.
So, that's three UDP packets per dead WINS server. Not bad.
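To make the packet count concrete, here is a small sketch of the retry rule. It is Python, not Samba's C, and it rests on assumptions: the retry count of three is the "I think" figure from above, and Outcome, ask_server and send_once are names invented for this example.

```python
from enum import Enum


class Outcome(Enum):
    POSITIVE = "positive"   # server answered with an address
    NEGATIVE = "negative"   # server answered "no such name"
    TIMEOUT = "timeout"     # no answer at all


MAX_TRIES = 3  # assumed; "three times, I think"


def ask_server(server, name, send_once):
    """Send up to MAX_TRIES packets to one server.

    send_once is a hypothetical callable returning (Outcome, payload).
    Returns (Outcome, payload, packets_sent).
    """
    for attempt in range(1, MAX_TRIES + 1):
        outcome, payload = send_once(server, name)
        if outcome is not Outcome.TIMEOUT:
            return outcome, payload, attempt   # any real answer ends the retries
    return Outcome.TIMEOUT, None, MAX_TRIES    # caller marks the server dead
```

A server that says "no" costs one packet; only a server that says nothing at all costs the full three.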
> > > ..but... The other way to do this is to think of it as a way to join
> > > multiple name spaces. Doing so seems perfectly sane to me.
> > Me too.
> > In fact, I don't even think there's that much room for debate.
I brought it up because there really is room for debate. Failover and
membership in multiple *separate* WINS spaces are two very different
things. Failover assumes that all of the WINS servers are sync'd.
Multi-membership assumes the opposite.
For multi-membership to work, the client station must register in *each*
of the WINS namespaces. Also, there must be a hierarchy to the WINS
namespaces so that duplicate names can be resolved in a known fashion.
If I'm searching for node "SERVER", and there is a node "SERVER" listed
in two WINS spaces, how do I know which to choose?
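One possible answer, sketched purely for illustration, is "first namespace in the configured order wins". Nothing in this thread settles on that rule; the resolve function, the namespace labels and the addresses below are all hypothetical, and plain dicts stand in for the WINS databases.

```python
def resolve(name, namespaces):
    """Return (label, address) from the first namespace that knows the name.

    namespaces -- list of (label, table) pairs in priority order, where each
                  table is a plain dict standing in for one WINS database.
    """
    for label, table in namespaces:
        if name in table:
            return label, table[name]
    return None   # no namespace has the name


# Two separate (unsynced) namespaces that both list "SERVER".
office = {"SERVER": "10.0.0.5"}
lab = {"SERVER": "172.16.0.9", "PRINTER": "172.16.0.20"}
```

With this rule, swapping the order of the two namespaces changes which "SERVER" you reach, which is exactly why the hierarchy has to be known and fixed.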
Given that NetBIOS names are the addresses in the NetBIOS virtual LAN,
there are a lot of problems that could be caused.
All of the Windows systems with which I've worked assume failover. Note
that I haven't worked with W/98.
> > Obviously, if the administrator has specified the multiple servers in
> > smb.conf, it means he considers the Samba server to be part of the
> > namespace of each wins server. So why not make it explicit and just
> > register with everyone every time?
That's not obvious. It may be that the administrator wanted failover.
I think that registering with each WINS server would still work (though
I'm not sure), because the registration will match the existing entry so
it will be taken as an update.
> This all sounds quite good to me.
I have my doubts. That's why I'm talking about it.
> That is:
> - Register with all specified wins-servers
Could generate a bit of replication traffic. I really don't know what
will happen if this is done.
> side-question: what happens, if it fails?
If the servers are synced then it won't matter, on the server side. I
have no idea what the Samba client would or should do.
> If the server is down, retry later, but not too early,
> we don't want to flood the net.
Do we retry now?
> If the name is already taken? No idea yet.
This can happen. Of course, this is more likely with multi-membership.
> - For queries, use them all in _order_:
> - If we get a negative response, try the next one.
> - if we get no response, mark the machine as dead for a
> while and try it later (as noted above by Peter)
> - of course, skip servers, marked as dead
> - (unless their deadtime is over)
> I can see only one bad thing: Increased network-traffic,
> especially with registrations.
...and replication traffic. If WINS-A replicates with WINS-B and WINS-B
replicates with WINS-C and you register with all of them, then they will
all be sending update messages to each other. Urq. The traffic probably
won't be that bad, but geez what a mess. Oooh, and I haven't considered
the implications for Browsing yet.
> > > wins server = 192.168.101.5, wins.office.com:wins2.office.com
> > >
> > > The above basically says
> > > - Use 192.168.101.5 to resolve names via WINS.
> > > - If name resolution from 192.168.101.5 fails, then try the WINS
> > > server at office.com.
> > > - Initially, wins.office.com is the second WINS server to try. If it
> > > appears to be down, then use wins2.office.com instead (that is,
> > > failover).
> > To me this is needlessly complex (and as you said "a pain in the
> > tailfeathers to implement"). My humble opinion is that nobody needs
> > this amount of flexibility.
> > No need to pick & choose which feature you want. I can't see a
> > situation where it would make sense to have one but not the other.
Ah, but that *is* the problem! If I implement failover without
multi-membership, then you can still point to two WINS servers but the
second one will only receive queries and registrations if the first one
fails. Thus, multi-membership doesn't work.
If I implement multi-membership without failover, then Samba winds up
registering multiple times in the same WINS namespace, and queries that
get a 'name not found' are repeated to sync'd WINS servers from which
they get the same answer.
The suggestion I made above specifies multiple WINS name spaces and
provides for failover within those. This is a much better way to handle
the combination of the two features.
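As a rough sketch of how the proposed syntax might be read: commas separate WINS namespaces, and colons separate failover peers within one namespace. This is Python for illustration only; parse_wins_server is a made-up name, and the syntax is the proposal quoted above, not something this message confirms smb.conf accepts.

```python
def parse_wins_server(value):
    """Split a proposed 'wins server' value into namespaces of failover peers.

    Commas separate WINS namespaces; colons separate servers that are
    assumed to be synced replicas within one namespace.
    """
    groups = []
    for group in value.split(","):
        peers = [p.strip() for p in group.split(":") if p.strip()]
        if peers:                   # skip empty entries such as a trailing comma
            groups.append(peers)
    return groups
```

Each inner list is a failover set (try the next peer only on timeout), while the outer list is the set of namespaces to register in and to query in order.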
> Yes, I also think, that this amount of flexibility isn't
> really necessary. I just can't see something, where this is
> better than the approach above (except for network-traffic,
> granted that.)
Does my approach above make more sense now? I'm not against the
multi-membership idea, though I have no idea what it will do to browse
lists, or finding the DMB or PDC. A lot of that is up to the Admin, who
will have to ensure that there are no name collisions between WINS
servers. Of course, if you're doing that, why not just sync them anyway?
> > > PS. I've actually gotten most of failover written up. It's a bit
> > > different than the patch that was submitted but that's because there
> > > were requests for more than two WINS servers. Also, I've made a
> > > distinction between Negative Query Response and a timeout. That fix
> > > should improve speed even when only one WINS server is in use.
> > Cool! You're saying the current code basically drops the NQR on the
> > floor and continues to wait for timeout?
> > If so, it explains why failed
> > lookups to WINS seem to take just as long as failed lookups to
> > broadcast, something that has mildly bothered me for awhile now (mildly
> > meaning not enough to make me actually investigate the code or the
> > protocol).
Yes. I only noticed it because I was working on installing the patch and
I wanted to be able to distinguish between "the answer is NQR" and "he's
> Hmmm... That explains something.
> Okay... dropped that also into TNG.
I'm going to check in a slightly better version of this fix. Better
debugging and it catches more return codes. Look for it.
Christopher R. Hertel -)----- University of Minnesota
crh at nts.umn.edu Networking and Telecommunications Services
Ideals are like stars; you will not succeed in touching them
with your hands...you choose them as your guides, and following
them you will reach your destiny. --Carl Schultz