[Samba] samba in a High Availability Configuration

Martin Pool mbp at samba.org
Thu Feb 20 01:06:40 GMT 2003


On 19 Feb 2003, Matt Schillinger <mschilli at vss.fsi.com> wrote:
> The results I get now (on a Windows NT4 machine) are:
> 
> 1. start a copy from a local drive to a samba served drive.
> 
> 2. failover the samba server to the secondary.
> 
> 3. the copy seems to stall.
> 
> 4. As the secondary server comes online (or the IP comes online), the
> copy issues an error.
> 	I don't know whether the error is due to server state, or because the
> IP comes up for a moment with no Samba server bound to the interface.
> This is why I am interested in whether the "bind interfaces only" option
> can be used without the IP aliases actually being bound, so that the
> Samba server is already listening on those interfaces when the aliases
> come up.
>
> 5. Immediately starting the copy over (with the secondary server now
> serving the data) works fine; no reconnects are required.

"bind interfaces only" is the least of your problems.

I don't think you said which IP HA system you were using.  If it's not
one that preserves TCP connections across a failover, then this is the
first problem: the Windows client will think it still has a TCP
connection open, but the newly active server knows nothing about it
and will typically reset the connection as soon as the client sends
more data.

If you do get past that problem, then there is a lot of information
about the connection (the authenticated session, open files and so on)
stored inside the smbd process that got killed off, and all of it is
necessary to continue serving the connection.
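
To make the point about per-process state concrete, here is a toy
model in Python.  It is not Samba code: the ToySmbd class, its methods
and the file name are all made up for illustration.  The only thing it
shows is that a file id handed out by one process means nothing to a
freshly started process whose tables are empty.

```python
class ToySmbd:
    """Hypothetical stand-in for one smbd process; not real Samba code."""

    def __init__(self, name):
        self.name = name
        self.open_files = {}  # file id -> path, held only in this process
        self.next_fid = 1

    def open_file(self, path):
        # Hand out a new file id and remember it in process memory.
        fid = self.next_fid
        self.next_fid += 1
        self.open_files[fid] = path
        return fid

    def write(self, fid, data):
        # A write is only valid for a file id this process issued itself.
        if fid not in self.open_files:
            raise LookupError(f"{self.name}: unknown file id {fid}")
        return len(data)


primary = ToySmbd("primary")
fid = primary.open_file("bigfile.dat")
primary.write(fid, b"first chunk")    # fine: primary issued this file id

secondary = ToySmbd("secondary")      # takes over with empty tables
try:
    secondary.write(fid, b"second chunk")
    outcome = "ok"
except LookupError as exc:
    outcome = str(exc)                # the client's old file id is rejected
```

Migrating the connection would mean copying every such table to the
standby and keeping it current, which is the expensive part.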

I think what you're seeing now is basically as good as it gets at the
moment.  (I think, but I'm not sure, that the Microsoft solution has
about the same limitations.)

So you need to either

 - just put up with restarting the transfer when the server fails over

 - do lower-level HA (in the OS or hardware) that preserves the smbd
   process in the case of a failover 

 - put a *lot* of time and/or money into supporting this in Samba
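
The first option can at least be automated on the client side.  A
minimal sketch in Python, assuming the copy fails with an OSError when
the server goes away and that starting again from the beginning is
acceptable; copy_with_restart and its parameters are hypothetical
names, not part of Samba or of any existing tool:

```python
import shutil
import time


def copy_with_restart(src, dst, attempts=3, delay=5.0, copy=shutil.copyfile):
    """Copy src to dst, restarting from scratch if the copy fails.

    Returns the number of the attempt that succeeded.  The `copy`
    argument exists only so the copy step can be swapped out in tests.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            copy(src, dst)
            return attempt
        except OSError as exc:
            # The server probably failed over; remember the error and,
            # if attempts remain, wait for the standby before retrying.
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise last_error
```

The delay is there to give the standby time to bring up the IP and
smbd before the next attempt; a suitable value depends on how long your
failover takes.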

Even if you did externalize all the connection state so that it could
be migrated, I'm not sure that this would actually improve RAS
(reliability, availability, serviceability).  Presumably you're failing
over because something has gone wrong on the primary, and tightly
coupling the two servers might propagate the problem to the standby.

-- 
Martin 

