[Samba] Samba AD/DC crashed again, third time in as many months

IT Admin it at cliffbells.com
Mon Mar 7 00:09:15 UTC 2016


Hi Andrew,

Thanks for your input.  I intend to take a look at the memory on this
machine asap to see if that is the cause of my issues, and I figure I may
as well swap out the data cables while I'm there for good measure.

I didn't think package conflicts could be an issue, as I spent a fair amount
of time double-checking for conflicts when I initially moved from
Canonical's distribution of Samba to a compiled build, and the domain had
been functional for a while.

My only frustration at this point is that nowhere does the wiki state
that in order to stably deploy Active Directory using Samba one MUST deploy
at least two ADCs.  If that is true, it would be helpful to state the
requirement in the docs.  I've got a few ADs deployed with single ADCs.  Now
I feel compelled to make those environments more robust, and in many cases I
lack the resources to do so without added cost for the client and/or a fair
amount of labor on my part.

I'll update this thread once I've had an opportunity to inspect the
hardware.

Kind regards,

JS
On Mar 6, 2016 6:56 PM, "Andrew Bartlett" <abartlet at samba.org> wrote:

> On Wed, 2016-03-02 at 16:42 -0500, IT Admin wrote:
> > I built this machine, and while it isn't the most robust box in the
> > world it has been stable otherwise.  The RAID array is configured
> > RAID1, I can't see how that could cause corruption issues and I
> > haven't experienced any other data corruption issues apart from SAMBA
> > collapsing.
>
> I know it is hard to swallow, but I really think this is hardware, or
> the OS configuration under it, combined with unexpected shutdown or
> some other corruption vector.
>
> We have at this stage 10,000 or more domains running Samba4, and this
> is only the second I've heard of with this kind of symptom.  The first
> I blamed on a use of DRBD that I postulated was not preserving 'write
> barriers' (that is, the thing that makes fsync() work) and a poweroff,
> but I didn't really have any proof.
>
> You do need to run a second DC, as well as run tools like memcheck on
> this DC.  Make sure you regularly run the backup script, so you can
> work out when the corruption happens, and verify your DB with dbcheck.
>
> The second DC has the advantage that this kind of low-level corruption
> doesn't easily spread across DRS replication (it would instead fail
> replication).
>
> The error shown indicates that for some reason or other, it can't read
> the schema.  This is very odd, as the schema doesn't change!
>
> We would love to get to the bottom of this.
>
> Unlike others, I don't think this has anything to do with packaging
> (that would just make us not start at all), but a clean install on a
> clean machine is my best advice, keeping the rest aside (and off) for
> forensics if you have the patience.
>
> Finally, always keep the steps simple - otherwise we might start
> confusing admin errors for hardware errors or vice versa.  The things
> we all do in the panic are always the hardest to de-construct in the
> cold light of day.
>
> Thanks,
>
> Andrew Bartlett
>
> --
> Andrew Bartlett
> https://samba.org/~abartlet/
> Authentication Developer, Samba Team         https://samba.org
> Samba Development and Support, Catalyst IT
> https://catalyst.net.nz/services/samba
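
The advice above to verify the database with dbcheck on a regular basis
could be scripted roughly as follows. This is only a minimal sketch, not
something taken from the thread: it assumes samba-tool is on the PATH, that
"samba-tool dbcheck --cross-ncs" is the check you want, and that a non-zero
exit status means problems were found. The helper names
(build_dbcheck_command, summarize, main) are hypothetical.

```python
import shutil
import subprocess
import sys
from datetime import datetime, timezone


def build_dbcheck_command(cross_ncs=True, samba_tool="samba-tool"):
    """Return the argv list for a read-only database check (no --fix)."""
    cmd = [samba_tool, "dbcheck"]
    if cross_ncs:
        # Check every naming context, not only the default one.
        cmd.append("--cross-ncs")
    return cmd


def summarize(returncode, when=None):
    """Build a one-line, timestamped result that can be appended to a log."""
    when = when or datetime.now(timezone.utc)
    # Assumption: a non-zero exit status means dbcheck reported problems.
    status = "OK" if returncode == 0 else "ERRORS (exit %d)" % returncode
    return "%s dbcheck %s" % (when.strftime("%Y-%m-%dT%H:%M:%SZ"), status)


def main():
    """Run the check once and print its output plus the summary line."""
    cmd = build_dbcheck_command()
    if shutil.which(cmd[0]) is None:
        print("samba-tool not found on PATH", file=sys.stderr)
        return 2
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    print(result.stdout, end="")
    print(summarize(result.returncode))
    return result.returncode
```

Calling main() from a daily cron job, right after the backup runs, and
keeping the summary lines would give a rough timeline of when a check first
failed, which is the point of running it regularly.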

