[Samba] frequent tdb corruption

Tue May 14 06:16:47 MDT 2013

On 05/14/2013 05:59 AM, Adam Thorn wrote:
> Hi,
>
> I'm seeing regular tdb corruption; typical log messages are:
>
> tdb(/var/db/samba/sessionid.tdb): tdb_rec_read bad magic 0x42424242 at
> offset=672032
>
> tdb(/var/db/samba/connections.tdb): tdb_rec_read bad magic 0x0 at
> offset=1111638594
>
> tdb(/var/db/samba/locking.tdb): tdb_rec_read bad magic 0x42424242 at
> offset=1034396
>
> which then prevents fileserving from working properly (N.B. the bad
> magic is not limited to those three tdbs). At the moment I'm running
> Samba 3.6.6 on FreeBSD 9.0, but I've seen exactly the same behaviour
> with 3.6.9 and 3.6.13, and on FreeBSD 9.1 as well. I currently have
> the tdb-1.2.9,1 FreeBSD port installed, but have seen the same problem
> with tdb-1.2.11,1.
>
> I found a few forum posts that suggested setting "use mmap=no" - I have
> tried that, but saw no change in behaviour.
>
> Restarting samba invariably clears the problem for a while: sometimes
> it's just a few hours before we get further bad magic messages,
> sometimes it's continued working fine for ~10 days or so, and pretty
> much everything in between. There is no obvious pattern of which tdbs
> are corrupting; I've seen pretty much all of them become corrupt over
> the last couple of months.
>
> The server has multiple IP addresses which samba listens on; first of
> all we just start smbd with 
>
> [global]
>    include = /data/config/samba/servers/%i
>
> and I've attached the result from running testparm on one of those
> included files. It's very slightly redacted to hide IP addresses
> and group names. We have another similarly-configured server (FreeBSD
> 9.0, Samba 3.6.6) with the same pattern of "include a config file
> dependent on the IP address the client connects to", and that has been
> running smoothly with no problems at all for over a year.
>
> I don't think (but have not absolutely conclusively ruled out) that it's
> a hardware problem on the server itself; the samba service (and the
> associated IP addresses) is managed by heartbeat, so I've tried running
> samba on the two nominally identical servers in the HA cluster - I see
> the same problematic behaviour on both nodes. 
>
> I've also attached the output of "smbd -b", in case that is informative.
>
> I'm kind of running out of ideas of what to try next; any and all advice
> will thus be gratefully received! It's been especially hard to diagnose
> because the corruption happens seemingly at random, and I've not been
> able to identify a definite action that leads to the errors. (Also,
> because it's a production server, I'm not keen to try to deliberately
> provoke errors..)
>
> Adam
>
>
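
To see whether there is any pattern in which tdbs are affected, it may help to tally the "bad magic" messages per file. Below is a minimal sketch in Python; the regular expression is an assumption based only on the three messages quoted above (joined onto single lines), and the function names are made up for illustration. One detail the decoding makes visible: the offset in the connections.tdb message, 1111638594, is 0x42424242 in hex, the same value reported as the bad magic for the other two files (the byte 0x42 repeated). That might mean the offset itself was read from a region filled with that byte, but that is only a guess.

```python
import re
from collections import Counter

# Assumed log format, inferred only from the three messages quoted above.
BAD_MAGIC = re.compile(
    r"tdb\((?P<path>[^)]+)\):\s+tdb_rec_read bad magic "
    r"(?P<magic>0x[0-9a-fA-F]+) at\s+offset=(?P<offset>\d+)"
)


def parse_bad_magic(text):
    """Return a list of (path, magic, offset) tuples found in the log text."""
    return [
        (m.group("path"), int(m.group("magic"), 16), int(m.group("offset")))
        for m in BAD_MAGIC.finditer(text)
    ]


def tally_by_file(records):
    """Count how many bad-magic messages each tdb file produced."""
    return Counter(path for path, _magic, _offset in records)


# The three messages from the original report, one per line.
SAMPLE = """\
tdb(/var/db/samba/sessionid.tdb): tdb_rec_read bad magic 0x42424242 at offset=672032
tdb(/var/db/samba/connections.tdb): tdb_rec_read bad magic 0x0 at offset=1111638594
tdb(/var/db/samba/locking.tdb): tdb_rec_read bad magic 0x42424242 at offset=1034396
"""

records = parse_bad_magic(SAMPLE)
counts = tally_by_file(records)

for path, magic, offset in records:
    # Print both values in hex so repeated fill patterns are easy to spot.
    print(f"{path}: magic={magic:#010x} offset={offset:#x}")
```

To run it against a real log, pass the file contents to parse_bad_magic() in place of SAMPLE.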

What type of filesystem are you using?  Do you have barriers enabled?

On Linux, I know you should set barrier=1 on ext3/ext4 filesystems to
prevent corruption of sam.ldb in the event of power loss.
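
For the Linux case only (it does not apply to FreeBSD), here is a rough sketch of how one might list the barrier-related options that are explicitly set on ext3/ext4 mounts. It assumes input in the /proc/mounts layout (device, mount point, type, options, ...). The function name and the sample lines are hypothetical. It reports only options that are actually listed; when none is listed, the effective default depends on the filesystem and kernel version, so an empty result is not conclusive.

```python
def barrier_settings(mounts_text):
    """Map mount point -> explicit barrier options for ext3/ext4 entries.

    Expects text in the /proc/mounts layout:
    device mount_point fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype not in ("ext3", "ext4"):
            continue
        opts = options.split(",")
        # Keep only options that mention barriers explicitly.
        result[mount_point] = [
            o for o in opts if o == "nobarrier" or o.startswith("barrier")
        ]
    return result


# Hypothetical sample data, not taken from any real server.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime,barrier=1 0 0
/dev/sdb1 /data ext3 rw,nobarrier 0 0
tmpfs /tmp tmpfs rw 0 0
"""

settings = barrier_settings(SAMPLE_MOUNTS)
for mount_point, opts in settings.items():
    print(mount_point, opts if opts else "(no explicit barrier option)")
```

On a real Linux host the input would be the contents of /proc/mounts rather than SAMPLE_MOUNTS.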