tdb file corruption when fsync/fdatasync fails

Rungta, Vandana vrungta at amazon.com
Mon Sep 17 19:32:20 UTC 2018


Hello,

If there is an underlying filesystem glitch/failure when we call smbpasswd, it can result in a corrupted secrets.tdb file.

Samba – 4.8.4
smbpasswd -ad <user>   (Called immediately after the samba rpm is installed)


Aug 29 09:35:02
  tdb(/usr/local/samba/private/secrets.tdb): tdb_transaction: fsync failed
  tdb(/usr/local/samba/private/secrets.tdb): tdb_transaction_prepare_commit: failed to setup recovery data
  PANIC (pid 11112): could not start commit secrets db
  BACKTRACE: 7 stack frames:
   #0 /usr/local/samba/lib/libsamba-util.so.0(log_stack_trace+0x1c) [0x7f9df20f929c]
   #1 /usr/local/samba/lib/libsmbconf.so.0(smb_panic_s3+0x2d) [0x7f9df1e9075d]
   #2 /usr/local/samba/lib/libsamba-util.so.0(smb_panic+0x3a) [0x7f9df20f93ca]
   #3 /usr/local/samba/lib/private/libsecrets3-samba4.so(get_global_sam_sid+0x589) [0x7f9decdb0199]
   #4 /usr/local/samba/bin/smbpasswd(main+0x60b) [0x5579db0761db]
   #5 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f9de50d1445]
   #6 /usr/local/samba/bin/smbpasswd(+0x3199) [0x5579db075199]

Aug 29 09:35:02 localhost kernel: [ 4798.314228] EXT4-fs (sda1): Delayed block allocation failed for inode 1323116 at logical offset 4 with max blocks 2 with error 121
Aug 29 09:35:02 localhost kernel: [ 4798.315778] EXT4-fs (sda1): This should not happen!! Data will be lost



Later calls to smbpasswd succeed, and I can run smbpasswd -ad <user> without a panic.

However, when I examine private/secrets.tdb with tdbtool, the integrity check fails. Even though the check fails, we can continue to use smbpasswd successfully.
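
Because later smbpasswd calls succeed even on a damaged file, the corruption is easy to miss. As a rough sketch (not part of Samba), a provisioning script could run the same tdbtool check after smbpasswd and look for the failure line that appears in the tdbtool transcripts in this report. The helper names are hypothetical, and the assumption that tdbtool accepts the check command on standard input when run non-interactively has not been verified here.

```python
import subprocess

# Failure line as printed by tdbtool in the transcripts in this report.
FAILURE_MARKER = "Integrity check for the opened database failed."


def output_reports_corruption(tdbtool_output):
    """Return True if tdbtool's output contains the integrity-failure line."""
    return FAILURE_MARKER in tdbtool_output


def tdb_is_corrupt(path, tdbtool="tdbtool"):
    """Run 'check' against path.

    Assumption: tdbtool reads its commands from stdin when not attached
    to a terminal, mirroring the interactive 'tdb> check' session.
    """
    result = subprocess.run(
        [tdbtool, path],
        input="check\n",
        capture_output=True,
        text=True,
    )
    return output_reports_corruption(result.stdout + result.stderr)
```

Matching on the message text rather than on the exit status keeps the sketch tied to what the transcripts actually show.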



Unfortunately, I cannot reproduce this issue, but I do have a couple of different corrupted secrets.tdb files, which I can provide if needed.



Given that the issue is not reproducible, I am not sure whether the above information is enough for debugging. Looking at transaction_setup_recovery and _tdb_transaction_prepare_commit: is additional cleanup needed, either in _tdb_transaction_cancel or before calling it? The setup recovery code does multiple writes and syncs, so depending on which sync failed, part of the recovery data may already have been written, which could be what leaves the secrets.tdb file corrupted.
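
The kind of cleanup being asked about can be illustrated in isolation. The following is a minimal sketch and not tdb's actual code: it appends a blob to a file, syncs it, and truncates the file back to its previous size if the write or the sync fails, so that no partially written data is left behind. The name append_with_rollback and the injectable sync parameter are hypothetical, and appending at the end of the file is only an assumption about where such data might be placed.

```python
import os


def append_with_rollback(fd, blob, sync=os.fsync):
    """Append blob to fd and sync it.

    If the write or the sync fails, truncate the file back to the size it
    had before the call, then re-raise the original error.
    """
    old_size = os.fstat(fd).st_size
    try:
        written = os.pwrite(fd, blob, old_size)
        if written != len(blob):
            raise OSError("short write")
        sync(fd)
    except OSError:
        # Best-effort cleanup: drop whatever part of the blob reached the file.
        os.ftruncate(fd, old_size)
        raise
```

Even this is only best effort. After a failed fsync the state of the data on disk is uncertain, and the truncate itself can fail on a filesystem that is already returning errors.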



Thanks,

Vandana Rungta

vrungta at amazon.com

1)

tdbtool secrets.tdb.corrupt1
tdb> check
Dead space at 696-1363968 (of 1363968)
Hashes do not match records
Integrity check for the opened database failed.
tdb> list
hash=17
 rec: hash=17 offset=0x00083f68 next=0x00000000 rec_len=128 key_len=31 data_len=68 full_hash=0x6bf7425c magic=0x26011999
hash=40
 rec: hash=40 offset=0x00083ee8 next=0x00000000 rec_len=104 key_len=12 data_len=68 full_hash=0x2a4c7c2e magic=0x26011999
freelist:
hash=-1
 rec: hash=-1 offset=0x00004000 next=0x00069000 rec_len=0 key_len=0 data_len=0 full_hash=0x00000000 magic=0xd9fee666
ERROR: tailer does not match record! tailer=3657360998 totalsize=24
 rec: hash=-1 offset=0x00069000 next=0x00000000 rec_len=110288 key_len=0 data_len=0 full_hash=0x00000000 magic=0xd9fee666



2)

tdbtool secrets.tdb.corrupt2
tdb> check
Dead space at 696-1363968 (of 1363968)
Hashes do not match records
Integrity check for the opened database failed.
tdb> list
hash=8
 rec: hash=8 offset=0x00083758 next=0x00000000 rec_len=56 key_len=26 data_len=16 full_hash=0xbbdfcbb9 magic=0x26011999
hash=14
 rec: hash=14 offset=0x00083b70 next=0x00000000 rec_len=876 key_len=35 data_len=662 full_hash=0xb6de2886 magic=0x26011999
hash=21
 rec: hash=21 offset=0x00083298 next=0x00000000 rec_len=128 key_len=46 data_len=52 full_hash=0x964be636 magic=0x26011999
hash=33
 rec: hash=33 offset=0x00083f74 next=0x00000000 rec_len=116 key_len=21 data_len=68 full_hash=0xbcf76893 magic=0x26011999
hash=40
 rec: hash=40 offset=0x00083ef4 next=0x00000000 rec_len=104 key_len=12 data_len=68 full_hash=0x2a4c7c2e magic=0x26011999
hash=61
 rec: hash=61 offset=0x0008388c next=0x00000000 rec_len=64 key_len=43 data_len=4 full_hash=0x9ee5af09 magic=0x26011999
hash=62
 rec: hash=62 offset=0x00083330 next=0x00000000 rec_len=120 key_len=24 data_len=68 full_hash=0x92dc93c2 magic=0x26011999
hash=70
 rec: hash=70 offset=0x00083834 next=0x00000000 rec_len=64 key_len=43 data_len=4 full_hash=0x1702d869 magic=0x26011999
hash=80
 rec: hash=80 offset=0x000838e4 next=0x00000000 rec_len=628 key_len=40 data_len=458 full_hash=0x46361a32 magic=0x26011999
hash=86
 rec: hash=86 offset=0x000837a8 next=0x00000000 rec_len=116 key_len=22 data_len=68 full_hash=0xe3708a40 magic=0x26011999
hash=102
 rec: hash=102 offset=0x00083710 next=0x00000000 rec_len=48 key_len=30 data_len=5 full_hash=0x0dbe25de magic=0x26011999
hash=113
 rec: hash=113 offset=0x000825cc next=0x00000000 rec_len=3252 key_len=38 data_len=2560 full_hash=0x93a91770 magic=0x26011999
freelist:
hash=-1
 rec: hash=-1 offset=0x000833c0 next=0x00004000 rec_len=824 key_len=22 data_len=68 full_hash=0xe3708a40 magic=0xd9fee666
 rec: hash=-1 offset=0x00004000 next=0x00069000 rec_len=0 key_len=0 data_len=0 full_hash=0x00000000 magic=0xd9fee666
ERROR: tailer does not match record! tailer=3657360998 totalsize=24
 rec: hash=-1 offset=0x00069000 next=0x00000000 rec_len=103860 key_len=0 data_len=0 full_hash=0x00000000 magic=0xd9fee666
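
One detail in both listings can be checked with simple arithmetic. The reported tailer, 3657360998, is 0xd9fee666 in hexadecimal, the same value as the magic= field on the freelist records. Assuming each of the six fields printed per record is a 32-bit value (which gives the 24-byte totalsize reported for rec_len=0) and that the tailer occupies the last four bytes of a record, a record with rec_len=0 would have its tailer read from the position of its own magic field. The sketch below works through that arithmetic; the layout is an assumption inferred from the listing, not taken from the tdb source.

```python
TAILER_REPORTED = 3657360998   # from "tailer does not match record!"
FREE_MAGIC = 0xD9FEE666        # magic= value shown on the freelist records

# Assumption: six 32-bit fields per record header, in the order printed:
# next, rec_len, key_len, data_len, full_hash, magic.
FIELD_SIZE = 4
HEADER_FIELDS = 6
HEADER_SIZE = FIELD_SIZE * HEADER_FIELDS   # 24, matching totalsize=24
MAGIC_FIELD_INDEX = 5                      # magic is the last header field


def tailer_offset(rec_offset, rec_len):
    """Offset of the tailer, assumed to be the record's last four bytes."""
    return rec_offset + HEADER_SIZE + rec_len - FIELD_SIZE


def magic_offset(rec_offset):
    """Offset of the magic field inside the record header."""
    return rec_offset + MAGIC_FIELD_INDEX * FIELD_SIZE


print(hex(TAILER_REPORTED))                 # tailer value in hexadecimal
print(hex(tailer_offset(0x00004000, 0)))    # where the tailer would be read
print(hex(magic_offset(0x00004000)))        # where the magic field sits
```

Under these assumptions the error line follows directly from the zero-length free record at offset 0x00004000, so that record is the anomaly to explain rather than the tailer itself.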









