tdb file corruption when fsync/fdatasync fails
Rungta, Vandana
vrungta at amazon.com
Mon Sep 17 19:32:20 UTC 2018
Hello,
If there is an underlying filesystem glitch/failure when we call smbpasswd, it can result in a corrupted secrets.tdb file.
Samba version: 4.8.4
Command: smbpasswd -ad <user> (called immediately after the Samba RPM is installed)
Aug 29 09:35:02
tdb(/usr/local/samba/private/secrets.tdb): tdb_transaction: fsync failed
tdb(/usr/local/samba/private/secrets.tdb): tdb_transaction_prepare_commit: failed to setup recovery data
PANIC (pid 11112): could not start commit secrets db
BACKTRACE: 7 stack frames:
#0 /usr/local/samba/lib/libsamba-util.so.0(log_stack_trace+0x1c) [0x7f9df20f929c]
#1 /usr/local/samba/lib/libsmbconf.so.0(smb_panic_s3+0x2d) [0x7f9df1e9075d]
#2 /usr/local/samba/lib/libsamba-util.so.0(smb_panic+0x3a) [0x7f9df20f93ca]
#3 /usr/local/samba/lib/private/libsecrets3-samba4.so(get_global_sam_sid+0x589) [0x7f9decdb0199]
#4 /usr/local/samba/bin/smbpasswd(main+0x60b) [0x5579db0761db]
#5 /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f9de50d1445]
#6 /usr/local/samba/bin/smbpasswd(+0x3199) [0x5579db075199]
Aug 29 09:35:02 localhost kernel: [ 4798.314228] EXT4-fs (sda1): Delayed block allocation failed for inode 1323116 at logical offset 4 with max blocks 2 with error 121
Aug 29 09:35:02 localhost kernel: [ 4798.315778] EXT4-fs (sda1): This should not happen!! Data will be lost
Later calls to smbpasswd succeed, and I can run smbpasswd -ad <user> without a panic.
However, when I examine private/secrets.tdb with tdbtool, the integrity check fails. Even though the check fails, we can continue to use smbpasswd successfully.
Unfortunately I cannot reproduce this issue, but I do have a couple of different corrupted secrets.tdb files that I can provide if needed.
Since I cannot reproduce the issue, I am not sure whether the above information is enough for debugging. Looking at transaction_setup_recovery and _tdb_transaction_prepare_commit: is additional cleanup needed, either in _tdb_transaction_cancel or before calling it? The recovery setup code performs multiple writes and syncs, so depending on which sync failed, part of the recovery data may already have been written, which could leave secrets.tdb corrupted.
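To make the question concrete, here is a minimal sketch of the kind of cleanup I have in mind. It is not tdb code and does not reflect how tdb lays out its recovery area; the function name append_with_rollback and the injectable sync parameter are hypothetical, chosen only so the failure path can be exercised. The idea is that a sequence of write+sync steps remembers the original file size and truncates back to it if any step fails, so a partially written region is not left behind.

```python
import os


def append_with_rollback(fd, chunks, sync=os.fsync):
    """Append each chunk and sync after it.

    If any write or sync raises OSError, truncate the file back to the
    size it had on entry and re-raise. Hypothetical illustration only;
    this is not how tdb implements its transaction recovery.
    """
    original_size = os.fstat(fd).st_size
    try:
        for chunk in chunks:
            os.lseek(fd, 0, os.SEEK_END)
            view = memoryview(chunk)
            # os.write may write fewer bytes than requested, so loop.
            while len(view) > 0:
                written = os.write(fd, view)
                view = view[written:]
            # A failure here may leave some earlier chunks on disk.
            sync(fd)
    except OSError:
        # Undo the partial append before reporting the failure.
        os.ftruncate(fd, original_size)
        raise
```

A caller would pass the open file descriptor and the list of byte strings to append. The sketch leaves open what to do if the truncate itself fails on a misbehaving filesystem, which is probably the harder part of the question.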
Thanks,
Vandana Rungta
vrungta at amazon.com
1)
tdbtool secrets.tdb.corrupt1
tdb> check
Dead space at 696-1363968 (of 1363968)
Hashes do not match records
Integrity check for the opened database failed.
tdb> list
hash=17
rec: hash=17 offset=0x00083f68 next=0x00000000 rec_len=128 key_len=31 data_len=68 full_hash=0x6bf7425c magic=0x26011999
hash=40
rec: hash=40 offset=0x00083ee8 next=0x00000000 rec_len=104 key_len=12 data_len=68 full_hash=0x2a4c7c2e magic=0x26011999
freelist:
hash=-1
rec: hash=-1 offset=0x00004000 next=0x00069000 rec_len=0 key_len=0 data_len=0 full_hash=0x00000000 magic=0xd9fee666
ERROR: tailer does not match record! tailer=3657360998 totalsize=24
rec: hash=-1 offset=0x00069000 next=0x00000000 rec_len=110288 key_len=0 data_len=0 full_hash=0x00000000 magic=0xd9fee666
2)
tdbtool secrets.tdb.corrupt2
tdb> check
Dead space at 696-1363968 (of 1363968)
Hashes do not match records
Integrity check for the opened database failed.
tdb> list
hash=8
rec: hash=8 offset=0x00083758 next=0x00000000 rec_len=56 key_len=26 data_len=16 full_hash=0xbbdfcbb9 magic=0x26011999
hash=14
rec: hash=14 offset=0x00083b70 next=0x00000000 rec_len=876 key_len=35 data_len=662 full_hash=0xb6de2886 magic=0x26011999
hash=21
rec: hash=21 offset=0x00083298 next=0x00000000 rec_len=128 key_len=46 data_len=52 full_hash=0x964be636 magic=0x26011999
hash=33
rec: hash=33 offset=0x00083f74 next=0x00000000 rec_len=116 key_len=21 data_len=68 full_hash=0xbcf76893 magic=0x26011999
hash=40
rec: hash=40 offset=0x00083ef4 next=0x00000000 rec_len=104 key_len=12 data_len=68 full_hash=0x2a4c7c2e magic=0x26011999
hash=61
rec: hash=61 offset=0x0008388c next=0x00000000 rec_len=64 key_len=43 data_len=4 full_hash=0x9ee5af09 magic=0x26011999
hash=62
rec: hash=62 offset=0x00083330 next=0x00000000 rec_len=120 key_len=24 data_len=68 full_hash=0x92dc93c2 magic=0x26011999
hash=70
rec: hash=70 offset=0x00083834 next=0x00000000 rec_len=64 key_len=43 data_len=4 full_hash=0x1702d869 magic=0x26011999
hash=80
rec: hash=80 offset=0x000838e4 next=0x00000000 rec_len=628 key_len=40 data_len=458 full_hash=0x46361a32 magic=0x26011999
hash=86
rec: hash=86 offset=0x000837a8 next=0x00000000 rec_len=116 key_len=22 data_len=68 full_hash=0xe3708a40 magic=0x26011999
hash=102
rec: hash=102 offset=0x00083710 next=0x00000000 rec_len=48 key_len=30 data_len=5 full_hash=0x0dbe25de magic=0x26011999
hash=113
rec: hash=113 offset=0x000825cc next=0x00000000 rec_len=3252 key_len=38 data_len=2560 full_hash=0x93a91770 magic=0x26011999
freelist:
hash=-1
rec: hash=-1 offset=0x000833c0 next=0x00004000 rec_len=824 key_len=22 data_len=68 full_hash=0xe3708a40 magic=0xd9fee666
rec: hash=-1 offset=0x00004000 next=0x00069000 rec_len=0 key_len=0 data_len=0 full_hash=0x00000000 magic=0xd9fee666
ERROR: tailer does not match record! tailer=3657360998 totalsize=24
rec: hash=-1 offset=0x00069000 next=0x00000000 rec_len=103860 key_len=0 data_len=0 full_hash=0x00000000 magic=0xd9fee666
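A few numbers in the two listings above can be cross-checked with simple arithmetic. The sketch below assumes a 24-byte record header (six 32-bit fields, which matches the totalsize=24 reported for the record with rec_len=0); the helper name record_end is made up for illustration. It only restates what the listings show and is not a diagnosis.

```python
# Magic value printed on the freelist records in both listings.
TDB_FREE_MAGIC = 0xD9FEE666
# Assumption: header is six 32-bit fields, i.e. 24 bytes.
RECORD_HEADER_SIZE = 24


def record_end(offset, rec_len, header_size=RECORD_HEADER_SIZE):
    """Return the file offset just past a record's header and body."""
    return offset + header_size + rec_len


# The "tailer" reported in the ERROR line, converted to hex.
tailer = 3657360998
print(hex(tailer))               # 0xd9fee666
print(tailer == TDB_FREE_MAGIC)  # True

# End of the large free record at 0x69000 in each listing.
print(hex(record_end(0x69000, 110288)))  # 0x83ee8 (corrupt1)
print(hex(record_end(0x69000, 103860)))  # 0x825cc (corrupt2)
```

So the mismatching tailer is numerically the free-record magic, which is what one would read if the tailer of a zero-length free record is taken from the last four bytes of its own header. In both files the free record at 0x69000 ends exactly at the lowest in-use record offset listed (0x83ee8 for hash=40 in the first file, 0x825cc for hash=113 in the second).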