[Samba] tdb_expand overflow detected
Dan Langille
dan at langille.org
Thu Nov 21 22:27:10 UTC 2024
On Thu, Nov 21, 2024, at 4:32 PM, Douglas Bagnall wrote:
> On 21/11/24 15:24, Dan Langille wrote:
>>>>> net cache list
>>>>>
>>>>> will tell you what the cache thinks it contains. If it is filled with
>>>>> real things, it could indicate where they're coming from. If it fails or
>>>>> shows a cache full of nonsense, well that is also interesting.
>>>>
>>>> That is 161 lines of expired stuff.
>>>
>>> Yeah, I'm not sure how that adds to 4 billion.
>>>
>>> tdbtool /var/db/samba4/gencache.tdb
>>> tdb> info
>>>
>>> will show lines describing the "smallest/average/largest" of various things.
>>
>> This is the file I moved away:
>>
>> [2:23 tm dvl ~] % sudo tdbtool ~/tmp/gencache.tdb
>> tdb> info
>> Size of file/data: 81919/9152
>> Header offset/logical size: 81920/4294967295
>
> It looks like tdb internally has a different idea of the file size than
> the file system has. From an earlier message:
>
>>>> The file size is close
>>>>
>>>> [22:44 tm dvl ~] % ls -l /var/db/samba4/gencache.tdb
>>>> -rw-r--r-- 1 root wheel 4295049215 2024.11.18 13:26 /var/db/samba4/gencache.tdb
>>>>
>>>> 4295049215-4294967295 = 81,920
>
> 81920 is 0x14000.
> 4294967295 is 0xffffffff.
> the actual file size is 0x100013fff.
>
> My understanding of tdb (I am not an expert) is that it can only map in
> a 32 bit size (up to 4294967295), so the extra stuff at the end is not
> actually accessible.
>
> What I think has happened is mmap() or something has somehow set an
> extra bit, so the desired file size of 0x13fff becomes 0x100013fff,
> after which tdb is in a state of confusion, refusing to add anything.
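For reference, the arithmetic quoted above can be checked with a short Python sketch. It only uses the numbers already shown in this thread; the helper names (low32, extra_high_bits) are made up for illustration and are not part of tdb. It does not show how the extra bit got set, only that the sizes are consistent with one stray bit at position 32.

```python
# Values copied from the ls -l and tdbtool output quoted in this thread.
FS_SIZE = 4295049215           # file size reported by ls -l
TDB_LOGICAL_SIZE = 4294967295  # "logical size" reported by tdbtool info
HEADER_OFFSET = 81920          # "Header offset" reported by tdbtool info


def low32(n: int) -> int:
    """Keep only the low 32 bits, as a 32-bit size field would (hypothetical helper)."""
    return n & 0xFFFFFFFF


def extra_high_bits(n: int) -> int:
    """Return whatever is set above bit 31 (hypothetical helper)."""
    return n >> 32


if __name__ == "__main__":
    print(hex(FS_SIZE))                # 0x100013fff
    print(hex(TDB_LOGICAL_SIZE))       # 0xffffffff
    print(FS_SIZE - TDB_LOGICAL_SIZE)  # 81920
    print(low32(FS_SIZE))              # 81919
    print(extra_high_bits(FS_SIZE))    # 1
```

The low 32 bits of the on-disk size come out as 81919, which matches the "Size of file" value tdbtool printed for the copy I moved away.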
>
>> Number of records: 161
>> Incompatible hash: yes
>> Active/supported feature flags: 0x00000001/0x00000001
>> Robust mutexes locking: yes
>> Smallest/average/largest keys: 19/39/61
>> Smallest/average/largest data: 16/17/59
>> Smallest/average/largest padding: 20/20/26
>> Number of dead records: 0
>> Smallest/average/largest dead records: 0/0/0
>> Number of free records: 22
>> Smallest/average/largest free records: 28/195223196/4294897995
>
> Of course this largest free record looks a bit large, but I am guessing
> this is an artifact rather than a cause.
>
> The original message said:
>
>> I'm using samba416-4.16.11 on FreeBSD 14.1 (on ZFS, in a jail, with quotas on those filesystems, etc)
>
> This is a sparsely populated region.
Meaning not many people do this crazy stuff?
> Maybe there is something in that "etc" that might affect it?
I was so tired when I read this: at first I thought you meant /etc or /usr/local/etc ...
The following is relative to the jail. Every path you see mentioned is actually under /jails/tm
on the host. However, inside a jail, the code just thinks it's running on a host.
Samba is writing to /usr/local/timemachine:
[21:53 tm dvl /usr/local/timemachine] % ls -l
total 98
drwxr-xr-x 3 dvl-air01 dvl-air01 4 2024.11.21 16:54 dvl-air01/
drwxr-xr-x 3 dvl-pro02 dvl-pro02 5 2024.11.21 20:38 dvl-pro02/
drwxr-xr-x 3 dvl-pro03 dvl-pro03 3 2024.11.19 16:19 dvl-pro03/
drwxr-xr-x 3 dvl-pro04 dvl-pro04 4 2024.11.21 21:42 dvl-pro04/
The jail has no concept of ZFS, quotas, or the compression done by ZFS.
That's all on the host:
[21:55 r730-03 dvl ~] % zfs list -r data01/timemachine
NAME                               USED  AVAIL  REFER  MOUNTPOINT
data01/timemachine                3.71T  7.88T   112K  /jails/tm/usr/local/timemachine
data01/timemachine/dvl-air01       348G  7.88T   297G  /jails/tm/usr/local/timemachine/dvl-air01
data01/timemachine/dvl-air01-old   878G   146G   878G  none
data01/timemachine/dvl-pro02       652G   848G   429G  /jails/tm/usr/local/timemachine/dvl-pro02
data01/timemachine/dvl-pro03       660G   364G   492G  /jails/tm/usr/local/timemachine/dvl-pro03
data01/timemachine/dvl-pro04      1.24T  7.88T  1.02T  /jails/tm/usr/local/timemachine/dvl-pro04
[21:55 r730-03 dvl ~] % zfs get -r -t filesystem quota data01/timemachine
NAME                              PROPERTY  VALUE  SOURCE
data01/timemachine                quota     none   default
data01/timemachine/dvl-air01      quota     none   default
data01/timemachine/dvl-air01-old  quota     1T     local
data01/timemachine/dvl-pro02      quota     1.46T  local
data01/timemachine/dvl-pro03      quota     1T     local
data01/timemachine/dvl-pro04      quota     none   default
There are four clients, each with its own backup directory (and a separate ZFS filesystem,
which Samba doesn't know about).
Perhaps something did get messed up. I have a suspect.
Backups were failing for the air01 client. I rolled back its filesystem to an old snapshot.
Backups then resumed successfully.
The jail wasn't running, so Samba wasn't active. However, the cache was not touched
by this rollback; only data01/timemachine/dvl-air01 was rolled back.
Looking at the not-yet-published blog post, the rollback happened on Nov 3.
I posted here Nov 13.
Looking at backups of the logs, I think the 'overflow detected' messages started on Nov 11 at 09:55:21.
--
Dan Langille
dan at langille.org