[Samba] tdb_expand overflow detected

Dan Langille dan at langille.org
Thu Nov 21 22:27:10 UTC 2024


On Thu, Nov 21, 2024, at 4:32 PM, Douglas Bagnall wrote:
> On 21/11/24 15:24, Dan Langille wrote:
>>>>>      net cache list
>>>>>
>>>>> will tell you what the cache thinks it contains. If it is filled with
>>>>> real things, it could indicate where they're coming from. If it fails or
>>>>> shows a cache full of nonsense, well that is also interesting.
>>>>
>>>> That is 161 lines of expired stuff.
>>>
>>> Yeah, I'm not sure how that adds to 4 billion.
>>>
>>> tdbtool /var/db/samba4/gencache.tdb
>>> tdb> info
>>>
>>> will show lines describing the "smallest/average/largest" of various things.
>> 
>> This is the file I moved away:
>> 
>> [2:23 tm dvl ~] % sudo tdbtool ~/tmp/gencache.tdb
>> tdb> info
>> Size of file/data: 81919/9152
>> Header offset/logical size: 81920/4294967295
>
> It looks like tdb internally has a different idea of the file size than 
> the file system has. From an earlier message:
>
>>>> The file size is close
>>>> 
>>>> [22:44 tm dvl ~] % ls -l /var/db/samba4/gencache.tdb
>>>> -rw-r--r--  1 root wheel 4295049215 2024.11.18 13:26 /var/db/samba4/gencache.tdb
>>>> 
>>>> 4295049215-4294967295 = 81,920
>
> 81920 is 0x14000.
> 4294967295 is 0xffffffff.
> the actual file size is 0x100013fff.
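The hex conversions above can be checked with a few lines of Python. This is only an arithmetic sanity check of the numbers quoted in this thread (tdbtool "info" output and `ls -l`), not anything tdb itself runs:

```python
# Numbers quoted in this thread.
header_offset = 81920          # "Header offset" from tdbtool info
logical_size = 4294967295      # "logical size" from tdbtool info
on_disk_size = 4295049215      # file size reported by ls -l

# The hex forms given above.
assert header_offset == 0x14000
assert logical_size == 0xFFFFFFFF == 2**32 - 1   # largest 32-bit unsigned value
assert on_disk_size == 0x100013FFF

# Gap between the on-disk size and tdb's logical size.
gap = on_disk_size - logical_size
print(gap, hex(gap))
```

The printed gap is 81920 (0x14000), the same value as the header offset.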
>
> My understanding of tdb (I am not an expert) is that it can only map in 
> a 32-bit size (up to 4294967295), so the extra stuff at the end is not 
> actually accessible.
>
> What I think has happened is that mmap() or something else has somehow 
> set an extra bit, so the desired file size of 0x13fff becomes 0x100013fff, 
> after which tdb is in a state of confusion and refuses to add anything.
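The suspected single-bit error can be modelled as below. This is a minimal sketch, assuming the intended size was 0x13fff (81919, which is also the "Size of file" figure tdbtool printed) and that the only corruption is bit 32 being set; it says nothing about where such a bit would come from:

```python
BIT_32 = 1 << 32            # first bit beyond a 32-bit field
UINT32_MAX = 0xFFFFFFFF     # largest value a 32-bit field can hold

intended_size = 0x13FFF     # 81919, assumed intended size
observed_size = 4295049215  # size reported by ls -l

# Setting the one extra bit turns the intended size into the observed one.
assert intended_size | BIT_32 == observed_size

# The observed size no longer fits in 32 bits...
assert observed_size > UINT32_MAX

# ...and keeping only the low 32 bits gives back the intended size.
print(observed_size & UINT32_MAX)
```

The final line prints 81919, which is why a 32-bit view of this file would see a size of 0x13fff while the file system reports 0x100013fff.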
>
>> Number of records: 161
>> Incompatible hash: yes
>> Active/supported feature flags: 0x00000001/0x00000001
>> Robust mutexes locking: yes
>> Smallest/average/largest keys: 19/39/61
>> Smallest/average/largest data: 16/17/59
>> Smallest/average/largest padding: 20/20/26
>> Number of dead records: 0
>> Smallest/average/largest dead records: 0/0/0
>> Number of free records: 22
>> Smallest/average/largest free records: 28/195223196/4294897995
>
> Of course this largest free record looks a bit large, but I am guessing 
> this is an artifact rather than a cause.
>
> The original message said:
>
>> I'm using samba416-4.16.11 on FreeBSD 14.1 (on ZFS, in a jail, with quotas on those filesystems, etc)
>
> This is a sparsely populated region. 

Meaning not many people do this crazy stuff? 

> Maybe there is something in that "etc" that might affect it?

I was so tired when I read this: at first I thought you meant /etc or /usr/local/etc ...

The following paths are relative to the jail. On the host, every path mentioned here is actually
under /jails/tm. Inside the jail, however, the code just thinks it is running on a host.

Samba is writing to /usr/local/timemachine:

[21:53 tm dvl /usr/local/timemachine] % ls -l
total 98
drwxr-xr-x  3 dvl-air01 dvl-air01 4 2024.11.21 16:54 dvl-air01/
drwxr-xr-x  3 dvl-pro02 dvl-pro02 5 2024.11.21 20:38 dvl-pro02/
drwxr-xr-x  3 dvl-pro03 dvl-pro03 3 2024.11.19 16:19 dvl-pro03/
drwxr-xr-x  3 dvl-pro04 dvl-pro04 4 2024.11.21 21:42 dvl-pro04/


The jail has no concept of ZFS, quotas, or compression done by ZFS.

That's all in the host:

[21:55 r730-03 dvl ~] % zfs list -r data01/timemachine
NAME                               USED  AVAIL  REFER  MOUNTPOINT
data01/timemachine                3.71T  7.88T   112K  /jails/tm/usr/local/timemachine
data01/timemachine/dvl-air01       348G  7.88T   297G  /jails/tm/usr/local/timemachine/dvl-air01
data01/timemachine/dvl-air01-old   878G   146G   878G  none
data01/timemachine/dvl-pro02       652G   848G   429G  /jails/tm/usr/local/timemachine/dvl-pro02
data01/timemachine/dvl-pro03       660G   364G   492G  /jails/tm/usr/local/timemachine/dvl-pro03
data01/timemachine/dvl-pro04      1.24T  7.88T  1.02T  /jails/tm/usr/local/timemachine/dvl-pro04


[21:55 r730-03 dvl ~] % zfs get -r -t filesystem quota data01/timemachine
NAME                              PROPERTY  VALUE  SOURCE
data01/timemachine                quota     none   default
data01/timemachine/dvl-air01      quota     none   default
data01/timemachine/dvl-air01-old  quota     1T     local
data01/timemachine/dvl-pro02      quota     1.46T  local
data01/timemachine/dvl-pro03      quota     1T     local
data01/timemachine/dvl-pro04      quota     none   default

There are four clients, each with its own backup directory (and a separate ZFS filesystem,
which Samba doesn't know about).

Perhaps something did get messed up.  I have a suspect.

Backups were failing for the air01 client.  I rolled back the filesystem to an old snapshot.
Backups then resumed with success.

The jail wasn't running, so Samba wasn't active. However, the cache was not touched by this
rollback; only data01/timemachine/dvl-air01 was rolled back.

Looking at the not-yet-published blog post, the rollback happened on Nov 3.

I posted here Nov 13.

Looking at backups of the logs, I think the 'overflow detected' messages started on Nov 11 at 09:55:21.

-- 
  Dan Langille
  dan at langille.org
