parse_dos_attribute_blob() inconsistent file_id through make_file_id_from_itime()

Ralph Boehme slow at samba.org
Sun Dec 15 12:29:33 UTC 2019


Am 12/14/19 um 12:25 PM schrieb Krasimir Ganchev:
> On Saturday, December 14, 2019 2:23 AM, Ralph Boehme wrote:
> 
>> Am 12/14/19 um 6:11 AM schrieb Krasimir Ganchev:
>>> On Friday, December 13, 2019 12:59 AM, Ralph Boehme wrote:
>>>> this is a recent Samba change to overcome the problematic use of inode numbers for the file-ids. The assumption was that with nsec filesystem timestamp granularity, the itime, which stands for invented time and is basically an immutable birthtime, will always be unique.
>>>>
>>> Can you share a network trace of a minimal reproducer?
>>>
>>> I am attaching a minimal packet capture at the time the directory was browsed.
> 
>> sorry if that wasn't clear, but we also need the capture to cover when the files were created.
> 
> What more information would that give us? Apparently we are dealing with a local issue on the Samba side. It's an 8MB file; I can provide a download link if needed, but I don't feel like posting such a large attachment here.

Just share a link where I can download the traces.

>> What's strange is that the capture shows that for files where the file-id is the same, the creation-dates are different, so I would expect the itime to be different as well which should result in unique file-ids.
> 
> If I am not mistaken, I think there is still no consensus on implementing creation time on Linux, although some FS actually support it.

There's now statx(), and e.g. btrfs supports it, but this is currently
not used in Samba.

> All you have is the change time. Creation time is copied over from Windows as a DOS attribute which in this case is irrelevant as invented time (itime) is based on the change time supplied by the FS (creation time is only used if nothing is supplied by the FS).

I suspected a bug in Samba where the client setting the creation date
would somehow munge the itime and file-id.

To give some context: it was me who added the feature of basing the
file-id on an immutable itime a few months ago.

> If you look closely in the text file in the zip you will find the output of stat on each of the files and you can clearly see that itime was indeed based on the granular change time of the FS. The problem here is as I mentioned in my initial post when copying files in parallel there are definitely batches of files that get the same change time in the FS (actually maybe I was not clear enough in my initial post mentioning the creation time but I was referring to the creation time in the local FS).

Yeah, this is likely the problem. I wonder how the files end up with
identical ctime, as in Samba the file creation will be serialized so
they *can't* happen at the same time.

>> Go figure, we definitely need a network trace with *parallel* log level 10. smb.conf as well.
> 
> The problem with providing log level 10 debug is that the moment I enable level 10 debug for the particular client, due to the fact that copying becomes much slower I am unable to reproduce the issue as it takes more time to commit the files and they get different change time and itime respectively.

Drat, I see. You could try and use the ringbuffer for logging. Given a
large enough buffer, maybe the full log fits into it so you can extract
it after running the reproducer:

  logging = ringbuf:size=NBYTES

Then

  # smbcontrol PID ringbuf-log

This won't work with more than 128 MB (iirc).

> Looking at the code and the data I supplied so far (samba-tool, stat, extracts from debug) I feel we can clearly pinpoint this to generating fileids off the invented time XATTR_DOSINFO_ITIME.

As the itime is based on the ctime, the question is, how can the files
end up with identical ctime? As mentioned above, file creation in Samba
is serialized, so there's no parallelism.
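
To check on the server which files actually share a ctime, something
like the following could help. It is only a sketch in Python, not part
of Samba; scan_ctimes() and find_collisions() are names made up for this
example, and it assumes the ctime reported by stat is the value the
itime was derived from.

```python
import os
from collections import defaultdict


def scan_ctimes(directory):
    """Yield (name, ctime in nanoseconds) for each regular file in directory."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                yield entry.name, entry.stat(follow_symlinks=False).st_ctime_ns


def find_collisions(pairs):
    """Return {timestamp_ns: [names]} for every timestamp shared by 2+ files."""
    groups = defaultdict(list)
    for name, timestamp_ns in pairs:
        groups[timestamp_ns].append(name)
    return {ts: sorted(names) for ts, names in groups.items() if len(names) > 1}
```

Calling find_collisions(scan_ctimes("/path/to/share")) on the affected
directory lists each nanosecond timestamp that more than one file
carries. Note that the ctime changes on later metadata updates, so this
is only meaningful right after the copy.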

>>> You can clearly see that the fileids returned are the same for files with the same timestamp (find response packet 53 from the capture).
>>>
>>> Within the attachment there is also a text file containing some part of the client debug log, stat of the shared directory contents, and samba-tool ntacl getdosinfo of each file in the directory.
> 
>> Unfortunately the output is only with seconds granularity.
> 
> This again is irrelevant because you are referring to the XATTR_DOSINFO_CREATE_TIME which is intentionally carried over as DOS attribute while copying the files.

If you look closely, the itime is there as well.

> In our case itime was not based on this DOS attribute (see the output of "samba-tool ntacl getdosinfo" in the previously attached ZIP).

For file handles referring to files just created via that handle, the
itime is taken from the ctime (lacking native btime support) and stored
in the DOS attribute xattr. Subsequent opens of the same file will use
the itime from the DOS attribute. This itime value in the DOS attribute
is immutable once set, in contrast to the creation-date which can be
modified on client request.
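
As a toy model of the behaviour just described (Python, not Samba's C
code; ToyFile, open_file() and set_create_time() are hypothetical names,
and the dict merely stands in for the DOS attribute xattr):

```python
class ToyFile:
    """Stand-in for a file: its current ctime plus a fake DOS attribute xattr."""

    def __init__(self, ctime_ns):
        self.ctime_ns = ctime_ns
        self.dos_attr = {}


def open_file(f, created_by_this_handle):
    """Return the itime seen through a handle on f.

    Assumption of this sketch: a file that has no stored itime and was
    not just created yields None.
    """
    if created_by_this_handle and "itime" not in f.dos_attr:
        # First handle on a new file: invent the itime from the ctime.
        f.dos_attr["itime"] = f.ctime_ns
    return f.dos_attr.get("itime")


def set_create_time(f, create_time_ns):
    """Client-requested change of the creation date; never touches the itime."""
    f.dos_attr["create_time"] = create_time_ns
```

In this model, two files whose ctime is identical at creation end up
with the same stored itime, and nothing that happens later (a ctime
change, or the client setting the creation date) alters it.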

>>> You mentioned the change was needed due to problematic use of inode numbers. I suppose this might be an issue with filesystems without inode table. Can you please elaborate?
> 
>> inode numbers are reused by the kernel, so they're not unique identifiers.
> 
> Correct me if I am wrong, but my understanding is that inodes are only reused after the last link to a particular inode is deleted?

Yup.

> I think this complies with the requirements for fileid in Microsoft's whitepaper, e.g. "FileId for a file MUST persist for the lifetime of a file on a given object store. A FileId MUST NOT be changed when a file is renamed. When the file is deleted, the FileId MAY be reused."

Yes, this is sufficient for Windows clients; unfortunately other clients
assume no reuse, namely the macOS client.

> If that's the case, maybe some combination of itime + inode could be a better approach except that it would still cause issues with filesystems that don't use inodes. 
> 
> I think the right thing to do here is to find a good method of generating unique fileids.

Samba has long been just using the inode number, which worked fine. It's
just the macOS client that uses the server-provided file-id as CNID,
which requires no number reuse.

-slow

-- 
Ralph Boehme, Samba Team                https://samba.org/
Samba Developer, SerNet GmbH   https://sernet.de/en/samba/
GPG-Fingerprint   FAE2C6088A24252051C559E4AA1E9B7126399E46


