parse_dos_attribute_blob() inconsistent file_id through make_file_id_from_itime()
ganchev at fixity.net
Sat Dec 14 11:25:18 UTC 2019
On Saturday, December 14, 2019 2:23 AM, Ralph Boehme wrote:
> Am 12/14/19 um 6:11 AM schrieb Krasimir Ganchev:
> > On Friday, December 13, 2019 12:59 AM, Ralph Boehme wrote:
> >> this is a recent Samba change to overcome the problematic use of inode numbers for the file-ids. The assumption was that with nsec filesystem timestamp granularity, the itime, which stands for invented time and is basically an immutable birthtime, will always be unique.
> >> Can you share a network trace of a minimal reproducer?
> > I am attaching a minimal packet capture at the time the directory was browsed.
> sorry if that wasn't clear, but we also need the capture to cover when the files were created.
What more information would that give us? Apparently we are dealing with a local issue on the Samba side. It's an 8 MB file; I can provide a download link if needed, but I would rather not send such a large attachment to the list.
> What's strange is that the capture shows that for files where the file-id is the same, the creation-dates are different, so I would expect the itime to be different as well which should result in unique file-ids.
If I am not mistaken, there is still no consensus on implementing creation time on Linux, although some filesystems do support it. All you have is the change time. Creation time is copied over from Windows as a DOS attribute, which is irrelevant in this case because the invented time (itime) is based on the change time supplied by the FS (creation time is only used if nothing is supplied by the FS).
If you look closely at the text file in the zip, you will find the output of stat for each of the files, and you can clearly see that itime was indeed based on the granular change time of the FS. The problem, as I mentioned in my initial post, is that when files are copied in parallel there are definitely batches of files that get the same change time in the FS. (Maybe I was not clear enough in my initial post when I mentioned the creation time: I was referring to the creation time in the local FS.)
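To make the check above repeatable without reading stat output by eye, here is a minimal Python sketch that groups the files in a directory by their nanosecond change time and reports any groups with more than one member. The function names and the example path are hypothetical and not part of Samba; it only assumes that the share's backing directory can be read locally.

```python
import os
from collections import defaultdict


def group_by_ctime(ctimes):
    """Return {ctime_ns: [names]} for change times shared by 2+ files.

    `ctimes` maps a file name to its change time in nanoseconds.
    """
    groups = defaultdict(list)
    for name, ctime_ns in ctimes.items():
        groups[ctime_ns].append(name)
    return {t: sorted(names) for t, names in groups.items() if len(names) > 1}


def scan_ctimes(directory):
    """Collect the nanosecond change time of every regular file in `directory`."""
    with os.scandir(directory) as entries:
        return {
            entry.name: entry.stat(follow_symlinks=False).st_ctime_ns
            for entry in entries
            if entry.is_file(follow_symlinks=False)
        }


if __name__ == "__main__":
    # Hypothetical path; replace with the directory backing the share.
    for ctime_ns, names in group_by_ctime(scan_ctimes("/srv/share/test")).items():
        print(ctime_ns, names)
```

Any group it prints is a set of files whose change times are identical down to the nanosecond, which is the condition described above.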
> Go figure, we definitely need a network trace with *parallel* log level 10. smb.conf as well.
The problem with providing a log level 10 debug is that the moment I enable it for the particular client, copying becomes much slower and I can no longer reproduce the issue: the files take more time to commit, so they get different change times and therefore different itimes. Looking at the code and the data I have supplied so far (samba-tool, stat, extracts from debug), I feel we can clearly pin this down to generating fileids from the invented time, XATTR_DOSINFO_ITIME.
> > You can clearly see that the fileids returned are the same for files with the same timestamp (find response packet 53 from the capture).
> > Within the attachment there is also a text file containing some part of the client debug log, stat of the shared directory contents, and samba-tool ntacl getdosinfo of each file in the directory.
> Unfortunately the output is only with seconds granularity.
This again is irrelevant because you are referring to XATTR_DOSINFO_CREATE_TIME, which is intentionally carried over as a DOS attribute while copying the files. In our case itime was not based on this DOS attribute (see the output of "samba-tool ntacl getdosinfo" in the previously attached ZIP).
> > You mentioned the change was needed due to problematic use of inode numbers. I suppose this might be an issue with filesystems without inode table. Can you please elaborate?
> inode numbers are reused by the kernel, so they're not unique identifiers.
Correct me if I am wrong, but my understanding is that an inode is only reused after the last link to it is deleted. I think this complies with the requirements for fileid in Microsoft's whitepaper: "FileId for a file MUST persist for the lifetime of a file on a given object store. A FileId MUST NOT be changed when a file is renamed. When the file is deleted, the FileId MAY be reused."
If that's the case, maybe some combination of itime + inode could be a better approach, except that it would still cause issues with filesystems that don't use inodes.
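As a rough illustration of the itime + inode idea, here is a small Python sketch that mixes both values into one 64-bit id. It is not Samba's code and not a proposed patch: the function name, the choice of BLAKE2b, and the sample values are all assumptions made for the example. Hashing only makes collisions unlikely, it does not guarantee uniqueness.

```python
import hashlib
import struct

MASK64 = (1 << 64) - 1


def file_id_from_itime_and_inode(itime_ns, inode):
    """Hypothetical 64-bit id derived from both the itime and the inode number."""
    # Pack both values as unsigned 64-bit little-endian integers.
    packed = struct.pack("<QQ", itime_ns & MASK64, inode & MASK64)
    # Reduce the 16 input bytes to an 8-byte digest and read it as an integer.
    digest = hashlib.blake2b(packed, digest_size=8).digest()
    return int.from_bytes(digest, "little")


# Two files committed in the same nanosecond but with different inodes
# (sample values only) no longer share an id:
same_itime_ns = 1576322718000000000
id_a = file_id_from_itime_and_inode(same_itime_ns, 1001)
id_b = file_id_from_itime_and_inode(same_itime_ns, 1002)
```

The id stays stable across renames as long as both inputs do, but on a filesystem that has no stable inode numbers the second input adds nothing, which is the limitation mentioned above.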
I think the right thing to do here is to find a good method of generating unique fileids.