vfswrap_getxattrat_do_async and unshare(CLONE_FS)
Florian Weimer
fweimer at redhat.com
Fri Nov 4 07:30:59 UTC 2022
* Stefan Metzmacher via samba-technical:
> Hi Florian,
>
>> On Mon, Oct 24, 2022 at 11:46:22PM +0200, Florian Weimer via samba-technical wrote:
>>> As far as I can tell, vfswrap_getxattrat_do_async relies on
>>> unshare(CLONE_FS) to do fgetxattrat emulation with a thread-local fchdir
>>> and getxattr. There do not seem to be any other uses in the sources.
>>>
>>> I think the more usual way to emulate the missing fgetxattrat system
>>> call is to open the file with openat and O_PATH, and then use getxattr
>>> on the synthetic path under /proc/self/fd. While these paths present as
>>> symbolic links, they actually are not, so there is no race possible.
>>> (fgetxattr cannot operate on the open file descriptor directly.)
>>>
>>> Why wasn't the /proc-based approach chosen for Samba? It looks a bit
>>> simpler to implement, and does not do strange things to the process
>>> state behind glibc's back.
>> As I recall it was done that way as there are various OEMs with
>> fuse-based
>> filesystems that are very slow on fetching EA's, which we have to do
>> on every directory entry returned. Doing this inside a pthread using
>> unshare(CLONE_FS) to keep a per-thread directory was found to allow
>> these filesystems to have a decent performance.
>
> Yes, open/getxattr/close for each directory element was avoided
> for performance reasons at the time we developed it.
>
>> This was before the VFS-rewrite to use fd's everywhere as I recall,
>> so it's possible it may no longer be required. But I'd wait for
>> Metze to chime in here as he was deeply involved in this I think.
>
> Yes, looking the current code shows that
> vfswrap_getxattrat_do_async already uses vfswrap_fgetxattr(),
> which is fd-based and actually uses /proc/self/fd if
> fsp->fsp_flags.have_proc_fds is set, and that is never
> false if per_thread_cwd_supported() is true.
>
> So I think could replace per_thread_cwd_supported() in
> vfswrap_getxattrat_send() with fsp->fsp_flags.have_proc_fds
> and actually remove per_thread_cwd_activate() and set_thread_credentials()
> from vfswrap_getxattrat_do_async(). This would also mean we can
> also go async within restricted
> containers which block the whole unshare() syscall (independed of the passed flags).
Thanks for the explanation.
The reference to blocking unshare() is interesting, I had not considered
this. If we integrate CLONE_FS supported into glibc's pthread_create,
this issue goes away.
In the meantime, I'm going to ask other projects if they need
CLONE_FS. 8-)
Thanks,
Florian
More information about the samba-technical
mailing list