Patch to add support for advertising FULLSYNC to Mac OSX Clients

Sun Nov 27 02:40:29 UTC 2016

On Mon, Nov 21, 2016 at 05:00:26PM +0100, Ralph Böhme wrote:
Hi Ralph,
   No worries. I understand that people get busy with other things.

> 
> maybe we want to call a spade a spade and name this option
> "fruit:time machine". Thoughts?
> 

I am fine with this. I contemplated that name as well but wasn't sure
if advertising the FULLSYNC capability had a use case outside of supporting
Time Machine. Either name works for me though.

> > diff --git a/libcli/smb/smb2_create_ctx.h b/libcli/smb/smb2_create_ctx.h
> > index cb194f5..1c65e6c 100644
> > --- a/libcli/smb/smb2_create_ctx.h
> > +++ b/libcli/smb/smb2_create_ctx.h
> > @@ -30,7 +30,7 @@
> >  
> >  /* "AAPL" Server Query request/response bitmap */
> >  #define SMB2_CRTCTX_AAPL_SERVER_CAPS 1
> > -#define SMB2_CRTCTX_AAPL_VOLUME_CAPS 2
> > +#define SMB2_CRTCTX_AAPL_VOLUME_CAPS 6
> >  #define SMB2_CRTCTX_AAPL_MODEL_INFO  4
> 
> This definitely looks wrong, these are the defines for individual bits
> in a bitfield. Why are you changing SMB2_CRTCTX_AAPL_VOLUME_CAPS to 6 ?
> 

You are correct; that shouldn't be there. I mistakenly thought it needed to
be bumped for the correct response to be sent to clients.
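To illustrate Ralph's point: the AAPL defines are single bits that a client ORs together and the server tests one at a time. The sketch below assumes the pre-patch values (1, 2, 4) from the header. The two helper functions are hypothetical and are not part of Samba.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from libcli/smb/smb2_create_ctx.h as they were before the patch:
 * each define names one bit in the "AAPL" request/response bitmap. */
#define SMB2_CRTCTX_AAPL_SERVER_CAPS 1  /* binary 001 */
#define SMB2_CRTCTX_AAPL_VOLUME_CAPS 2  /* binary 010 */
#define SMB2_CRTCTX_AAPL_MODEL_INFO  4  /* binary 100 */

/* Hypothetical helper: true if the client's request bitmap has `bit` set. */
static bool aapl_bit_requested(uint64_t request_bitmap, uint64_t bit)
{
    return (request_bitmap & bit) != 0;
}

/* Hypothetical helper: a usable bitfield define must be exactly one bit,
 * i.e. a non-zero power of two. */
static bool is_single_bit(uint64_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}
```

With these helpers, a request bitmap of 7 has all three bits set, and a request for model info alone (4) does not match the volume-caps bit. If VOLUME_CAPS were 6 (binary 110), `aapl_bit_requested(4, 6)` would return true, so a model-info-only request would be misread as also asking for volume capabilities, and `is_single_bit(6)` is false.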

> I've also added code that ensures all prerequisite Samba options are
> set on the fly when a Time Machine enabled share is connected.
> 
> Now, secondly, the interesting part: have you ever tested if the TM
> disk image filesystem survives network disconnects and/or hard server
> power offs ?
> 

I have been running the provided patch set for the past month and have not
noticed any issues. In that time I have restarted the network interfaces on
my server while backups were running; no issues were reported, and I was still
able to restore from the same backup. That being said, I have not tested a
hard power-off of the server, as it is backed by a UPS. I will try to test
this case and report back.

> I've been told that there seems to be an issue in the Linux kernel not
> properly flushing buffers to disk in an fsync() resulting in damaged
> TM disk image filesystems. This was discovered by folks running tests
> with a similar patch.
> 

I am by no means an expert here, but I think whether fsync() actually gets
data onto stable storage may depend on write barrier support in the underlying
filesystem. According to the following, this should be improved in kernels
after 2.6.30, at least for ext4:

https://wiki.archlinux.org/index.php/ext4#Barriers_and_performance
https://lwn.net/Articles/283161/
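As background on what a "full" sync means at the system-call level: on macOS, fsync() does not guarantee that the drive has flushed its volatile write cache, and the F_FULLFSYNC fcntl exists for that purpose. Below is a minimal sketch, assuming a POSIX system, of a helper that prefers F_FULLFSYNC where the platform defines it and otherwise falls back to fsync(). The name full_sync() is hypothetical, and this is not how Samba implements flushing.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: ask the OS to make a file's data durable.
 * On macOS, fsync() hands data to the drive, which may keep it in its
 * volatile write cache; fcntl(F_FULLFSYNC) additionally asks the drive to
 * flush that cache. Where F_FULLFSYNC is not defined (e.g. Linux), fall back
 * to fsync(), whose durability then depends on the filesystem issuing
 * barriers/cache flushes and on the device honouring them. */
static int full_sync(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) == 0) {
        return 0;
    }
    /* Some filesystems do not support F_FULLFSYNC; fall back to fsync(). */
#endif
    return fsync(fd);
}
```

On Linux the `#ifdef` branch compiles away, so the result is exactly as durable as fsync(), which is where the barrier behaviour described in the links above comes into play.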

> From hearsay, some storage devices cheat when they get a flush
> write-buffer command and ignore it, but the testing was done with a
> storage device that was known not to cheat. But still, after power
> cycling the server while a TM backup was in progress the TM disk image
> filesystem was frequently reported as damaged by the client.

> Do we want to put our users at risk of losing their backups in
> situations like this? Do we want to pretend to be a suitable backup
> target for something that breaks easily for unknown reasons?

I can certainly understand the concern, and I think it is valid. Re-reading
the Time Machine spec, I see that FULLSYNC is signalled inside an SMB FLUSH
request. Also, based on this email thread, Samba FLUSH operations are
asynchronous by default:

https://lists.samba.org/archive/samba/2008-September/143627.html

The asynchronous writes make me wonder whether they might be causing some of
the corruption edge cases, including the case above. Is it possible to force an
fsync() from the VFS layer? Could we add a handler for SMB2 FLUSH commands that
checks for a Reserved1 field set to 0xFFFF and forces an fsync()?
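A minimal sketch of the check proposed above, assuming the SMB2 FLUSH request body layout from MS-SMB2 (StructureSize, Reserved1, Reserved2, FileId, all little-endian) and that a Mac client marks a full sync by setting Reserved1 to 0xFFFF, as described in this thread. The function and macro names are hypothetical; this is standalone parsing code, not Samba's smbd or VFS API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SMB2 FLUSH request body layout (MS-SMB2), all little-endian:
 *   offset 0  StructureSize (2 bytes, must be 24)
 *   offset 2  Reserved1     (2 bytes)
 *   offset 4  Reserved2     (4 bytes)
 *   offset 8  FileId        (16 bytes)
 */
#define SMB2_FLUSH_BODY_SIZE 24
/* Assumption taken from this thread: a Mac client requesting full sync
 * semantics sets Reserved1 to 0xFFFF. */
#define AAPL_FULLSYNC_MARKER 0xFFFF

/* Read a little-endian 16-bit value regardless of host byte order. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: returns true if this FLUSH body asks for a full sync.
 * Malformed bodies are treated as ordinary flushes (false). */
static bool smb2_flush_wants_fullsync(const uint8_t *body, size_t len)
{
    if (body == NULL || len < SMB2_FLUSH_BODY_SIZE) {
        return false;
    }
    if (read_le16(body) != SMB2_FLUSH_BODY_SIZE) {
        return false;
    }
    return read_le16(body + 2) == AAPL_FULLSYNC_MARKER;
}
```

A handler built on this could call fsync() on the file identified by FileId whenever the function returns true, rather than treating the request like any other flush.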

> It seems that just putting your laptop to sleep or disconnecting from the
> network while TM is running is the primary cause of this. To
> me it's entirely unclear how this relates to fsync implementation bugs;
> it might be unrelated.

I'm also unclear how, or whether, they are related. I could definitely see
potential corruption issues occurring from hard power-offs, but not necessarily
from client disconnects or the client entering sleep mode. In the case of a
client disconnect, the server should still be able to sync the data in its
cache successfully. I will try to test the client disconnect case some more
as well, but so far I haven't noticed any issues, and I have fairly regularly
put my laptop to sleep while backups were running.

Thanks,
Kevin