Patch to add support for advertising FULLSYNC to Mac OSX Clients

Ralph Böhme slow at samba.org
Sat Jan 21 08:13:59 UTC 2017


Hi Kevin,

On Sat, Nov 26, 2016 at 09:40:29PM -0500, Kevin Anderson wrote:
> On Mon, Nov 21, 2016 at 05:00:26PM +0100, Ralph Böhme wrote:
> Hi Ralph,
>    No worries. I understand that people get busy with other things.
> 
> > 
> > maybe we want to call a spade a spade and name this option
> > "fruit:time machine". Thoughts?
> > 
> 
> I am fine with this. I contemplated that name as well but wasn't sure
> if advertising the FULLSYNC capability had a use case outside of supporting
> Time Machine. Either name works for me though.

ok.

> > I've also added code that ensures all prerequisite Samba options are
> > set on the fly when a Time Machine enabled share is connected.
> > 
> > Now, secondly, the interesting part: have you ever tested if the TM
> > disk image filesystem survives network disconnects and/or hard server
> > power offs ?
> > 
> 
> I have been running the provided patch set for the past month and have not
> noticed any issues. In that time I have restarted the networking interfaces
> on the server I am using while backups are running without any issues being
> reported as well as being able to restore from the same backup. With that 
> being said I have not tested a hard poweroff of the server as it is backed
> by an UPS. I will try to test this case and report back.

did you run into any issues?

> > I've been told that there seems to be an issue in the Linux kernel not
> > properly flushing buffers to disk in an fsync() resulting in damaged
> > TM disk image filesystems. This was discovered by folks running tests
> > with a similar patch.
> > 
> 
> I am by no means an expert here but I think the success of fsync() may
> depend on write barrier support in the underlying file system. I think
> in kernels after 2.6.30 and at least ext4, this should be improved
> according to these:
> 
> https://wiki.archlinux.org/index.php/ext4#Barriers_and_performance
> https://lwn.net/Articles/283161/

Those were my findings as well.

> > From hearsay, some storage devices cheat when they get a flush
> > write-buffer command and ignore it, but the testing was done with a
> > storage device that was known not to cheat. But still, after power
> > cycling the server while a TM backup was in progress the TM disk image
> > filesystem was frequently reported as damaged by the client.
> 
> > Do we want to put our users at risk of losing their backups in
> > situations like this ? Do we want to pretend being a suitable backup
> > target for something that breaks easily for unknown reasons ?
> 
> I can certainly understand the concern and I think it is valid. Re-reading
> the Time Machine spec, the FULLSYNC capability is embedded in a SMB FLUSH
> request.

yes.

> Also based on this email thread, Samba FLUSH operations are
> asynchronous by default:
> 
> https://lists.samba.org/archive/samba/2008-September/143627.html

yes, they are asynchronous *and* they're disabled by default (strict sync =
no), that's why we're going to enable it at runtime if fruit:time machine=yes.

> The asynchronous writes make me curious if this might be leading to
> some of the corruption edge cases as well as the case above.

Hm, I guess the time window is small where we responded to the flush request
while the fsync is still being done in a worker thread, but it's there, so yes,
this could be possible.
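
To illustrate the window being discussed, here is a minimal sketch of the synchronous alternative: the caller only acknowledges the flush after fsync() has returned. This is plain POSIX, not Samba code; the function name flush_sync_before_reply() is hypothetical, and it assumes the caller sends the reply only when it gets 0 back.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a Samba function): block until fsync()
 * has completed on fd. Returns 0 on success or a negative errno
 * value on failure. The caller is assumed to send the flush reply
 * only after this returns 0, so there is no window in which the
 * client has been told "flushed" while the sync is still running.
 */
static int flush_sync_before_reply(int fd)
{
    int ret;

    do {
        ret = fsync(fd);
    } while (ret == -1 && errno == EINTR); /* retry if interrupted */

    return (ret == 0) ? 0 : -errno;
}
```

Note that this only narrows the window on the server side. Whether the data is really on stable storage still depends on the filesystem and the device honouring the flush, as discussed above.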

> Is it possible to force a fsync() from the VFS layer? Could we add a handler
> for SMB2 FLUSH commands that check for a Reserved1 Field set to 0xFFFF and
> force an fsync()?

Yes, we probably want to parse the Reserved1 field in the SMB2 frontend and pass
it down to the SMB2 flush request handler. Depending on the setting we could then
switch between calling async flush() or sync.
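
A minimal sketch of the parsing step, assuming the fixed 24-byte SMB2 FLUSH request body from MS-SMB2 (StructureSize, Reserved1, Reserved2, FileId, all little-endian) and the 0xFFFF marker mentioned above. The helper name flush_wants_fullsync() and the constants are hypothetical, not existing Samba symbols:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * SMB2 FLUSH request body, per MS-SMB2:
 *   offset 0: StructureSize (2 bytes, value 24)
 *   offset 2: Reserved1     (2 bytes)
 *   offset 4: Reserved2     (4 bytes)
 *   offset 8: FileId        (16 bytes)
 */
#define FLUSH_BODY_SIZE        24
#define FLUSH_RESERVED1_OFFSET 2
#define FLUSH_FULLSYNC_MARKER  0xFFFF /* value named in this thread */

/*
 * Hypothetical helper (not a Samba function): returns true if the
 * client set Reserved1 to the full-sync marker, false otherwise or
 * if the buffer is too short to hold a FLUSH request body.
 */
static bool flush_wants_fullsync(const uint8_t *body, size_t len)
{
    uint16_t reserved1;

    if (body == NULL || len < FLUSH_BODY_SIZE) {
        return false;
    }

    /* Reserved1 is a little-endian 16-bit field on the wire. */
    reserved1 = (uint16_t)(body[FLUSH_RESERVED1_OFFSET] |
                           (body[FLUSH_RESERVED1_OFFSET + 1] << 8));

    return reserved1 == FLUSH_FULLSYNC_MARKER;
}
```

The flush handler could then take the result as a flag and choose the synchronous path when it is set, keeping the existing async behaviour for all other clients.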
> 
> > It seems just putting your laptop to sleep or disconnecting from
> > network while TM is running is the primary cause for this. To
> > me it's entirely unclear how this relates to fsync implementation bugs,
> > it might be unrelated.
> 
> I'm unclear how or if they are related as well. I could definitely see
> potential corruption issues occurring from hard power offs but not necessarily
> client disconnects or the client entering sleep mode. In the case of the
> client disconnecting, the server should still be able to sync data in the cache
> successfully. The client disconnect case I will try to test some more
> as well but so far I haven't noticed any issues and I have pretty regularly
> put my laptop to sleep while taking backups. 

ok.

Cheerio!
-slow


