Patch to add support for advertising FULLSYNC to Mac OSX Clients

Kevin Anderson andersonkw2 at gmail.com
Sat Feb 18 23:32:00 UTC 2017


Hi Ralph,
   Sorry for the delay.

On Sat, Jan 21, 2017 at 3:13 AM, Ralph Böhme <slow at samba.org> wrote:
>
> Hi Kevin,
>
> On Sat, Nov 26, 2016 at 09:40:29PM -0500, Kevin Anderson wrote:
> > On Mon, Nov 21, 2016 at 05:00:26PM +0100, Ralph Böhme wrote:
> > Hi Ralph,
> >    No worries. I understand that people get busy with other things.
> >
> > >
> > > maybe we want to call a spade a spade and name this option
> > > "fruit:time machine". Thoughts?
> > >
> >
> > I am fine with this. I contemplated that name as well but wasn't sure
> > if advertising the FULLSYNC capability had a use case outside of supporting
> > Time Machine. Either name works for me though.
>
> ok.
>
> > > I've also added code that ensures all prerequisite Samba options are
> > > set on the fly when a Time Machine enabled share is connected.
> > >
> > > Now, secondly, the interesting part: have you ever tested if the TM
> > > disk image filesystem survives network disconnects and/or hard server
> > > power offs ?
> > >
> >
> > I have been running the provided patch set for the past month and have not
> > noticed any issues. In that time I have restarted the networking interfaces
> > on the server I am using while backups are running without any issues being
> > reported as well as being able to restore from the same backup. With that
> > being said I have not tested a hard poweroff of the server as it is backed
> > by a UPS. I will try to test this case and report back.
>
> did you run into any issues?
>

So far I have not run into any issues, even after doing a hard power-off
a couple of times.

> > > I've been told that there seems to be an issue in the Linux kernel not
> > > properly flushing buffers to disk in an fsync() resulting in damaged
> > > TM disk image filesystems. This was discovered by folks running tests
> > > with a similar patch.
> > >
> >
> > I am by no means an expert here but I think the success of fsync() may
> > depend on write barrier support in the underlying file system. I think
> > in kernels after 2.6.30 and at least ext4, this should be improved
> > according to these:
> >
> > https://wiki.archlinux.org/index.php/ext4#Barriers_and_performance
> > https://lwn.net/Articles/283161/
>
> Those were my findings as well.
>
> > > From hearsay, some storage devices cheat when they get a flush
> > > write-buffer command and ignore it, but the testing was done with a
> > > storage device that was known not to cheat. But still, after power
> > > cycling the server while a TM backup was in progress the TM disk image
> > > filesystem was frequently reported as damaged by the client.
> >
> > > Do we want to put our users at risk of losing their backups in
> > > situations like this ? Do we want to pretend being a suitable backup
> > > target for something that breaks easily for unknown reasons ?
> >
> > I can certainly understand the concern and I think it is valid. Re-reading
> > the Time Machine spec, the FULLSYNC capability is embedded in a SMB FLUSH
> > request.
>
> yes.
>
> > Also based on this email thread, Samba FLUSH operations are
> > asynchronous by default:
> >
> > https://lists.samba.org/archive/samba/2008-September/143627.html
>
> yes, they are asynchronous *and* they're disabled by default (strict sync =
> no), that's why we're going to enable it at runtime if fruit:time machine=yes.

OK.
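
A minimal sketch of the behaviour described above, assuming a
hypothetical helper handle_flush() that is not part of Samba: with
strict sync disabled the flush request is acknowledged without touching
the disk, and only with it enabled does the server actually call
fsync(). The real smbd code path is more involved (and asynchronous).

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical illustration only, not Samba code.
 * Returns 0 on success, -1 on error (errno set by fsync()).
 */
static int handle_flush(int fd, bool strict_sync)
{
    assert(fd >= 0);

    if (!strict_sync) {
        /* "strict sync = no": acknowledge the request, skip the sync. */
        return 0;
    }

    /* "strict sync = yes": ask the kernel to flush the file to disk. */
    return fsync(fd);
}
```

Note that a successful fsync() is still only as good as the file system
and storage device underneath it, which is the concern raised earlier
in the thread.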

>
> > The asynchronous writes make me curious if this might be leading to
> > some of the corruption edge cases as well as the case above.
>
> Hm, I guess the time window is small in which we have responded to the flush
> request while the fsync is still being done in a worker thread, but it's
> there, so yes, this could be possible.
>
> > Is it possible to force an fsync() from the VFS layer? Could we add a handler
> > for SMB2 FLUSH commands that checks for a Reserved1 field set to 0xFFFF and
> > forces an fsync()?
>
> Yes, we probably want to parse the Reserved1 field in the SMB2 frontend and pass
> it down to the SMB2 flush request handler. Depending on the setting we could then
> switch between calling async flush() or sync.

OK. I will look into adding that to the provided patch, but it may take
some time as I come to understand other parts of the code base. I think
that would provide the necessary balance between data consistency and
performance.
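
As a rough sketch of the check discussed above, assuming the SMB2 FLUSH
request body layout of a 2-byte StructureSize (24) followed by the
2-byte Reserved1 field, both little-endian: the helper names
(pull_le16, flush_wants_fullsync) and constants below are made up for
illustration and are not existing Samba functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLUSH_REQ_BODY_SIZE    24     /* StructureSize of an SMB2 FLUSH request */
#define FLUSH_RESERVED1_OFFSET 2      /* Reserved1 follows StructureSize */
#define FLUSH_FULLSYNC_MAGIC   0xFFFF /* value discussed in this thread */

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t pull_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/*
 * Hypothetical helper: returns true if the FLUSH request body asks for
 * a full sync, i.e. Reserved1 == 0xFFFF. Short or malformed bodies are
 * treated as an ordinary flush.
 */
static bool flush_wants_fullsync(const uint8_t *body, size_t len)
{
    assert(body != NULL);

    if (len < FLUSH_REQ_BODY_SIZE) {
        return false;
    }
    if (pull_le16(body) != FLUSH_REQ_BODY_SIZE) {
        return false;
    }
    return pull_le16(body + FLUSH_RESERVED1_OFFSET) == FLUSH_FULLSYNC_MAGIC;
}
```

The flush request handler could then use the result to choose between
the existing async flush and a synchronous fsync() before replying.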

>
> >
> > > It seems just putting your laptop to sleep or disconnecting from the
> > > network while TM is running is the primary cause for this. To
> > > me it's entirely unclear how this relates to fsync implementation bugs,
> > > it might be unrelated.
> >
> > I'm unclear how or if they are related as well. I could definitely see
> > potential corruption issues occurring from hard power-offs but not necessarily
> > client disconnects or the client entering sleep mode. In the case of the
> > client disconnecting, the server should still be able to sync data in the cache
> > successfully. The client disconnect case I will try to test some more
> > as well but so far I haven't noticed any issues and I have pretty regularly
> > put my laptop to sleep while taking backups.
>
> ok.
>
> Cheerio!
> -slow

-Kevin
