A --exclude-checksum option?

Karl O. Pinc kop at meme.com
Tue Feb 12 14:07:57 MST 2013


On 02/12/2013 02:48:35 PM, Kevin Korb wrote:
> My first thought is why are you backing up /tmp at all?

Because I put stuff in /tmp I might want, and whatever
I put there goes away by itself.  It stays very handy for a while,
then it's on backup and less handy, then it's gone....

> My second thought is why are you using atime for anything?  It can be
> touched by almost anything and running a filesystem with atime enabled
> is a huge performance detriment as it adds a directory write operation
> to every file read operation.

Not my box, not my choice.  (I tend to like relatime....)
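
To make the atime-preserving alternative from my original mail
concrete, here is a rough Python sketch.  It is not rsync code: the
function name is made up for illustration, and SHA-256 is only a
stand-in for whatever checksum rsync actually computes.  It reads the
file, then puts the timestamps back the way they were.

```python
import hashlib
import os


def checksum_preserving_atime(path, chunk_size=1 << 20):
    """Return a hex digest of path, restoring atime/mtime afterwards.

    Hypothetical helper for illustration only; rsync does not work
    this way as far as this sketch is concerned.
    """
    before = os.stat(path)  # remember timestamps before reading
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # Reading may have updated atime; set both timestamps back.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return digest.hexdigest()
```

The obvious drawbacks: resetting the times bumps ctime, it needs
ownership of the file, and any access by someone else between the
stat and the utime gets overwritten.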

> 
> My final thought is maybe you want a file verification tool (I like
> cfv) instead of rsync --checksum.  Rsync's --checksum is kinda
> mindless in terms of performance.  It checksums everything.  This is
> rather pointless as a file that is a different size will obviously
> have a different checksum.  Rsync even checksums files that only exist
> on one side of the transfer.

Wouldn't I have to do something with cfv as well so that checksumming
only happens on files of different sizes?  Sounds like a complication
when the backup is on a box reachable only via ssh.

I'm lazy, it was easy to incorporate verification into the
rsync backup process.  And checksumming everything means
that everything is verified -- may as well do it in rsync as
anywhere else.

It's a backup.  If it's corrupted then --checksum will fix it
and it won't be corrupted any more, regardless of whether the
backup-side fs is broken.  (Presumably the backup-side hardware/
fs will be fixed quickly.)  And I don't care that --checksum means
that the rsync takes longer once a week.

Sounds like you're leaning toward "it's a niche feature
and let's not clutter up rsync (further)".
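
If so, the fallback is running two instances of rsync.  A rough
sketch of the two command lines, built in Python so the split is
explicit.  The function name, host, and paths are made up for
illustration, and a real run would also carry whatever hardlink and
other options the backup already uses.

```python
import shlex


def two_pass_commands(src, dest, no_checksum_dir="tmp"):
    """Build argv lists for a checksum pass and a plain pass.

    Hypothetical helper: pass 1 checksums everything except one
    directory; pass 2 copies that directory by size/timestamp only.
    """
    src = src.rstrip("/") + "/"
    dest = dest.rstrip("/") + "/"
    sub = no_checksum_dir.strip("/")
    # Pass 1: --checksum everywhere, with the directory excluded.
    # The leading slash anchors the pattern at the transfer root.
    checksum_pass = ["rsync", "-a", "--checksum",
                     f"--exclude=/{sub}/", src, dest]
    # Pass 2: no --checksum, so files in the directory are never read
    # just to be compared, and their atimes are left alone.
    plain_pass = ["rsync", "-a", f"{src}{sub}/", f"{dest}{sub}/"]
    return checksum_pass, plain_pass


if __name__ == "__main__":
    for argv in two_pass_commands("/", "backup:/srv/backup/current"):
        print(shlex.join(argv))
```

It works, but it means two ssh connections and two file-list builds
per run, which is the cost the proposed option would avoid.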

> 
> On 02/12/13 15:42, Karl O. Pinc wrote:
> > Hi,
> >
> > I use rsync with hardlinks for backup, once a week doing checksums
> > to ensure there's no filesystem corruption in the backed-up data.
> >
> >
> > I also use tmpwatch, or something similar, to clean up /tmp; it
> > removes files that have not been accessed recently (atime older
> > than some configured limit). I back up /tmp because I throw stuff in
> > tmp that I might possibly need again but don't want to bother
> > having to remember to delete -- and if I'm expecting to have useful
> > data somewhere I want it backed up.
> >
> > However, rsync's checksumming (naturally) updates the atimes of the
> > files in /tmp, and so tmpwatch never deletes them.
> >
> > It occurs to me that a handy solution might be to have an rsync
> > option, similar to the --exclude option, which would allow
> > checksumming to happen throughout most of the backup process but
> > would do "regular" size/timestamp based backups on certain
> > directories.
> >
> > What do people think of such an option? Is there a better design?
> > (E.g. an option that, er, preserves atime when checksumming?) Is
> > rsync just too overloaded with options already and it would be
> > better instead to run two instances of rsync?  Is there a
> > bug/feature in process already that would address the use-case
> > above?
> >
> > I'd like to have a sensible design before even thinking about
> > patching.
> >
> > Thanks for the feedback.
> >
> > Regards,
> >
> > Karl <kop at meme.com>
> > Free Software:  "You don't pay back, you pay forward."
> >                  -- Robert A. Heinlein
> >
> 
> --
> ~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
> 	Kevin Korb			Phone:    (407) 252-6853
> 	Systems Administrator		Internet:
> 	FutureQuest, Inc.		Kevin at FutureQuest.net  (work)
> 	Orlando, Florida		kmk at sanitarium.net (personal)
> 	Web page:			http://www.sanitarium.net/
> 	PGP public key available on web site.
> ~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
> 




Karl <kop at meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein


